Using HDFS CLI

Let us understand how to use HDFS CLI to interact with HDFS.

  • Typically the cluster contains 3 types of nodes.

    • Gateway nodes or client nodes or edge nodes

    • Master nodes

    • Worker nodes

  • Developers like us will typically have access to Gateway nodes or Client nodes.

  • We can connect to Gateway nodes or Client nodes using SSH.
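One convenient way to set up the SSH connection is a host entry in `~/.ssh/config`; the host name and user below are hypothetical placeholders, not values from this course:

```
# Hypothetical ~/.ssh/config entry for a gateway/edge node
Host edgenode
    HostName gateway.example.com   # replace with your cluster's gateway host
    User itversity                 # replace with your login user
```

With such an entry in place, `ssh edgenode` opens a session on the gateway node.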

  • Once logged in, we can interact with HDFS using either hadoop fs or hdfs dfs. The two commands are aliases of each other.

  • hadoop has other subcommands besides fs. As developers, we typically use it to interact with HDFS or Map Reduce.

  • hdfs has other subcommands besides dfs. It is used not only to manage files in HDFS but also for administrative tasks related to HDFS components such as the Namenode, Secondary Namenode, Datanodes, etc.

  • As developers, our scope will be limited to using hdfs dfs or hadoop fs to interact with HDFS.

  • Both have sub commands, and each sub command takes additional control arguments. Let us understand the structure by taking the example of hdfs dfs -ls -h -S -r /public.

    • hdfs is the main command to manage all the components of HDFS.

    • dfs is the sub command to manage files in HDFS.

    • -ls is the file system command to list files in HDFS.

    • -h, -S and -r are control arguments for -ls that control the run time behavior of the command (human readable sizes, sort by size, reverse order).

    • /public is the argument for the -ls command. It is a path in HDFS. You will understand as you get into the details.
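The breakdown above can be sketched in plain shell by splitting an example command string into its parts; this only inspects the string and does not require a Hadoop installation (the flags `-h -S -r` shown are valid `-ls` options, and `/public` is just a sample path):

```shell
# Split the example command into words and label each part.
cmd="hdfs dfs -ls -h -S -r /public"
set -- $cmd                      # $1=hdfs $2=dfs $3=-ls ...
echo "main command:      $1"     # hdfs -> entry point for HDFS components
echo "sub command:       $2"     # dfs  -> file system shell
echo "fs shell command:  $3"     # -ls  -> list files
shift 3                          # remaining words: control args + path
last=""
for arg; do last="$arg"; done    # the final word is the path argument
opts=""
for arg; do
  [ "$arg" = "$last" ] || opts="$opts $arg"
done
echo "control arguments:$opts"   # -h -S -r
echo "path argument:     $last"  # /public
```

The same anatomy (main command, sub command, fs shell command, control arguments, path) applies to every hdfs dfs or hadoop fs invocation.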

%%sh

hadoop

%%sh

hadoop fs -usage

%%sh

hdfs

%%sh

hdfs dfs -usage