Using HDFS CLI
Let us understand how to use HDFS CLI to interact with HDFS.
Typically, a cluster contains 3 types of nodes:
Gateway nodes or client nodes or edge nodes
Master nodes
Worker nodes
Developers like us will typically have access to Gateway nodes or Client nodes.
We can connect to Gateway nodes or Client nodes using SSH.
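As a minimal sketch of the connection step, the snippet below prints the SSH command for a gateway node and then checks whether the hadoop and hdfs client commands used in the rest of this section are available. The user name and host name are hypothetical placeholders, not values from any real cluster; substitute the ones provided by your administrator.

```shell
# Hypothetical placeholders: replace with your own user name and gateway host.
GATEWAY_USER="your_user"
GATEWAY_HOST="gateway.example.com"

# Print the SSH command so it can be reviewed before connecting.
echo "ssh ${GATEWAY_USER}@${GATEWAY_HOST}"

# After logging in, report whether each Hadoop client command is on the PATH.
check_cli() {
  for cmd in hadoop hdfs; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd: found at $(command -v "$cmd")"
    else
      echo "$cmd: not found on PATH"
    fi
  done
}

check_cli
```

If either command is reported as missing, the session is probably not on a gateway node, or the Hadoop client is not on the PATH for that user.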
Once logged in, we can interact with HDFS by using either hadoop fs or hdfs dfs. Both run the same file system shell, so they behave as aliases of each other.
The hadoop command has subcommands other than fs, and developers typically use it to interact with HDFS or MapReduce.
The hdfs command has subcommands other than dfs. It is used not only to manage files in HDFS but also to perform administrative tasks related to HDFS components such as the Namenode, Secondary Namenode, and Datanodes.
As developers, our scope will be limited to using hdfs dfs or hadoop fs to interact with HDFS.
Both have subcommands, and each subcommand takes additional control arguments. Let us understand the structure by taking the example of
hdfs dfs -ls -S -r /public
hdfs is the main command to manage all the components of HDFS.
dfs is the subcommand to manage files in HDFS.
-ls is the file system command to list files in HDFS.
-S -r are control arguments for -ls that control the runtime behavior of the command: -S sorts the output by file size and -r reverses the sort order.
/public is the argument for the -ls command. It is a path in HDFS. The structure will become clearer as you get into the details.
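To make the structure concrete, here is a small sketch in plain POSIX shell that splits a command of this shape into the parts described above and prints each one. It does not contact HDFS; it only treats the command as a list of words. The variable names are chosen for illustration, and the sketch assumes a single path given as the last word.

```shell
# Treat the command as a list of words; nothing is executed against HDFS.
set -- hdfs dfs -ls -S -r /public

main_command=$1   # hdfs
sub_command=$2    # dfs
fs_command=$3     # -ls
shift 3

# Everything before the last word is a control argument for the file system command.
control_args=""
while [ "$#" -gt 1 ]; do
  control_args="${control_args:+$control_args }$1"
  shift
done
target_path=$1    # the last word is the HDFS path

echo "main command       : $main_command"
echo "subcommand         : $sub_command"
echo "file system command: $fs_command"
echo "control arguments  : $control_args"
echo "path               : $target_path"

# The same request written with the other entry point.
echo "equivalent form    : hadoop fs $fs_command $control_args $target_path"
```

The last line shows that only the first two words change when switching between hdfs dfs and hadoop fs; the file system command, its control arguments, and the path stay the same.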
%%sh
hadoop
%%sh
hadoop fs -usage
%%sh
hdfs
%%sh
hdfs dfs -usage
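The usage output above covers every file system command at once. As a sketch, and assuming the session runs on a node where the Hadoop client is installed, -usage and -help also accept a command name, which narrows the output to that command. The helper function name is made up for this example, and the guard only keeps the cell from failing where hdfs is unavailable.

```shell
# Hypothetical helper: print the short usage line and the full help text
# for one file system command, for example ls.
show_fs_help() {
  fs_cmd=$1
  if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -usage "$fs_cmd"   # one-line syntax summary
    hdfs dfs -help "$fs_cmd"    # description of each control argument
  else
    echo "hdfs not found on PATH; run this on a gateway node"
  fi
}

show_fs_help ls
```

In the notebook this can go in its own %%sh cell, like the cells above. Replacing ls with another command name, such as mkdir or put, shows the control arguments that command accepts.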