Using HDFS CLI
Let us understand how to use HDFS CLI to interact with HDFS.
Typically, the cluster contains 3 types of nodes:
Gateway nodes or client nodes or edge nodes
Master nodes
Worker nodes
Developers like us will typically have access to Gateway nodes or Client nodes.
We can connect to Gateway nodes or Client nodes using SSH.
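For example, connecting to a gateway node from a terminal might look like the following. The hostname and username here are placeholders; use the ones provided for your cluster.

# Hypothetical example: connect to a gateway or client node over SSH
ssh username@gateway01.example.com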
Once logged in, we can interact with HDFS using either hadoop fs or hdfs dfs. Both are aliases of each other.
hadoop has other subcommands besides fs and is typically used by developers to interact with HDFS or Map Reduce.
hdfs has other subcommands besides dfs. It is used not only to manage files in HDFS but also for administrative tasks related to HDFS components such as the Namenode, Secondary Namenode, Datanodes, etc.
As developers, our scope will be limited to using hdfs dfs or hadoop fs to interact with HDFS.
Both have sub commands, and each sub command takes additional control arguments. Let us understand the structure by taking the example of hdfs dfs -ls -h -S -r /public.
hdfs is the main command to manage all the components of HDFS.
dfs is the sub command to manage files in HDFS.
-ls is the file system command to list files in HDFS.
-h -S -r are control arguments for -ls to control the run time behavior of the command.
/public is the argument for the -ls command. It is a path in HDFS. You will understand as you get into the details.
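For instance, the example above can be run as shown below to list the contents of /public sorted by size in reverse order with human readable file sizes, assuming /public exists in your HDFS environment.

%%sh
# List files under /public, sorted by size in reverse order, with human readable sizes
hdfs dfs -ls -h -S -r /public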
%%sh
hadoop
%%sh
hadoop fs -usage
%%sh
hdfs
%%sh
hdfs dfs -usage
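We can also pass a specific file system command to -usage, or use -help to get more detailed documentation for it. Here is a sketch using ls as the example command.

%%sh
# Show the usage (syntax) of the ls command
hdfs dfs -usage ls

%%sh
# Show detailed help for the ls command, including its control arguments
hdfs dfs -help ls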