Listing HDFS Files

Now let us walk through the different options available with the hdfs dfs -ls command to list files.

  • We can get usage by running hdfs dfs -usage ls.

%%sh

hdfs dfs -usage ls
  • We can get detailed help by running hdfs dfs -help ls.

%%sh

hdfs dfs -help ls
  • Let us list all the files in the /public/nyse_all/nyse_data folder. It is one of the public data sets available under /public. By default, files and folders are sorted by name in ascending order. We can reverse the order using -r.

%%sh

hdfs dfs -ls /public/nyse_all/nyse_data
%%sh

hdfs dfs -ls -r /public/nyse_all/nyse_data
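Each entry printed by hdfs dfs -ls has eight whitespace-separated fields (permissions, replication factor, owner, group, size in bytes, modification date, modification time and path), preceded by a "Found N items" header line. The following is a minimal sketch of how those fields could be picked apart with awk. The function name ls_size_and_path is hypothetical, and the sketch assumes that paths contain no whitespace and that -h is not used.

```shell
# Hypothetical helper: reads `hdfs dfs -ls` output on stdin and prints
# "<size-in-bytes> <path>" for each entry.
# Assumes paths contain no whitespace and that -h is not used.
ls_size_and_path() {
  # Entry rows have 8 fields; the "Found N items" header has 3 and is skipped.
  awk 'NF == 8 { print $5, $8 }'
}

# Usage (requires access to an HDFS cluster):
# hdfs dfs -ls /public/nyse_all/nyse_data | ls_size_and_path
```

Reading from standard input keeps the helper independent of any particular folder, so it can be combined with any of the listing commands shown here.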
  • We can sort the files and directories by modification time using the -t option. By default, the most recently modified files are listed first. We can reverse the order by using -t -r.

%%sh

hdfs dfs -ls -t /public/nyse_all/nyse_data
%%sh

hdfs dfs -ls -t -r /public/nyse_all/nyse_data
  • We can sort the files and directories by size using -S. By default, the files will be sorted in descending order by size. We can reverse the sorting order using -S -r.

%%sh

hdfs dfs -ls -S /public/nyse_all/nyse_data
%%sh

hdfs dfs -ls -S -r /public/nyse_all/nyse_data
  • We can use -h to display file sizes in a human-readable format rather than in bytes. It can be combined with the sorting options -t, -S and -r.

%%sh

hdfs dfs -ls -h /public/nyse_all/nyse_data
%%sh

hdfs dfs -ls -h -t /public/nyse_all/nyse_data
%%sh

hdfs dfs -ls -h -S /public/nyse_all/nyse_data
%%sh

hdfs dfs -ls -h -S -r /public/nyse_all/nyse_data
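The sorting options above combine naturally with standard shell tools when only the first few entries are of interest, for example the largest or the most recently modified files. The following is a minimal sketch under that assumption. The function name top_entries is hypothetical; it reads the listing from standard input and drops the "Found N items" header.

```shell
# Hypothetical helper: reads `hdfs dfs -ls` output on stdin and prints only
# the first N entry rows (default 3), dropping the "Found N items" header.
top_entries() {
  n="${1:-3}"
  grep -v '^Found [0-9]* items$' | head -n "$n"
}

# Usage (requires access to an HDFS cluster):
# hdfs dfs -ls -h -S /public/nyse_all/nyse_data | top_entries 5   # five largest
# hdfs dfs -ls -t /public/nyse_all/nyse_data | top_entries 1      # most recently modified
```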