Previewing data in HDFS Files

Let us see how we can preview the data in HDFS.

  • If we are dealing with files that contain text data (files in text file format), we can preview their contents using commands such as -tail and -cat.

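  • -tail can be used to preview the last 1 KB of the file. Because the cut-off is by size rather than by line, the first line shown may be a partial record.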

  • -cat can be used to print the whole contents of the file on the screen. Be careful with -cat, as it takes a while even for medium-sized files.

  • If you want only the first few lines of a file, you can pipe the output of hadoop fs -cat or hdfs dfs -cat to the Linux more command.
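One way to act on the caution about -cat is to check the file size before printing it. The following is only a sketch: hdfs_cat_if_small is a hypothetical helper name (not part of the HDFS CLI), the 1 MiB default limit is an arbitrary choice, and it assumes that the first column of hdfs dfs -du -s output is the size in bytes.

```shell
# Hypothetical helper (not an HDFS command): print a file with -cat only
# when its total size is at or below a byte limit (default 1 MiB, an assumption).
hdfs_cat_if_small() {
  hcs_path="$1"
  hcs_limit="${2:-1048576}"
  # Sum the first column of `hdfs dfs -du -s`, assumed to be the size in bytes.
  hcs_size="$(hdfs dfs -du -s "$hcs_path" | awk '{ total += $1 } END { printf "%.0f\n", total }')"
  if [ "$hcs_size" -le "$hcs_limit" ]; then
    hdfs dfs -cat "$hcs_path"
  else
    echo "skipping -cat: $hcs_path is $hcs_size bytes (limit $hcs_limit)" >&2
    return 1
  fi
}
```

With the sizes in the recursive listing below, hdfs_cat_if_small /user/${USER}/retail_db/departments/part-00000 would print the file (60 bytes), while the same call on order_items/part-00000 (5408880 bytes) would be skipped under the default limit.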

%%sh

hdfs dfs -ls /user/${USER}/retail_db
Found 6 items
drwxr-xr-x   - itversity students          0 2021-01-17 20:05 /user/itversity/retail_db/categories
drwxr-xr-x   - itversity students          0 2021-01-17 20:05 /user/itversity/retail_db/customers
drwxr-xr-x   - itversity students          0 2021-01-17 20:04 /user/itversity/retail_db/departments
drwxr-xr-x   - itversity students          0 2021-01-17 20:03 /user/itversity/retail_db/order_items
drwxr-xr-x   - itversity students          0 2021-01-17 20:03 /user/itversity/retail_db/orders
drwxr-xr-x   - itversity students          0 2021-01-17 20:04 /user/itversity/retail_db/products
%%sh

hdfs dfs -ls -R /user/${USER}/retail_db
drwxr-xr-x   - itversity students          0 2021-01-17 20:05 /user/itversity/retail_db/categories
-rw-r--r--   2 itversity students       1029 2021-01-17 20:05 /user/itversity/retail_db/categories/part-00000
drwxr-xr-x   - itversity students          0 2021-01-17 20:05 /user/itversity/retail_db/customers
-rw-r--r--   2 itversity students     953719 2021-01-17 20:05 /user/itversity/retail_db/customers/part-00000
drwxr-xr-x   - itversity students          0 2021-01-17 20:04 /user/itversity/retail_db/departments
-rw-r--r--   2 itversity students         60 2021-01-17 20:04 /user/itversity/retail_db/departments/part-00000
drwxr-xr-x   - itversity students          0 2021-01-17 20:03 /user/itversity/retail_db/order_items
-rw-r--r--   2 itversity students    5408880 2021-01-17 20:03 /user/itversity/retail_db/order_items/part-00000
drwxr-xr-x   - itversity students          0 2021-01-17 20:03 /user/itversity/retail_db/orders
-rw-r--r--   2 itversity students    2999944 2021-01-17 20:03 /user/itversity/retail_db/orders/part-00000
drwxr-xr-x   - itversity students          0 2021-01-17 20:04 /user/itversity/retail_db/products
-rw-r--r--   2 itversity students     174155 2021-01-17 20:04 /user/itversity/retail_db/products/part-00000
%%sh

hdfs dfs -put -f /data/retail_db /user/${USER}/
%%sh

hdfs dfs -help tail
-tail [-f] <file> :
  Show the last 1KB of the file.
                                             
  -f  Shows appended data as the file grows. 
%%sh

hdfs dfs -tail /user/${USER}/retail_db/orders/part-00000
014-06-12 00:00:00.0,4229,PENDING
68861,2014-06-13 00:00:00.0,3031,PENDING_PAYMENT
68862,2014-06-15 00:00:00.0,7326,PROCESSING
68863,2014-06-16 00:00:00.0,3361,CLOSED
68864,2014-06-18 00:00:00.0,9634,ON_HOLD
68865,2014-06-19 00:00:00.0,4567,SUSPECTED_FRAUD
68866,2014-06-20 00:00:00.0,3890,PENDING_PAYMENT
68867,2014-06-23 00:00:00.0,869,CANCELED
68868,2014-06-24 00:00:00.0,10184,PENDING
68869,2014-06-25 00:00:00.0,7456,PROCESSING
68870,2014-06-26 00:00:00.0,3343,COMPLETE
68871,2014-06-28 00:00:00.0,4960,PENDING
68872,2014-06-29 00:00:00.0,3354,COMPLETE
68873,2014-06-30 00:00:00.0,4545,PENDING
68874,2014-07-03 00:00:00.0,1601,COMPLETE
68875,2014-07-04 00:00:00.0,10637,ON_HOLD
68876,2014-07-06 00:00:00.0,4124,COMPLETE
68877,2014-07-07 00:00:00.0,9692,ON_HOLD
68878,2014-07-08 00:00:00.0,6753,COMPLETE
68879,2014-07-09 00:00:00.0,778,COMPLETE
68880,2014-07-13 00:00:00.0,1117,COMPLETE
68881,2014-07-19 00:00:00.0,2518,PENDING_PAYMENT
68882,2014-07-22 00:00:00.0,10000,ON_HOLD
68883,2014-07-23 00:00:00.0,5533,COMPLETE
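The -tail output above starts with a truncated record (014-06-12 00:00:00.0,4229,PENDING), because the last 1 KB begins in the middle of a line. A minimal sketch for discarding that leading fragment is shown here. hdfs_tail_lines is a hypothetical helper name, not an HDFS command. Note that it always drops the first line, so it would also drop a complete record if the file is smaller than 1 KB or the cut happens to fall on a line boundary.

```shell
# Hypothetical helper (not an HDFS command): show the last 1 KB of a file
# without its first line, which is usually a partial record.
hdfs_tail_lines() {
  # tail -n +2 starts printing at the second line of its input.
  hdfs dfs -tail "$1" | tail -n +2
}
```

For example, hdfs_tail_lines /user/${USER}/retail_db/orders/part-00000 would print the same records as above, starting from the first complete line.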
%%sh

hdfs dfs -help cat
-cat [-ignoreCrc] <src> ... :
  Fetch all files that match the file pattern <src> and display their content on
  stdout.
%%sh

hdfs dfs -cat /user/${USER}/retail_db/departments/part-*
2,Fitness
3,Footwear
4,Apparel
5,Golf
6,Outdoors
7,Fan Shop
  • You can run the following command from a terminal or CLI to see the first few lines of a file.

hdfs dfs -cat /user/${USER}/retail_db/orders/part-00000 | more
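The more command is interactive, so it is less convenient inside a notebook cell or a script. As an alternative sketch, the output of -cat can be piped to the Linux head command to get a fixed number of lines. hdfs_head is a hypothetical helper name and the default of 10 lines is an assumption made for illustration.

```shell
# Hypothetical helper (not an HDFS command): print the first N lines
# of an HDFS text file, where N defaults to 10.
hdfs_head() {
  # head stops reading after N lines, so the rest of the file is not printed.
  hdfs dfs -cat "$1" | head -n "${2:-10}"
}
```

For example, hdfs_head /user/${USER}/retail_db/orders/part-00000 5 would print the first five records of the orders file.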