Previewing data in HDFS Files
Let us see how we can preview the data in HDFS.
If we are dealing with files that contain text data (files in text file format), we can preview their contents using commands such as -tail and -cat.
-tail can be used to preview the last 1 KB of a file.
-cat prints the whole contents of a file on the screen. Be careful while using -cat, as it can take a while even for medium-sized files.
If you want to see only the first few lines of a file, you can pipe the output of hadoop fs -cat or hdfs dfs -cat to the Linux more command.
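The same idea also works non-interactively by piping into head, which stops after a fixed number of lines. This is a minimal sketch, assuming a POSIX-compatible shell and the hdfs client on the PATH. The function name hdfs_head is hypothetical and is not part of Hadoop.

```shell
# hdfs_head: hypothetical helper that prints the first N lines (default 10)
# of a text file stored in HDFS. hdfs dfs -cat streams the file, and head
# stops reading once it has printed N lines.
hdfs_head() {
  path="$1"
  lines="${2:-10}"
  hdfs dfs -cat "$path" | head -n "$lines"
}

# Usage (requires a working HDFS client):
# hdfs_head "/user/${USER}/retail_db/orders/part-00000" 5
```

Because head closes the pipe early, hdfs dfs -cat may print a warning on stderr saying it was unable to write to the output stream. That is expected in this situation. Newer Hadoop 3.x releases may also offer hdfs dfs -head, which shows the first 1 KB of a file. Check hdfs dfs -help head on your cluster before relying on it.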
%%sh
hdfs dfs -ls /user/${USER}/retail_db
Found 6 items
drwxr-xr-x - itversity students 0 2021-01-17 20:05 /user/itversity/retail_db/categories
drwxr-xr-x - itversity students 0 2021-01-17 20:05 /user/itversity/retail_db/customers
drwxr-xr-x - itversity students 0 2021-01-17 20:04 /user/itversity/retail_db/departments
drwxr-xr-x - itversity students 0 2021-01-17 20:03 /user/itversity/retail_db/order_items
drwxr-xr-x - itversity students 0 2021-01-17 20:03 /user/itversity/retail_db/orders
drwxr-xr-x - itversity students 0 2021-01-17 20:04 /user/itversity/retail_db/products
%%sh
hdfs dfs -ls -R /user/${USER}/retail_db
drwxr-xr-x - itversity students 0 2021-01-17 20:05 /user/itversity/retail_db/categories
-rw-r--r-- 2 itversity students 1029 2021-01-17 20:05 /user/itversity/retail_db/categories/part-00000
drwxr-xr-x - itversity students 0 2021-01-17 20:05 /user/itversity/retail_db/customers
-rw-r--r-- 2 itversity students 953719 2021-01-17 20:05 /user/itversity/retail_db/customers/part-00000
drwxr-xr-x - itversity students 0 2021-01-17 20:04 /user/itversity/retail_db/departments
-rw-r--r-- 2 itversity students 60 2021-01-17 20:04 /user/itversity/retail_db/departments/part-00000
drwxr-xr-x - itversity students 0 2021-01-17 20:03 /user/itversity/retail_db/order_items
-rw-r--r-- 2 itversity students 5408880 2021-01-17 20:03 /user/itversity/retail_db/order_items/part-00000
drwxr-xr-x - itversity students 0 2021-01-17 20:03 /user/itversity/retail_db/orders
-rw-r--r-- 2 itversity students 2999944 2021-01-17 20:03 /user/itversity/retail_db/orders/part-00000
drwxr-xr-x - itversity students 0 2021-01-17 20:04 /user/itversity/retail_db/products
-rw-r--r-- 2 itversity students 174155 2021-01-17 20:04 /user/itversity/retail_db/products/part-00000
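Because -cat streams the entire file, it can help to check the size before printing anything. The listing above shows part files ranging from 60 bytes to about 5 MB. The sketch below relies on hdfs dfs -stat %b, which reports the file size in bytes. The helper name hdfs_preview and the 1 MB default threshold are arbitrary choices made for illustration.

```shell
# hdfs_preview: hypothetical helper. Prints the whole file when it is no
# larger than a threshold, otherwise only the first few lines.
hdfs_preview() {
  path="$1"
  max_bytes="${2:-1048576}"   # assumed threshold: 1 MB
  lines="${3:-10}"
  # %b asks -stat for the file size in bytes; give up if the path is missing
  size=$(hdfs dfs -stat %b "$path") || return 1
  if [ "$size" -le "$max_bytes" ]; then
    hdfs dfs -cat "$path"
  else
    hdfs dfs -cat "$path" | head -n "$lines"
  fi
}

# Usage:
# hdfs_preview "/user/${USER}/retail_db/departments/part-00000"   # small: whole file
# hdfs_preview "/user/${USER}/retail_db/order_items/part-00000"   # large: first 10 lines
```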
%%sh
hdfs dfs -put -f /data/retail_db /user/${USER}/
%%sh
hdfs dfs -help tail
-tail [-f] <file> :
Show the last 1KB of the file.
-f Shows appended data as the file grows.
%%sh
hdfs dfs -tail /user/${USER}/retail_db/orders/part-00000
014-06-12 00:00:00.0,4229,PENDING
68861,2014-06-13 00:00:00.0,3031,PENDING_PAYMENT
68862,2014-06-15 00:00:00.0,7326,PROCESSING
68863,2014-06-16 00:00:00.0,3361,CLOSED
68864,2014-06-18 00:00:00.0,9634,ON_HOLD
68865,2014-06-19 00:00:00.0,4567,SUSPECTED_FRAUD
68866,2014-06-20 00:00:00.0,3890,PENDING_PAYMENT
68867,2014-06-23 00:00:00.0,869,CANCELED
68868,2014-06-24 00:00:00.0,10184,PENDING
68869,2014-06-25 00:00:00.0,7456,PROCESSING
68870,2014-06-26 00:00:00.0,3343,COMPLETE
68871,2014-06-28 00:00:00.0,4960,PENDING
68872,2014-06-29 00:00:00.0,3354,COMPLETE
68873,2014-06-30 00:00:00.0,4545,PENDING
68874,2014-07-03 00:00:00.0,1601,COMPLETE
68875,2014-07-04 00:00:00.0,10637,ON_HOLD
68876,2014-07-06 00:00:00.0,4124,COMPLETE
68877,2014-07-07 00:00:00.0,9692,ON_HOLD
68878,2014-07-08 00:00:00.0,6753,COMPLETE
68879,2014-07-09 00:00:00.0,778,COMPLETE
68880,2014-07-13 00:00:00.0,1117,COMPLETE
68881,2014-07-19 00:00:00.0,2518,PENDING_PAYMENT
68882,2014-07-22 00:00:00.0,10000,ON_HOLD
68883,2014-07-23 00:00:00.0,5533,COMPLETE
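-tail returns the last 1 KB of bytes, not a number of lines. As a result, the first line of its output is usually a partial record, such as 014-06-12 00:00:00.0,4229,PENDING above. A minimal sketch for keeping only complete records is to drop that first line with tail -n +2. In the rare case where the 1 KB boundary falls exactly at the start of a line, this also discards one complete record. The helper name hdfs_tail_records is hypothetical.

```shell
# hdfs_tail_records: hypothetical helper. Takes the last 1 KB of an HDFS file
# and removes the first line, which is usually a truncated record.
hdfs_tail_records() {
  hdfs dfs -tail "$1" | tail -n +2
}

# Usage:
# hdfs_tail_records "/user/${USER}/retail_db/orders/part-00000"
```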
%%sh
hdfs dfs -help cat
-cat [-ignoreCrc] <src> ... :
Fetch all files that match the file pattern <src> and display their content on
stdout.
%%sh
hdfs dfs -cat /user/${USER}/retail_db/departments/part-*
2,Fitness
3,Footwear
4,Apparel
5,Golf
6,Outdoors
7,Fan Shop
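-cat accepts a file pattern such as part-*, and the HDFS shell expands it against HDFS paths. Quoting the pattern keeps the local shell from expanding the wildcard first if a matching local path happens to exist. The sketch below pipes the matched files into wc -l to count records across all part files, assuming one record per line. The helper name hdfs_count_lines is hypothetical.

```shell
# hdfs_count_lines: hypothetical helper that counts lines across every HDFS
# file matching a pattern. Pass the pattern quoted so that HDFS, not the
# local shell, expands the wildcard.
hdfs_count_lines() {
  hdfs dfs -cat "$1" | wc -l | tr -d ' '
}

# Usage:
# hdfs_count_lines "/user/${USER}/retail_db/departments/part-*"
```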
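You can run the following command in a terminal to see the first few lines of a file. Because more is interactive, it works better in a terminal than in a notebook cell. Press q to quit once you have seen enough.
hdfs dfs -cat /user/${USER}/retail_db/orders/part-00000 | more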