Copying files from HDFS to Local

We can copy files from HDFS to the local file system by using either the copyToLocal or the get command.

  • hdfs dfs -copyToLocal and hdfs dfs -get both copy files or directories from HDFS to the local file system.

  • The command reads the blocks of each file in sequence, in block index order, and reconstructs the file in the local file system.

  • If the target file or directory already exists in the local file system, get fails with an error saying that it already exists.
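
The last point can be guarded against in a script. The following is a minimal sketch, assuming a hypothetical helper named safe_get (it is not part of the hdfs CLI) that checks the local target before calling hdfs dfs -get:

```shell
# Hypothetical helper (not part of the hdfs CLI): copy an HDFS path into a
# local directory only if nothing with the same name is already there.
safe_get() {
  src="$1"       # HDFS source path, e.g. /user/${USER}/retail_db
  dst_dir="$2"   # existing local directory, e.g. /home/${USER}
  target="${dst_dir}/$(basename "${src}")"

  if [ -e "${target}" ]; then
    echo "skipping: ${target} already exists" >&2
    return 1
  fi
  hdfs dfs -get "${src}" "${dst_dir}"
}

# Usage (assumes the retail_db folder used in this section):
# safe_get /user/${USER}/retail_db /home/${USER}
```

The check only looks at the top-level name, so it suits the case where a whole file or folder is copied into an existing local directory.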

%%sh

hdfs dfs -help get
%%sh

hdfs dfs -help copyToLocal

Warning

The following commands copy the entire contents of /user/${USER}/retail_db to the local home directory, after which you will see /home/${USER}/retail_db.

%%sh

hdfs dfs -ls /user/${USER}/retail_db
%%sh

ls -ltr /home/${USER}/
%%sh

mkdir /home/${USER}/retail_db
%%sh

hdfs dfs -get /user/${USER}/retail_db/* /home/${USER}/retail_db
%%sh

ls -ltr /home/${USER}/retail_db

Note

The following command fails because the retail_db folder already exists in /home/${USER}.

%%sh

hdfs dfs -get /user/${USER}/retail_db /home/${USER}
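
If replacing the local copy is actually intended, get accepts a -f option that overwrites a destination that already exists. A minimal sketch, assuming a hypothetical wrapper named get_overwrite (the name is illustrative only):

```shell
# Hypothetical wrapper around the -f (overwrite) option of hdfs dfs -get.
get_overwrite() {
  src="$1"       # HDFS source path
  dst_dir="$2"   # local destination directory
  # -f overwrites local files that already exist instead of failing.
  hdfs dfs -get -f "${src}" "${dst_dir}"
}

# Usage (assumes the retail_db folder used in this section):
# get_overwrite /user/${USER}/retail_db /home/${USER}
```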

Note

An alternative approach is to remove the local folder first and then copy the folder along with its contents in a single command.

%%sh

rm -rf /home/${USER}/retail_db
%%sh

ls -ltr /home/${USER}
%%sh

hdfs dfs -get /user/${USER}/retail_db /home/${USER}
%%sh

ls -ltr /home/${USER}/retail_db/*

  • We can also use patterns with the get command to copy matching files from HDFS to the local file system. In addition, we can pass multiple HDFS files or folders to a single get command.
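
When a pattern is passed unquoted, the local shell gets the first chance to expand it against the local file system; quoting the pattern makes sure it reaches HDFS unchanged. A minimal sketch, assuming a hypothetical helper named get_many that takes the local directory first and any number of HDFS paths or patterns after it:

```shell
# Hypothetical helper (not part of the hdfs CLI): copy several HDFS paths
# or patterns into one local directory.
get_many() {
  dst_dir="$1"   # local destination directory
  shift          # the remaining arguments are HDFS paths or patterns
  # With several sources the destination has to be a directory.
  mkdir -p "${dst_dir}"
  hdfs dfs -get "$@" "${dst_dir}"
}

# Usage (quote patterns so the local shell does not expand them):
# get_many /home/${USER}/retail_db "/user/${USER}/retail_db/order*"
# get_many /home/${USER}/retail_db /user/${USER}/retail_db/departments /user/${USER}/retail_db/products
```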

%%sh

rm -rf /home/${USER}/retail_db
%%sh

ls -ltr /home/${USER}
%%sh

mkdir /home/${USER}/retail_db
%%sh

hdfs dfs -get /user/${USER}/retail_db/order* /home/${USER}/retail_db
%%sh

ls -ltr /home/${USER}/retail_db
%%sh

hdfs dfs -get /user/${USER}/retail_db/departments /user/${USER}/retail_db/products /home/${USER}/retail_db
%%sh

ls -ltr /home/${USER}/retail_db
%%sh

hdfs dfs -get /user/${USER}/retail_db/categories /user/${USER}/retail_db/customers /home/${USER}/retail_db
%%sh

ls -ltr /home/${USER}/retail_db
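
Beyond listing the local folder, the copy can be sanity-checked by comparing file counts on both sides. The following is only a sketch under assumptions: verify_file_count is a hypothetical helper, it relies on hdfs dfs -count printing the file count in its second column, and it compares counts only, not contents.

```shell
# Hypothetical helper: compare the number of files under an HDFS directory
# with the number of files under a local directory.
verify_file_count() {
  hdfs_dir="$1"
  local_dir="$2"
  # hdfs dfs -count prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
  hdfs_files=$(hdfs dfs -count "${hdfs_dir}" | awk '{print $2}')
  # Ignore hidden .crc checksum files if any are present locally.
  local_files=$(find "${local_dir}" -type f ! -name '.*.crc' | wc -l | tr -d ' ')

  if [ "${hdfs_files}" = "${local_files}" ]; then
    echo "OK: ${local_files} files"
  else
    echo "MISMATCH: HDFS=${hdfs_files} local=${local_files}" >&2
    return 1
  fi
}

# Usage (assumes the retail_db folder used in this section):
# verify_file_count /user/${USER}/retail_db /home/${USER}/retail_db
```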