Copying files from local to HDFS
We can copy files from the local file system to HDFS by using either the copyFromLocal or the put command.
Both hdfs dfs -copyFromLocal and hdfs dfs -put copy files or directories from the local file system into HDFS. We can also use hadoop fs in place of hdfs dfs.
However, we cannot update or fix data in place once the files are in HDFS. If we have to fix any data, we have to copy the file to the local file system, fix the data, and then copy it back to HDFS.
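The round trip described above (copy out, fix locally, copy back) can be sketched as a small shell function. This is only an illustration: fix_hdfs_file is a hypothetical helper name, not something provided by Hadoop, and it assumes the file is small enough to edit on the local file system with sed.

```shell
# Hypothetical helper (not part of Hadoop): fix a file stored in HDFS
# by round-tripping it through the local file system.
# Usage: fix_hdfs_file <hdfs_path> <sed_expression>
fix_hdfs_file() {
    local hdfs_path="$1"
    local sed_expr="$2"
    local tmp_dir
    tmp_dir="$(mktemp -d)" || return 1
    local local_copy="${tmp_dir}/$(basename "${hdfs_path}")"

    # 1. Copy the file from HDFS to the local file system
    hdfs dfs -get "${hdfs_path}" "${local_copy}" || return 1
    # 2. Fix the data locally (sed keeps a .bak backup next to the file)
    sed -i.bak "${sed_expr}" "${local_copy}" || return 1
    # 3. Copy the fixed file back, overwriting the copy in HDFS
    hdfs dfs -put -f "${local_copy}" "${hdfs_path}" || return 1

    # Clean up the temporary local copy
    rm -rf "${tmp_dir}"
}
```

Once a file exists in HDFS, a call would look like fix_hdfs_file /user/${USER}/some_file.csv 's/old/new/', where the path and the sed expression are placeholders.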
Files will be divided into blocks and stored on DataNodes in a distributed fashion, based on the block size and replication factor. We will get into the details later.
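As a rough illustration of how block size and replication factor determine what gets stored, the arithmetic below counts the blocks for one file. The numbers are assumptions: 128 MB and 3 are common default values for block size and replication factor, but a cluster may be configured differently, and the 300 MB file is hypothetical.

```shell
# Assumed values: 128 MB block size and replication factor 3 are common
# defaults; check your cluster configuration for the actual values.
block_size=$((128 * 1024 * 1024))
replication=3
file_size=$((300 * 1024 * 1024))   # hypothetical 300 MB file

# Ceiling division: number of blocks needed to hold the file
blocks=$(( (file_size + block_size - 1) / block_size ))
# Each block is stored once per replica
block_replicas=$(( blocks * replication ))

echo "blocks=${blocks} block_replicas=${block_replicas}"
```

Under these assumptions the file needs 3 blocks (two full blocks and one partial block), and 9 block replicas are stored across the DataNodes.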
%%sh
hdfs dfs -ls /user/${USER}
%%sh
hdfs dfs -mkdir /user/${USER}/retail_db
%%sh
hdfs dfs -ls /user/${USER}
%%sh
hdfs dfs -ls /user/${USER}/retail_db
%%sh
hdfs dfs -help put
%%sh
hdfs dfs -help copyFromLocal
Warning
The put command below copies the entire folder into /user/${USER}/retail_db, so you will see /user/${USER}/retail_db/retail_db. The commands that follow it show how to get the files laid out as expected.
%%sh
ls -ltr /data/retail_db
%%sh
hdfs dfs -put /data/retail_db /user/${USER}/retail_db
%%sh
hdfs dfs -ls /user/${USER}/retail_db
%%sh
hdfs dfs -ls /user/${USER}/retail_db/retail_db
Note
Let us drop the nested folder and copy the files again so that they land directly under /user/${USER}/retail_db. As the target folder already exists, we can use a pattern to copy the sub folders.
%%sh
hdfs dfs -help rm
%%sh
hdfs dfs -rm -R -skipTrash /user/${USER}/retail_db/retail_db
%%sh
hdfs dfs -ls /user/${USER}/retail_db/
%%sh
hdfs dfs -put /data/retail_db/order* /user/${USER}/retail_db
%%sh
hdfs dfs -ls /user/${USER}/retail_db/
%%sh
hdfs dfs -put -f /data/retail_db/* /user/${USER}/retail_db
%%sh
hdfs dfs -ls /user/${USER}/retail_db/
%%sh
hdfs dfs -ls -R /user/${USER}/retail_db/
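The pattern-based copy shown above can be wrapped in a small function so that the nested retail_db/retail_db layout is avoided in one step. This is a minimal sketch: put_contents is a hypothetical name, and it assumes the local source directory is not empty.

```shell
# Hypothetical helper: copy the contents of a local directory into an
# HDFS directory, rather than the directory itself.
# Usage: put_contents <local_dir> <hdfs_target_dir>
put_contents() {
    local src_dir="$1"
    local target_dir="$2"

    # -p creates missing parent directories and does not fail
    # when the target directory already exists
    hdfs dfs -mkdir -p "${target_dir}" || return 1
    # The pattern is expanded by the local shell, so every sub folder is
    # passed as a separate source; -f overwrites files that already exist
    hdfs dfs -put -f "${src_dir}"/* "${target_dir}"
}
```

With the paths used in this section, the call would be put_contents /data/retail_db /user/${USER}/retail_db.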
Note
Alternatively, you can use copyFromLocal.
%%sh
hdfs dfs -rm -R -skipTrash /user/${USER}/retail_db
%%sh
hdfs dfs -mkdir /user/${USER}/retail_db
%%sh
hdfs dfs -ls /user/${USER}/retail_db/
%%sh
hdfs dfs -copyFromLocal /data/retail_db/* /user/${USER}/retail_db
%%sh
hdfs dfs -ls /user/${USER}/retail_db
Note
We can also copy the folder /data/retail_db directly to /user/${USER}/retail_db. Let us first delete /user/${USER}/retail_db using the -skipTrash option.
%%sh
hdfs dfs -rm -R -skipTrash /user/${USER}/retail_db
Note
We can specify the target location as /user/${USER}. It will create the retail_db folder and its contents.
%%sh
hdfs dfs -put /data/retail_db /user/${USER}
%%sh
hdfs dfs -ls /user/${USER}/retail_db
If we try to run hdfs dfs -put /data/retail_db /user/${USER} again, it will fail as the target folder already exists.
%%sh
hdfs dfs -put /data/retail_db /user/${USER}
We can use -f with put or copyFromLocal to overwrite the files in the existing folder.
%%sh
hdfs dfs -put -f /data/retail_db /user/${USER}
%%sh
hdfs dfs -ls /user/${USER}/retail_db
%%sh
hdfs dfs -ls -R /user/${USER}/retail_db
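The two behaviours above (put fails when the target exists, and -f overwrites) can be combined into a guarded copy. The sketch below is only one possible approach: safe_put is a hypothetical helper, and it relies on hdfs dfs -test -d, which exits with 0 when the given path exists and is a directory.

```shell
# Hypothetical helper: copy a local folder into an HDFS parent directory,
# refusing to touch an existing target unless overwriting is requested.
# Usage: safe_put <local_dir> <hdfs_parent_dir> [overwrite]
safe_put() {
    local src_dir="$1"
    local parent_dir="$2"
    local mode="${3:-}"
    local target="${parent_dir}/$(basename "${src_dir}")"

    # hdfs dfs -test -d exits with 0 when the path is an existing directory
    if hdfs dfs -test -d "${target}"; then
        if [ "${mode}" = "overwrite" ]; then
            # -f overwrites files that already exist under the target
            hdfs dfs -put -f "${src_dir}" "${parent_dir}"
        else
            echo "${target} already exists; pass 'overwrite' to replace its files" >&2
            return 1
        fi
    else
        hdfs dfs -put "${src_dir}" "${parent_dir}"
    fi
}
```

For the example in this section, safe_put /data/retail_db /user/${USER} would stop with a message once the folder exists, while safe_put /data/retail_db /user/${USER} overwrite would behave like put -f.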