Writing A File To HDFS - Java Program


HDFS Command - copyFromLocal

Hadoop file system (fs) shell commands are used to perform various file operations. One of them is copyFromLocal, which copies a file from the local file system to HDFS. The syntax and a usage example are shown below:

hadoop fs -copyFromLocal &lt;localsrc&gt; URI
Example:
hadoop fs -copyFromLocal /home/corejavaguru/abc.txt  /user/corejavaguru/abc.txt

In this article we will write our own Java program to copy a file from the local file system to HDFS.

How to write a file to HDFS using Java program

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileWriteToHDFS {

    public static void main(String[] args) throws Exception {

        // Source file in the local file system
        String localSrc = args[0];
        // Destination file in HDFS
        String dst = args[1];

        // Input stream for the local file to be written to HDFS
        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));

        // Get the configuration of the Hadoop system
        Configuration conf = new Configuration();
        System.out.println("Connecting to -- " + conf.get("fs.defaultFS"));

        // Output stream to the destination file in HDFS
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        OutputStream out = fs.create(new Path(dst));

        // Copy the file from local to HDFS, closing both streams when done
        IOUtils.copyBytes(in, out, 4096, true);

        System.out.println(dst + " copied to HDFS");
    }
}

The program takes two parameters. The first parameter is the path of the file in the local file system to be copied; the second is the destination path in HDFS.
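Since the program reads both paths from args, it is worth guarding against missing arguments before opening any streams. A minimal sketch of such a check (the ArgCheck class and its usage message are illustrative assumptions, not part of the original program):

```java
// ArgCheck.java - hypothetical argument-validation helper (not in the original program)
public class ArgCheck {

    // Returns true only when both the local source and the HDFS destination are supplied
    public static boolean validateArgs(String[] args) {
        return args != null && args.length == 2;
    }

    public static void main(String[] args) {
        if (!validateArgs(args)) {
            System.err.println("Usage: FileWriteToHDFS <localSrc> <hdfsDst>");
            System.exit(1);
        }
    }
}
```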

//Source file in the local file system
String localSrc = args[0];
//Destination file in HDFS
String dst = args[1];

We create an InputStream by wrapping a BufferedInputStream around a FileInputStream, built from the first parameter (the location of the file in the local file system). These stream objects are regular java.io classes, not Hadoop classes, because at this point we are still reading a file from the local file system, not from HDFS.

//Input stream for the file in local file system to be written to HDFS
InputStream in = new BufferedInputStream(new FileInputStream(localSrc));

Now we need to create an output stream to the file location in HDFS where we can write the contents of the local file. The first thing we need is some key information about the cluster, such as the NameNode address. These details are already specified in the configuration files during cluster setup.
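For reference, the NameNode address that fs.defaultFS resolves to is typically declared in core-site.xml on the cluster. A minimal fragment might look like this (the host name and port here are placeholders, not values from the original article):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode-host:8020</value>
  </property>
</configuration>
```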

The easiest way to obtain the cluster configuration is to instantiate a Configuration object, which reads the configuration files from the classpath and loads all the information the program needs.

//Get configuration of Hadoop system
Configuration conf = new Configuration();
System.out.println("Connecting to -- "+conf.get("fs.defaultFS"));

//Destination file in HDFS
FileSystem fs = FileSystem.get(URI.create(dst), conf);
OutputStream out = fs.create(new Path(dst));

In the next line we get the FileSystem object using the URI that we passed as the program’s input and the configuration that we just created. When the URI points at HDFS, the file system returned is a DistributedFileSystem object. Once we have the file system object, the next thing we need is the output stream to the destination file to which we will write the contents of the local file.

We then call the create method on the file system object with the location of the file in HDFS, which we passed to the program as the second parameter; create returns the output stream.

//Copy file from local to HDFS
IOUtils.copyBytes(in, out, 4096, true);

Finally, we use the copyBytes method from Hadoop’s IOUtils class, supplying the input and output stream objects. It reads 4096 bytes at a time from the input stream and writes them to the output stream, copying the entire file from the local file system to HDFS. The final argument (true) tells copyBytes to close both streams when the copy completes.
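Under the hood, copyBytes is essentially a buffered read/write loop. The equivalent logic in plain java.io looks like the sketch below (this is an illustration of the technique, not Hadoop's actual implementation; the 4096-byte buffer matches the size used above, and it works with any InputStream/OutputStream pair, so it runs without a cluster):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ManualCopy {

    // Copies all bytes from in to out, 4096 bytes at a time,
    // then closes both streams (mirroring copyBytes with close=true)
    public static long copyBytes(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        try {
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
                total += bytesRead;
            }
        } finally {
            in.close();
            out.close();
        }
        return total;
    }
}
```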