Reading A File From HDFS - Java Program


HDFS Command - copyToLocal

The hadoop copyToLocal command copies a file from HDFS to the local file system. The syntax and a usage example are shown below:

Usage:
hadoop fs -copyToLocal [-ignorecrc] [-crc] URI <localdst>

Example:
hadoop fs -copyToLocal /user/hadoop/corejavaguru/article articledemo
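In the Java API the same operation is available as FileSystem.copyToLocalFile(Path, Path). As a rough local-filesystem analogue of what the copy does (no Hadoop cluster involved; the file names here are made up for illustration):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CopyToLocalDemo {

	// Copy src to dst, overwriting dst if it exists --
	// the local-filesystem equivalent of copyToLocal.
	static void copyToLocal(Path src, Path dst) throws IOException {
		Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
	}

	public static void main(String[] args) throws IOException {
		// Stand-in for a file sitting in HDFS (illustrative names)
		Path src = Files.createTempFile("article", ".txt");
		Files.writeString(src, "hello from hdfs");

		Path dst = Files.createTempFile("articledemo", ".txt");
		copyToLocal(src, dst);

		System.out.println(Files.readString(dst)); // hello from hdfs
	}
}
```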

In this article we will write our own Java program to read a file from HDFS.

How to read a file from HDFS using Java program

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileReadFromHDFS {

	public static void main(String[] args) throws Exception {

		// File to read in HDFS
		String uri = args[0];

		Configuration conf = new Configuration();

		// Get the filesystem - HDFS
		FileSystem fs = FileSystem.get(URI.create(uri), conf);
		FSDataInputStream in = null;

		try {
			// Open the path mentioned in HDFS
			in = fs.open(new Path(uri));
			IOUtils.copyBytes(in, System.out, 4096, false);

			System.out.println("End Of file: HDFS file read complete");

		} finally {
			IOUtils.closeStream(in);
		}
	}
}

This program takes a single argument: the fully qualified HDFS path of the file to read. It prints the file's contents to the screen, simulating the hadoop fs -cat command.

//File to read in HDFS
String uri = args[0];
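A fully qualified HDFS path is a URI such as hdfs://namenode:8020/user/hadoop/file.txt (the host name and port here are illustrative). java.net.URI shows the pieces that FileSystem.get will later extract from it:

```java
import java.net.URI;

public class HdfsUriDemo {
	public static void main(String[] args) {
		// An example fully qualified HDFS path (host and port are made up)
		URI uri = URI.create("hdfs://namenode:8020/user/hadoop/corejavaguru/article");

		System.out.println(uri.getScheme()); // hdfs -> selects the HDFS file system
		System.out.println(uri.getHost());   // namenode
		System.out.println(uri.getPath());   // /user/hadoop/corejavaguru/article
	}
}
```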

The program needs a few key details about the cluster, such as the NameNode address. These details were already specified in the configuration files during cluster setup.

Configuration conf = new Configuration();

The easiest way to get the cluster's configuration is to instantiate a Configuration object, which reads the configuration files (such as core-site.xml and hdfs-site.xml) from the classpath and loads all the information the program needs.
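Under the hood, Configuration parses property entries out of XML files like core-site.xml. A minimal sketch of that parsing with the JDK's DOM API (the XML snippet mirrors a typical core-site.xml; the fs.defaultFS value is illustrative):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SiteXmlDemo {

	// Return the value of the named property from a *-site.xml style document.
	static String lookup(String xml, String name) throws Exception {
		Document doc = DocumentBuilderFactory.newInstance()
				.newDocumentBuilder()
				.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
		NodeList props = doc.getElementsByTagName("property");
		for (int i = 0; i < props.getLength(); i++) {
			Element p = (Element) props.item(i);
			String n = p.getElementsByTagName("name").item(0).getTextContent();
			if (name.equals(n)) {
				return p.getElementsByTagName("value").item(0).getTextContent();
			}
		}
		return null;
	}

	public static void main(String[] args) throws Exception {
		// Shaped like a typical core-site.xml entry (value is made up)
		String coreSite =
			"<configuration>" +
			"<property>" +
			"<name>fs.defaultFS</name>" +
			"<value>hdfs://namenode:8020</value>" +
			"</property>" +
			"</configuration>";
		System.out.println(lookup(coreSite, "fs.defaultFS")); // hdfs://namenode:8020
	}
}
```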

//Get the filesystem - HDFS
FileSystem fs = FileSystem.get(URI.create(uri), conf);
FSDataInputStream in = null;

In the next line we get a FileSystem object from the URI we passed as the program input and the configuration we just created. Because the URI's scheme is hdfs, this returns a DistributedFileSystem instance. Once we have the file system object, the next thing we need is an input stream to the file we want to read.

in = fs.open(new Path(uri));
IOUtils.copyBytes(in, System.out, 4096, false);

We get the input stream by calling the open method on the file system object, passing the HDFS path of the file we want to read. We then use the copyBytes method from Hadoop's IOUtils class to read the entire file from the input stream and print it to the screen.
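IOUtils.copyBytes is essentially a buffered read/write loop, and the false flag tells it not to close the streams (we close the input stream ourselves in the finally block). A plain java.io sketch of the same loop, with no Hadoop dependency:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyBytesDemo {

	// The same loop copyBytes(in, out, buffSize, false) runs: read up to
	// buffSize bytes at a time and write them out until the input is
	// exhausted, leaving both streams open for the caller to close.
	static void copyBytes(InputStream in, OutputStream out, int buffSize)
			throws IOException {
		byte[] buf = new byte[buffSize];
		int n;
		while ((n = in.read(buf)) != -1) {
			out.write(buf, 0, n);
		}
	}

	public static void main(String[] args) throws IOException {
		// Stand-ins for the HDFS input stream and System.out
		InputStream in = new ByteArrayInputStream("file contents from HDFS".getBytes());
		ByteArrayOutputStream out = new ByteArrayOutputStream();

		copyBytes(in, out, 4096);
		System.out.println(out.toString()); // file contents from HDFS
	}
}
```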