Large Hadoop clusters are usually configured across multiple racks. Communication between two DataNodes on the same rack is more efficient than communication between two nodes on different racks. In large Hadoop clusters, to reduce network traffic while reading and writing HDFS files, the NameNode chooses DataNodes that are on the same rack or a nearby rack to serve read/write requests.
The NameNode obtains this rack information by maintaining the rack ID of each DataNode. This concept of choosing closer DataNodes based on rack information is called Rack Awareness in Hadoop. A default Hadoop installation assumes all nodes belong to the same rack.
HDFS block placement will use rack awareness for fault tolerance by placing one block replica on a different rack. This provides data availability in the event of a network switch failure or partition within the cluster.
The placement of replicas is critical to HDFS reliability and performance. Optimizing replica placement distinguishes HDFS from most other distributed file systems. This is a feature that needs lots of tuning and experience. The purpose of a rack-aware replica placement policy is to improve data reliability, availability, and network bandwidth utilization. The current implementation for the replica placement policy is a first effort in this direction. The short-term goals of implementing this policy are to validate it on production systems, learn more about its behavior, and build a foundation to test and research more sophisticated policies.
Large HDFS instances run on a cluster of computers that commonly spread across many racks. Communication between two nodes in different racks has to go through switches. In most cases, network bandwidth between machines in the same rack is greater than network bandwidth between machines in different racks.
The NameNode determines the rack id each DataNode belongs to via the process outlined in Hadoop Rack Awareness. A simple but non-optimal policy is to place replicas on unique racks. This prevents losing data when an entire rack fails and allows use of bandwidth from multiple racks when reading data. This policy evenly distributes replicas in the cluster which makes it easy to balance load on component failure. However, this policy increases the cost of writes because a write needs to transfer blocks to multiple racks.
On a multi-rack cluster, block replication is maintained with a policy that no more than one replica is placed on any one node and no more than two replicas are placed on nodes in the same rack, with the constraint that the number of racks used for block replication is always less than the total number of block replicas.
For the common case, when the replication factor is three, HDFS’s placement policy is to put one replica on one node in the local rack, another on a different node in the local rack, and the last on a different node in a different rack. This policy cuts the inter-rack write traffic which generally improves write performance. The chance of rack failure is far less than that of node failure; this policy does not impact data reliability and availability guarantees. However, it does reduce the aggregate network bandwidth used when reading data since a block is placed in only two unique racks rather than three. With this policy, the replicas of a file do not evenly distribute across the racks. One third of replicas are on one node, two thirds of replicas are on one rack, and the other third are evenly distributed across the remaining racks.
For example, if a client running on a node in rack A writes a block with replication factor three, one replica is placed on the client's own node, a second on another node in rack A, and the third on a node in a different rack, say rack B.
This policy improves write performance without compromising data reliability or read performance.
We know that Hadoop replicates data into multiple file blocks and stores them on different machines. If Rack Awareness is not configured, there is a possibility that Hadoop will place all copies of a block in the same rack, which results in loss of data when that rack fails.
Although rare, since rack failure is far less frequent than node failure, this can be avoided by explicitly configuring Rack Awareness in core-site.xml.
Rack awareness is configured using the property "net.topology.script.file.name" in core-site.xml (in older Hadoop 1.x releases the property is named "topology.script.file.name"). If this property is not configured, /default-rack is returned for every IP address, i.e., all nodes are placed on the same rack.
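The contract for this script is simple: Hadoop invokes it with one or more DataNode host names or IP addresses as arguments, and the script must print a network path (rack ID) for each of them on standard output. A minimal sketch that simply mirrors the default behaviour could look like this (the rack path printed here is only a placeholder):
#!/bin/bash
# Minimal sketch of a topology script: for every host name or IP address
# Hadoop passes as an argument, print one rack path on standard output.
# "/default-rack" is just the placeholder rack used when no real mapping exists.
for node in "$@"; do
  echo -n "/default-rack "
done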
Use the following instructions to configure rack awareness:
Topology scripts are used by Hadoop to determine the rack location of nodes. Create a topology script and data file.
File name: rack-topology.sh
#!/bin/bash
# Adjust/Add the property "net.topology.script.file.name"
# to core-site.xml with the absolute path to this
# file. ENSURE the file is executable (chmod +x).
# Supply appropriate rack prefix
RACK_PREFIX=default
# To test, supply a hostname as script input:
if [ $# -gt 0 ]; then

  CTL_FILE=${CTL_FILE:-"rack_topology.data"}
  HADOOP_CONF=${HADOOP_CONF:-"/etc/hadoop/conf"}

  # If the data file is missing, report the default rack for every host.
  if [ ! -f ${HADOOP_CONF}/${CTL_FILE} ]; then
    echo -n "/$RACK_PREFIX/rack "
    exit 0
  fi

  # Look up each host passed on the command line in the data file.
  while [ $# -gt 0 ] ; do
    nodeArg=$1
    exec< ${HADOOP_CONF}/${CTL_FILE}
    result=""
    while read line ; do
      ar=( $line )
      if [ "${ar[0]}" = "$nodeArg" ] ; then
        result="${ar[1]}"
      fi
    done
    shift
    if [ -z "$result" ] ; then
      echo -n "/$RACK_PREFIX/rack "
    else
      echo -n "/$RACK_PREFIX/rack_$result "
    fi
  done

else
  echo -n "/$RACK_PREFIX/rack "
fi
Sample Topology Data File
File name: rack_topology.data
# This file should be:
# - Placed in the /etc/hadoop/conf directory
# - On the NameNode (and backups, i.e., HA/failover nodes)
# - On the JobTracker or ResourceManager (and any failover JTs/RMs)
# Add hostnames or IP addresses to this file.
# Format: <IP address or hostname> <rack number>
192.168.2.10 01
192.168.2.11 02
192.168.2.12 03
Copy both of these files to the /etc/hadoop/conf directory on all cluster nodes. Then run the rack-topology.sh script to make sure it returns the correct rack information for each host.
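For example, a quick manual check might look like the following; the IP address and rack number come from the sample data file above, and the expected output assumes the default RACK_PREFIX:
chmod +x /etc/hadoop/conf/rack-topology.sh
/etc/hadoop/conf/rack-topology.sh 192.168.2.10
# Expected output: /default/rack_01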
Stop HDFS. Add the following property to core-site.xml:
<property>
<name>net.topology.script.file.name</name>
<value>/etc/hadoop/conf/rack-topology.sh</value>
</property>
Restart HDFS and MapReduce. After the services have started, verify that rack awareness has been activated. You can use the Hadoop fsck and dfsadmin -report commands to verify.
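As a sketch, after the restart the following commands should report rack paths such as /default/rack_01 for each DataNode:
hdfs dfsadmin -report                 # each DataNode entry should include a "Rack:" line
hdfs fsck / -files -blocks -racks     # block locations are printed together with their rack paths
hdfs dfsadmin -printTopology          # prints the rack-to-DataNode mapping held by the NameNode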
Hence, to get maximum performance, it is important to configure Hadoop so that it knows the topology of your network. Network locations such as hosts and racks are represented in a tree, which reflects the network "distance" between locations. HDFS uses the network location to place block replicas more intelligently, trading off performance and resilience.
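To illustrate the idea of network distance (a common convention in Hadoop's network topology is to count the hops up and down this tree), assume two nodes n1 and n2 on rack r1 and a node n3 on rack r2, all in the same data center d1:
distance(/d1/r1/n1, /d1/r1/n1) = 0   # same node
distance(/d1/r1/n1, /d1/r1/n2) = 2   # different nodes on the same rack
distance(/d1/r1/n1, /d1/r2/n3) = 4   # nodes on different racks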