Thing which seemed very Thingish inside you is quite different when it gets out into the open and has other people looking at it

Tuesday, February 19, 2013

How to configure a Hadoop cluster


Before installing Hadoop, you need to install a couple of software packages and create the hadoop user.

Java - Install Java in a location that all user groups can access.
E.g.: /opt/java/jdk1.6.0_29

rsync - Install rsync using apt-get (sudo apt-get install rsync). It is used to copy the Name Node's Hadoop distribution to all the other nodes.

Create the hadoop user - Create the user “hadoop” with its home directory under /home (e.g. sudo adduser hadoop on Debian/Ubuntu). To log in as the hadoop user, use the command:
su - hadoop

PS: The above steps (installing the software and creating the user) must be performed on all the other Hadoop nodes as well.
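A quick way to confirm these prerequisites on each node is a small check script. The sketch below is only an illustration: the function name check_prereqs is hypothetical, it assumes a Debian/Ubuntu node (apt-get is used above), and it installs nothing. It just reports what is missing and returns a non-zero status if anything is.

```shell
#!/bin/sh
# check_prereqs JAVA_DIR
# Hypothetical helper: reports whether this node has Java, rsync and the
# hadoop user. It installs nothing; it prints hints and returns 1 if
# anything is missing.
check_prereqs() {
  java_dir=$1
  status=0

  if [ -x "$java_dir/bin/java" ]; then
    echo "java:  found in $java_dir"
  else
    echo "java:  missing ($java_dir/bin/java is not executable)"
    status=1
  fi

  if command -v rsync >/dev/null 2>&1; then
    echo "rsync: installed"
  else
    echo "rsync: missing (try: sudo apt-get install rsync)"
    status=1
  fi

  if id hadoop >/dev/null 2>&1; then
    echo "user:  hadoop exists"
  else
    echo "user:  hadoop missing (try: sudo adduser hadoop)"
    status=1
  fi

  return "$status"
}
```

Run it on every node, e.g. check_prereqs /opt/java/jdk1.6.0_29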

Set up public key login from master to slave nodes

Create an SSH key pair for each user (ssh-keygen -t rsa -b 2048) and add the public key (*.pub) to the authorized_keys file on the master and slave nodes.

Key Exchange for Passphraseless SSH

1. We need password-less / passphrase-less SSH to communicate with the other Hadoop nodes in the cluster.

Try to SSH to another node.

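2. Generate a key for the Name Node using the following command, pressing Enter at each prompt to accept the default file and an empty passphrase:

ssh-keygen -t rsa -b 2048
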
This will generate output similar to the following:

Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/
82:07:9e:1e:60:37:28:03:a6:18:b3:b6:f1:e4:2f:ef hadoop@bam01
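The prompts above can also be answered on the command line, which is handy when repeating the step on several nodes. A minimal sketch, assuming OpenSSH's ssh-keygen (its -N option sets the passphrase, here empty, and -f the key file); the function name is hypothetical, and it deliberately refuses to overwrite an existing key, since that would break any logins already set up with it.

```shell
#!/bin/sh
# make_passphraseless_key [KEY_PATH]
# Hypothetical helper: creates an RSA key pair with an empty passphrase,
# equivalent to pressing Enter at every ssh-keygen prompt shown above.
make_passphraseless_key() {
  key=${1:-$HOME/.ssh/id_rsa}
  key_dir=$(dirname "$key")

  mkdir -p "$key_dir"
  chmod 700 "$key_dir"   # keep the directory private to this user

  if [ -e "$key" ]; then
    echo "key already exists, leaving it untouched: $key"
    return 0
  fi

  # -q: no banner, -t/-b: 2048-bit RSA, -N '': empty passphrase, -f: output path
  ssh-keygen -q -t rsa -b 2048 -N '' -f "$key"
}
```

Run as the hadoop user on the Name Node: make_passphraseless_key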

3. This creates a .ssh directory in the ‘hadoop’ user's home directory. Navigate to the .ssh directory; it contains a file with the generated public key. Inspect the public key stored in the ‘’ file with the following command:


It will display the public key …
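ssh-keygen saves the public key next to the private key with a .pub suffix, so with the default path from step 2 it is ~/.ssh/id_rsa.pub, a single line of the form <type> <base64 key> <comment>. The hypothetical helper below prints that line together with the key's fingerprint, so it can be compared with the one shown in step 2.

```shell
#!/bin/sh
# show_pubkey [PUBKEY_FILE]
# Hypothetical helper: prints the public key line and its fingerprint.
show_pubkey() {
  pub=${1:-$HOME/.ssh/id_rsa.pub}   # ssh-keygen appends ".pub" to the key path

  # One line: key type, base64-encoded key, comment (e.g. hadoop@bam01)
  cat "$pub"

  # -l prints "<bits> <fingerprint> <comment> (<type>)" for the key file
  ssh-keygen -l -f "$pub"
}
```

Newer OpenSSH releases print SHA256 fingerprints by default; ssh-keygen -l -E md5 -f ~/.ssh/id_rsa.pub shows the colon-separated MD5 form used in the output above.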

4. This public key of the Name Node should be appended to the ‘authorized_keys’ file on the other Data Nodes. Execute the following command to copy the file to the other nodes.
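One way to do the copy, assuming scp is available and the Data Node host names are listed one per line in a file, is a loop like the following. The function name and the remote file name namenode_id_rsa.pub are hypothetical, and at this stage scp will still ask for each node's password.

```shell
#!/bin/sh
# copy_pubkey_to_nodes HOSTS_FILE [PUBKEY_FILE]
# Hypothetical helper: copies the Name Node's public key to the home
# directory of the hadoop user on every host listed in HOSTS_FILE.
copy_pubkey_to_nodes() {
  hosts_file=$1
  pub=${2:-$HOME/.ssh/id_rsa.pub}
  status=0

  # "|| [ -n ... ]" also handles a last line without a trailing newline
  while read -r host || [ -n "$host" ]; do
    case $host in ''|'#'*) continue ;; esac   # skip blank lines and comments
    # stdin from /dev/null so scp cannot swallow the rest of HOSTS_FILE
    if ! scp "$pub" "hadoop@$host:namenode_id_rsa.pub" < /dev/null; then
      echo "copy to $host failed" >&2
      status=1
    fi
  done < "$hosts_file"

  return "$status"
}
```

Where the ssh-copy-id script is installed, ssh-copy-id hadoop@HOST copies the local public key and appends it to the remote authorized_keys in one go (covering steps 4 to 6), asking for the password once.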


5. Log in to the second Hadoop node’s ‘hadoop’ user account and try to SSH to another node from it.


This will create the .ssh directory in the hadoop account.

6. Append the copied public key to the ‘authorized_keys’ file in the hadoop account of this Data node.
From the .ssh directory of the hadoop account, execute the following commands:

        cat /root/ >> authorized_keys
        chown hadoop:hadoop authorized_keys
        chmod 600 authorized_keys
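The same step can be written so that it is safe to repeat. A minimal sketch, assuming it runs as the hadoop user on the Data Node with the copied key passed as a file path (the function name is hypothetical): it creates the .ssh directory with mode 700 if needed, appends the key only if that exact line is not already present, and sets mode 600 on authorized_keys. sshd's default StrictModes check rejects key files and directories that other users can write to.

```shell
#!/bin/sh
# append_authorized_key PUBKEY_FILE [SSH_DIR]
# Hypothetical helper: appends a copied public key to authorized_keys
# without creating duplicate entries. Run it as the hadoop user.
append_authorized_key() {
  pub=$1
  ssh_dir=${2:-$HOME/.ssh}
  auth="$ssh_dir/authorized_keys"

  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$auth"

  # Make sure an existing file ends with a newline before appending
  if [ -s "$auth" ] && [ -n "$(tail -c 1 "$auth")" ]; then
    echo >> "$auth"
  fi

  # -x: whole-line match, -F: fixed strings, -f: patterns read from $pub
  if ! grep -qxF -f "$pub" "$auth"; then
    cat "$pub" >> "$auth"   # >> appends; a single > would wipe existing keys
  fi

  chmod 600 "$auth"
}
```

For example: append_authorized_key ~/namenode_id_rsa.pub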

7. You can now SSH to this Data node from the Master node configured earlier. Log in to the Master node and, from the hadoop account, log in to the Data node with the following command:

ssh -i id_rsa
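With keys on several Data Nodes, every login can be checked in one pass. A minimal sketch, assuming the host names are listed one per line in a file (the slaves file described below has this format) and a hypothetical function name. It relies on ssh picking up ~/.ssh/id_rsa, which it tries by default, and on the BatchMode=yes option, which makes ssh fail instead of prompting for a password, so any host that still needs one is reported.

```shell
#!/bin/sh
# check_ssh_hosts HOSTS_FILE
# Hypothetical helper: tries a non-interactive login to every listed host.
check_ssh_hosts() {
  status=0
  while read -r host || [ -n "$host" ]; do
    case $host in ''|'#'*) continue ;; esac   # skip blank lines and comments
    # -n: don't read stdin (it holds HOSTS_FILE); BatchMode: never prompt
    if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "hadoop@$host" true; then
      echo "ok:     $host"
    else
      echo "FAILED: $host"
      status=1
    fi
  done < "$1"
  return "$status"
}
```

For example, from the Master node: check_ssh_hosts "$HADOOP_HOME/conf/slaves"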

Set up Hadoop
  1. Download and extract Hadoop (tar xvfz hadoop-x.x.x.tar.gz -C /mnt/)
  2. Change the ownership of the extracted directory if necessary (chown -R user:user /mnt/hadoop-x.x.x)
  3. [optional] If IPv6 is not used, disable it:
- add 'net.ipv6.conf.all.disable_ipv6 = 1'
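A minimal sketch of that step, assuming the setting goes into /etc/sysctl.conf (the usual place on Debian/Ubuntu) and using a hypothetical function name. It adds the line only once; writing the real file needs root.

```shell
#!/bin/sh
# disable_ipv6 [SYSCTL_FILE]
# Hypothetical helper: adds the IPv6 switch to a sysctl config file once.
# Writing to /etc/sysctl.conf needs root; reload afterwards with: sysctl -p
disable_ipv6() {
  conf=${1:-/etc/sysctl.conf}   # assumption: the usual sysctl config file
  setting='net.ipv6.conf.all.disable_ipv6 = 1'

  # Append only if the exact line is not already there
  if ! grep -qxF "$setting" "$conf" 2>/dev/null; then
    printf '%s\n' "$setting" >> "$conf"
  fi
}
```

Apply it with sudo sysctl -p; afterwards cat /proc/sys/net/ipv6/conf/all/disable_ipv6 should print 1.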

Configure Hadoop

Configuration files are in $HADOOP_HOME/conf/
  • Set JAVA_HOME in $HADOOP_HOME/conf/ (add export JAVA_HOME=/path/to/javahome)
    • e.g. export JAVA_HOME=/opt/java/jdk1.6.0_29
  • Edit the $HADOOP_HOME/conf/core-site.xml as follows:


  • Edit the $HADOOP_HOME/conf/hdfs-site.xml as follows:



  • Edit the $HADOOP_HOME/conf/mapred-site.xml as follows:
  • Edit the $HADOOP_HOME/conf/hadoop-policy.xml:
    by default the value of 'security.job.submission.protocol.acl' is * (all users);
    change it to a user name or a group name.
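For the JAVA_HOME step, a minimal sketch assuming the target file is conf/hadoop-env.sh, the environment script sourced by the Hadoop 1.x start scripts; the function name is hypothetical. It drops any earlier JAVA_HOME export, commented out or not, and appends the new one, so running it twice still leaves a single line.

```shell
#!/bin/sh
# set_java_home ENV_FILE JAVA_DIR
# Hypothetical helper: makes ENV_FILE (assumed: $HADOOP_HOME/conf/hadoop-env.sh)
# export exactly one JAVA_HOME.
set_java_home() {
  env_file=$1
  java_dir=$2
  tmp_file="$env_file.tmp.$$"

  touch "$env_file"
  # Keep every line except earlier JAVA_HOME exports, commented out or not
  grep -v '^[#[:space:]]*export JAVA_HOME=' "$env_file" > "$tmp_file" || true
  printf 'export JAVA_HOME=%s\n' "$java_dir" >> "$tmp_file"
  mv "$tmp_file" "$env_file"
}
```

For example: set_java_home "$HADOOP_HOME/conf/hadoop-env.sh" /opt/java/jdk1.6.0_29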

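As an illustration of what the three site files typically contain in a Hadoop 1.x cluster, the hypothetical helper below writes them using the 1.x property names fs.default.name, hadoop.tmp.dir, dfs.replication, dfs.name.dir, dfs.data.dir and mapred.job.tracker. The ports 9000 and 9001, the default replication of 2 and the /mnt/hadoop-data directories are assumptions to adapt; dfs.replication should not exceed the number of Data Nodes, otherwise blocks stay under-replicated.

```shell
#!/bin/sh
# write_site_configs CONF_DIR MASTER_HOST [REPLICATION]
# Hypothetical helper: writes core-, hdfs- and mapred-site.xml for Hadoop 1.x.
# Ports 9000/9001 and the /mnt/hadoop-data paths are assumptions.
write_site_configs() {
  conf_dir=$1
  master=$2
  replication=${3:-2}
  data_root=/mnt/hadoop-data   # assumption: adjust to your disks

  # core-site.xml: where clients and DataNodes find the NameNode
  cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://$master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$data_root/tmp</value>
  </property>
</configuration>
EOF

  # hdfs-site.xml: replication factor and local storage directories
  cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>$replication</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>$data_root/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>$data_root/data</value>
  </property>
</configuration>
EOF

  # mapred-site.xml: where TaskTrackers and clients find the JobTracker
  cat > "$conf_dir/mapred-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>$master:9001</value>
  </property>
</configuration>
EOF
}
```

For example: write_site_configs "$HADOOP_HOME/conf" bam01 2, with bam01 (the Name Node from the key example) as the master.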

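For the ACL itself, Hadoop's service-level authorization reads the value as a comma-separated list of users, a blank, then a comma-separated list of groups (a leading blank means groups only), and these ACLs are only enforced when hadoop.security.authorization is set to true in core-site.xml. A minimal sketch with a hypothetical function name, assuming each <value> element sits on its own line after the matching <name>, as in the stock file:

```shell
#!/bin/sh
# set_policy_acl POLICY_FILE PROPERTY ACL
# Hypothetical helper: replaces the <value> of one property in
# hadoop-policy.xml. ACL format: "user1,user2 group1,group2"
# (users, a blank, then groups; "*" means everyone).
set_policy_acl() {
  file=$1
  prop=$2
  acl=$3
  tmp_file="$file.tmp.$$"

  awk -v prop="$prop" -v acl="$acl" '
    # Remember when we are inside the wanted <property> block
    index($0, "<name>" prop "</name>") { inside = 1 }
    # Rewrite the first <value> that follows the matching <name>
    inside && /<value>/ {
      sub(/<value>.*<\/value>/, "<value>" acl "</value>")
      inside = 0
    }
    /<\/property>/ { inside = 0 }
    { print }
  ' "$file" > "$tmp_file" && mv "$tmp_file" "$file"
}
```

For example, set_policy_acl "$HADOOP_HOME/conf/hadoop-policy.xml" security.job.submission.protocol.acl " hadoop" restricts job submission to members of a group named hadoop.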
* Change the 'masters' and 'slaves' files (master node only; slave machines do not need these files)

- $HADOOP_HOME/conf/masters (the masters file lists the secondary namenode servers)

- $HADOOP_HOME/conf/slaves (the slaves file lists the slave servers, i.e. the datanodes and task trackers)
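Both files hold one host name per line. A minimal sketch with hypothetical function and host names: write_host_files fills them in on the master, and sync_conf_to_slaves uses rsync to push the master's conf directory to every host in the slaves file, since the site files edited above are needed on every node. It assumes Hadoop lives at the same path on all nodes and that passwordless SSH already works.

```shell
#!/bin/sh
# write_host_files CONF_DIR SECONDARY_NN_HOST SLAVE_HOST...
# Hypothetical helper: masters = secondary namenode, slaves = datanodes/tasktrackers.
write_host_files() {
  conf_dir=$1
  secondary=$2
  shift 2
  printf '%s\n' "$secondary" > "$conf_dir/masters"
  printf '%s\n' "$@" > "$conf_dir/slaves"
}

# sync_conf_to_slaves CONF_DIR
# Hypothetical helper: copies CONF_DIR to the same path on every slave.
sync_conf_to_slaves() {
  conf_dir=$1
  status=0
  while read -r host || [ -n "$host" ]; do
    case $host in ''|'#'*) continue ;; esac   # skip blank lines and comments
    # -a keeps permissions and times; trailing slashes copy the contents;
    # stdin from /dev/null so rsync/ssh cannot swallow the slaves list
    if ! rsync -a "$conf_dir/" "hadoop@$host:$conf_dir/" < /dev/null; then
      echo "sync to $host failed" >&2
      status=1
    fi
  done < "$conf_dir/slaves"
  return "$status"
}
```

For example: write_host_files "$HADOOP_HOME/conf" bam01 slave1 slave2, then sync_conf_to_slaves "$HADOOP_HOME/conf"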

Setting up the Hadoop cluster

Format the namenode before starting the cluster:

$HADOOP_HOME/bin/hadoop namenode -format

Start the services:


This starts the namenode, jobtracker and secondarynamenode on the master node, and the datanode and tasktracker on the slave nodes. (To check the running services, run $JAVA_HOME/bin/jps.)
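In Hadoop 1.x the stock script for this is $HADOOP_HOME/bin/start-all.sh, run on the master; it calls start-dfs.sh and start-mapred.sh, which use the masters and slaves files to reach the other nodes. To check the result, the hypothetical helper below reads jps output (lines of the form <pid> <class name>) and reports any expected daemon that is missing.

```shell
#!/bin/sh
# check_daemons DAEMON...   (reads jps output on stdin)
# Hypothetical helper: returns 1 if any of the named Java processes is absent.
check_daemons() {
  # Keep only the class names (second column of jps output)
  running=$(awk '{ print $2 }')
  status=0
  for daemon in "$@"; do
    if printf '%s\n' "$running" | grep -qxF "$daemon"; then
      echo "running: $daemon"
    else
      echo "MISSING: $daemon"
      status=1
    fi
  done
  return "$status"
}
```

On the master: "$JAVA_HOME/bin/jps" | check_daemons NameNode SecondaryNameNode JobTracker; on a slave: "$JAVA_HOME/bin/jps" | check_daemons DataNode TaskTracker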

Stop the services:


This stops the namenode, jobtracker and secondarynamenode on the master node, and the datanode and tasktracker on the slave nodes. (To confirm they have stopped, run $JAVA_HOME/bin/jps.)
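The matching Hadoop 1.x script is $HADOOP_HOME/bin/stop-all.sh. To confirm that nothing is left behind on a node, here is a minimal sketch (hypothetical function name) that reads jps output and fails if any of the five Hadoop 1.x daemons is still listed:

```shell
#!/bin/sh
# assert_hadoop_stopped   (reads jps output on stdin)
# Hypothetical helper: lists leftover Hadoop 1.x daemons and returns 1 if any.
assert_hadoop_stopped() {
  # Keep only class names that belong to Hadoop 1.x daemons
  leftovers=$(awk '$2 ~ /^(NameNode|SecondaryNameNode|JobTracker|DataNode|TaskTracker)$/ { print $2 }')
  if [ -n "$leftovers" ]; then
    echo "still running:"
    printf '  %s\n' $leftovers   # unquoted on purpose: one name per line
    return 1
  fi
  echo "no Hadoop daemons running"
}
```

For example: "$JAVA_HOME/bin/jps" | assert_hadoop_stopped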

