Thing which seemed very Thingish inside you is quite different when it gets out into the open and has other people looking at it

Tuesday, February 19, 2013

How to configure a Hadoop cluster



Prerequisites

Before we begin, there are a couple of pieces of software you need to install, along with a hadoop user you need to create, before installing Hadoop itself.

Java - Install Java into a location that all user groups can access.
Eg: /opt/java/jdk1.6.0_29

rsync - Install rsync using apt-get (this is used to copy the Name Node's Hadoop distribution to all the other nodes).

Create hadoop user - Create the user "hadoop" with a home directory under /home (a minimal sketch follows). Then log in as the hadoop user:
su - hadoop
bash
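
A minimal sketch of the user-creation step above, assuming a Linux system with useradd (run as root; adjust for your distro):

        # Create the 'hadoop' user with a home directory and a bash shell.
        useradd -m -d /home/hadoop -s /bin/bash hadoop
        # Set a password for the new account.
        passwd hadoop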

PS: The above steps need to be performed (i.e., the software installed and the user created) on all the other Hadoop nodes as well.

Set up public key login from the master to the slave nodes

Create an SSH key pair for each user (ssh-keygen -t rsa -b 2048) and add the public key (*.pub) to the authorized_keys file on the master and slave nodes.

Key Exchange for Passphraseless SSH

1. We need passwordless (or passphraseless) SSH to communicate with the other Hadoop nodes in the cluster. Try to SSH to another node first:

          ssh hadoop@amani26.poohdedoo.com

2. Generate a key for the Name Node using the following command.

                ssh-keygen

This will generate output similar to the following:

Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
82:07:9e:1e:60:37:28:03:a6:18:b3:b6:f1:e4:2f:ef hadoop@bam01

3. This will create a .ssh directory inside the 'hadoop' user account. Navigate to the .ssh directory; it will contain the generated key pair. Inspect the public key stored in the 'id_rsa.pub' file with the command:

cat id_rsa.pub

This displays the public key.

4. This public key of the Name Node should be appended to the 'authorized_keys' file on the other Data Nodes. Execute the following command to copy the id_rsa.pub file to the other nodes.

                scp id_rsa.pub root@amani27.poohdedoo.com:/root

5. Log in to the second Hadoop node's 'hadoop' user account and try to SSH to another node from there.

                ssh hadoop@amani26.poohdedoo.com

This will create the .ssh directory in the hadoop account.

6. Append the copied public key to the 'authorized_keys' file in the hadoop account of this Data Node. Execute the following commands (as root).

               cat /root/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
               chown hadoop:hadoop /home/hadoop/.ssh/authorized_keys
               chmod 600 /home/hadoop/.ssh/authorized_keys

7. Now you can SSH to this Data Node from the Master node configured earlier. Log in to the Master node and, from the hadoop account, log in to the Data Node with one of the following commands.

ssh -i id_rsa hadoop@amani27.poohdedoo.com
                or
ssh hadoop@amani27.poohdedoo.com
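
If your OpenSSH installation ships with ssh-copy-id, steps 4-6 can be collapsed into a single command per node. A sketch, assuming the slave hostnames used later in this post:

        # Run on the master as the hadoop user, after ssh-keygen.
        # ssh-copy-id appends ~/.ssh/id_rsa.pub to the remote account's
        # authorized_keys file and fixes its permissions; you will be
        # prompted for the hadoop password once per node.
        for node in hadoop1.poohdedoo.com hadoop2.poohdedoo.com; do
            ssh-copy-id hadoop@$node
        done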

Setup Hadoop
  1. Download and extract Hadoop (tar xvfz hadoop-x.x.x.tar.gz -C /mnt/)
  2. Change the ownership of the extracted directory if necessary (chown -R hadoop:hadoop /mnt/hadoop-x.x.x)
  3. [optional] If IPv6 is not used, disable it by adding 'net.ipv6.conf.all.disable_ipv6 = 1' to /etc/sysctl.conf; a sketch follows.
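
A sketch of the IPv6 change, assuming a Linux box where you have root (Hadoop 1.x is commonly reported to misbehave when it binds to IPv6 addresses):

        # Persist the setting and apply it without a reboot.
        echo 'net.ipv6.conf.all.disable_ipv6 = 1' >> /etc/sysctl.conf
        sysctl -p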

Configure Hadoop

The configuration files live in $HADOOP_HOME/conf/.
  • Set JAVA_HOME in $HADOOP_HOME/conf/hadoop-env.sh (Add export JAVA_HOME=/path/to/javahome)   
    • eg: export JAVA_HOME=/opt/java/jdk1.6.0_29
  • Edit the $HADOOP_HOME/conf/core-site.xml as follows:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop0.poohdedoo.com:9000</value>
  </property>
  <property>
    <name>fs.hdfs.impl</name>
    <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/hadoop_tmp</value>
  </property>
</configuration>
  • Edit the $HADOOP_HOME/conf/hdfs-site.xml as follows:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/mnt/hadoop_data/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/hadoop_data/dfs/data</value>
  </property>
</configuration>

  • Edit the $HADOOP_HOME/conf/mapred-site.xml as follows:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop0.poohdedoo.com:9001</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/mnt/hadoop_data/mapred/system</value>
  </property>
</configuration>
  • Edit the $HADOOP_HOME/conf/hadoop-policy.xml. By default the value of 'security.job.submission.protocol.acl' is *, which allows anyone to submit jobs; change it to a specific user or group.

<property>
  <name>security.job.submission.protocol.acl</name>
  <value>adminuser</value>
</property>
* Edit the 'masters' and 'slaves' files (Master node only; the slave machines do not need these files).

- $HADOOP_HOME/conf/masters (the masters file lists the secondary namenode servers)

hadoop0.poohdedoo.com

- $HADOOP_HOME/conf/slaves (the slaves file lists the slave servers, i.e. the datanodes and task trackers)

hadoop1.poohdedoo.com
hadoop2.poohdedoo.com
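
This is where the rsync prerequisite comes in: once the configuration on the master is done, copy the whole Hadoop directory to every slave so all nodes run identical binaries and configuration. A sketch, assuming Hadoop was extracted to /mnt/hadoop-x.x.x as above:

        # Run on the master as the hadoop user; this relies on the
        # passphraseless SSH logins configured earlier.
        for node in hadoop1.poohdedoo.com hadoop2.poohdedoo.com; do
            rsync -az /mnt/hadoop-x.x.x/ hadoop@$node:/mnt/hadoop-x.x.x/
        done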

Setting up the Hadoop cluster

Format the namenode before starting the cluster:

$HADOOP_HOME/bin/hadoop namenode -format

Start the services:

$HADOOP_HOME/bin/start-all.sh

This starts the namenode, jobtracker, and secondarynamenode on the master node, and a datanode and tasktracker on each slave node. (To check the running services, run $JAVA_HOME/bin/jps.)
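
For example, jps on each node should list something like the following (the PIDs are illustrative):

        # On the master node:
        $ $JAVA_HOME/bin/jps
        4723 NameNode
        4891 SecondaryNameNode
        4987 JobTracker
        5120 Jps

        # On a slave node:
        $ $JAVA_HOME/bin/jps
        3301 DataNode
        3415 TaskTracker
        3502 Jps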

Stop the services:

$HADOOP_HOME/bin/stop-all.sh

This stops the namenode, jobtracker, and secondarynamenode on the master node, and the datanode and tasktracker on each slave node.
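
Finally, while the cluster is up, a quick end-to-end smoke test is to submit one of the example jobs bundled with the distribution (the jar name below follows the Hadoop 1.x layout; adjust it to your version):

        # Estimate pi with 2 map tasks of 10 samples each; this exercises
        # HDFS, the JobTracker, and the TaskTrackers together.
        $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-examples-*.jar pi 2 10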
