Thursday, September 4, 2014

How to Install Apache Hadoop Cluster & High Availability - part 1

http://solutionsatexperts.com/installing-hadoop-2-3-0-chapter-1/

Step 1 : Install the JDK and configure passwordless SSH login between the Hadoop cluster nodes.
Step 2 : Download Apache Hadoop 2.3.0 and configure the cluster nodes


Let's see the steps in detail:

Step 1 :  Install the JDK, set environment variables, and configure passwordless SSH login between the Hadoop cluster nodes.

1.1  Log in to mn1, mn2 & dn1 as the root user and create a user account (hduser); repeat the steps below on all the Hadoop cluster nodes (mn1, mn2 & dn1).
#useradd hduser
Set the password for hduser
#passwd hduser
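Since the account has to be created on every node, the step can also be scripted. A minimal sketch that prints the per-node command (hostnames from this guide); in a real run each printed line would be executed from mn1 as root:

```shell
# Sketch: print the account-creation command for each cluster node.
# In a real run these would be executed, e.g. piped to sh, as root:
#   ssh root@<node> 'useradd hduser && passwd hduser'
for node in mn1 mn2 dn1; do
  echo "ssh root@$node 'useradd hduser && passwd hduser'"
done
```

Note that `passwd` prompts interactively, so the password still has to be typed once per node.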

1.2   Add host entries to /etc/hosts on all the nodes in the cluster; repeat the steps below on every node.
[root@mn1 ~]#vi /etc/hosts
192.168.1.39  mn1
192.168.1.57  mn2
192.168.1.72  dn1
Save and Exit!
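Rather than editing each node by hand, the three entries above can be generated in one shot. A small sketch (IPs from this guide); on a real node the output would be appended with `>> /etc/hosts` as root:

```shell
# Sketch: emit the /etc/hosts entries used in this guide.
# On a real node, append the output as root: ... >> /etc/hosts
for entry in "192.168.1.39 mn1" "192.168.1.57 mn2" "192.168.1.72 dn1"; do
  echo "$entry"
done
```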

1.3  Download jdk-7u51-linux-i586.rpm on mn1 and copy it to /opt/ on all the cluster nodes; repeat the steps below on all the nodes in the cluster (mn1, mn2 & dn1).
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
#cd /opt/
#wget --no-check-certificate --no-cookies --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com" "http://download.oracle.com/otn-pub/java/jdk/7u51-b13/jdk-7u51-linux-i586.rpm"

Install the JDK on mn1, mn2 and dn1
#rpm -ivh /opt/jdk-7u51-linux-i586.rpm
Copy the downloaded JDK package from mn1 to mn2 and dn1 as the root user
#scp jdk-7u51-linux-i586.rpm mn2:/opt
#scp jdk-7u51-linux-i586.rpm dn1:/opt
Note :- Make sure to remove any existing version of the JDK / JRE first.
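The Oracle rpm installs the JDK under /usr/java/jdk<version>, and that path is needed for JAVA_HOME in step 2.2. A small sketch that derives the install directory from the rpm filename, so the two stay consistent (the sed pattern is an assumption about Oracle's jdk-<major>u<update> naming scheme):

```shell
# Sketch: map "jdk-7u51-..." to the install dir name "jdk1.7.0_51".
rpm_file=jdk-7u51-linux-i586.rpm
ver=$(echo "$rpm_file" | sed 's/^jdk-\([0-9]*\)u\([0-9]*\).*/1.\1.0_\2/')
echo "/usr/java/jdk${ver}"   # prints /usr/java/jdk1.7.0_51
```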

1.4   Log in to mn1, mn2 & dn1 as hduser and generate ssh key (public and private keys) in all the cluster nodes.
Switch to the hduser account from the root user on all the nodes
mn1
[hduser@mn1~]$ ssh-keygen -t rsa
mn2
[hduser@mn2~]$ ssh-keygen -t rsa
dn1
[hduser@dn1~]$ ssh-keygen -t rsa
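Run interactively, ssh-keygen prompts for a file path and passphrase on each node. If you want to script this step, it can be done non-interactively; a sketch (empty passphrase, default key path, skipped when a key already exists):

```shell
# Sketch (run as hduser on each node): create the RSA key pair
# non-interactively with an empty passphrase at the default location,
# skipping the step if a key already exists.
keyfile="$HOME/.ssh/id_rsa"
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
[ -f "$keyfile" ] || ssh-keygen -t rsa -N "" -f "$keyfile" -q
```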

1.5  From mn1, copy the id_rsa.pub contents into authorized_keys on mn1, mn2 and dn1; ssh-copy-id appends the public key for you. Apply the steps below for all the nodes in the cluster (file permissions are set in the next step).
[hduser@mn1~]$cd .ssh
[hduser@mn1 .ssh]$ ssh-copy-id hduser@mn1
[hduser@mn1 .ssh]$ ssh-copy-id hduser@mn2
[hduser@mn1 .ssh]$ ssh-copy-id hduser@dn1

1.6  Set the file permissions for the SSH files on all the nodes in the cluster; repeat the steps below on mn2 & dn1. sshd refuses key authentication if authorized_keys or the .ssh directory is group/world writable.
[hduser@mn1 .ssh]$chmod 700 ~/.ssh
[hduser@mn1 .ssh]$chmod 600 id_rsa.pub authorized_keys

1.7  Verify passwordless SSH login between the cluster nodes (mn1, mn2 & dn1); repeat the steps below on all the nodes in the cluster.
[hduser@mn1~]$ssh mn1
[hduser@mn1~]$ssh mn2
[hduser@mn1~]$ssh dn1
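The three logins above can be checked without typing each one. A sketch that prints a batch-mode verification command per node; `-o BatchMode=yes` makes ssh fail immediately instead of falling back to a password prompt, so a hung or prompting login shows up as an error:

```shell
# Sketch: print a non-interactive verification command for each node.
# BatchMode=yes disables password prompts, so the command exits non-zero
# if key-based login is not actually working.
for node in mn1 mn2 dn1; do
  echo "ssh -o BatchMode=yes $node exit && echo $node OK"
done
```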

Step 2 :  Download Apache Hadoop 2.3.0 and configure the cluster nodes

2.1   Download the latest Apache Hadoop build from the location below; this guide uses hadoop-2.3.0.tar.gz
http://apache.mirrors.pair.com/hadoop/common/hadoop-2.3.0/hadoop-2.3.0.tar.gz
Download the Apache Hadoop package using the wget command as shown below.
[hduser@mn1~]$wget http://apache.mirrors.pair.com/hadoop/common/hadoop-2.3.0/hadoop-2.3.0.tar.gz
Extract
[hduser@mn1~]$tar -zxvf hadoop-2.3.0.tar.gz
Rename hadoop-2.3.0 to 2.3.0
[hduser@mn1~]$mv hadoop-2.3.0  2.3.0

2.2  Modify  (.bash_profile)  to set environmental variables required for Hadoop and jdk.
[hduser@mn1~]$vi .bash_profile
export JAVA_HOME=/usr/java/jdk1.7.0_51/
export HADOOP_PREFIX="$HOME/2.3.0"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}
# Native Path
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
Save and Exit!
Source the .bash_profile to load the new settings
[hduser@mn1~]$source .bash_profile
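A quick sanity check after sourcing the profile is to confirm the Hadoop directories actually ended up on PATH. A sketch that recomputes the PATH additions from the profile above (hduser's home is assumed to be /home/hduser, matching the paths in step 2.3):

```shell
# Sketch: rebuild the PATH additions from .bash_profile and confirm the
# Hadoop bin directory is present. /home/hduser is an assumption.
HADOOP_PREFIX="/home/hduser/2.3.0"
PATH="$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin"
case ":$PATH:" in
  *":$HADOOP_PREFIX/bin:"*) echo "PATH OK" ;;
  *) echo "PATH missing Hadoop bin" ;;
esac
```

On a correctly configured node, `which hadoop` should then resolve to $HADOOP_PREFIX/bin/hadoop.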

2.3
·  Apache Hadoop cluster main configuration files are located in /home/hduser/2.3.0/etc/hadoop
·  core-site.xml, hadoop-env.sh, hdfs-site.xml, mapred-site.xml, yarn-env.sh, yarn-site.xml, slaves
·  Apache Hadoop cluster executable scripts are located in /home/hduser/2.3.0/sbin and /home/hduser/2.3.0/bin
·  Apache Hadoop cluster logs are located in /home/hduser/2.3.0/logs (logs are created once Hadoop is started)

2.4  Apache Hadoop cluster main configuration files are as shown below.
·  core-site.xml
·  hadoop-env.sh
·  hdfs-site.xml
·  mapred-site.xml
·  yarn-env.sh
·  yarn-site.xml
·  slaves

That's it..
