Thursday, September 4, 2014

How to install Apache Hadoop Cluster & High Availability - part 2

http://solutionsatexperts.com/configuring-apache-hadoop-cluster-high-availability-chapter-2/

Step 1: Download and configure ZooKeeper
Step 2: Hadoop configuration and high availability settings
Step 3: Create folders for the Hadoop cluster and set file permissions
Step 4: HDFS service and file system format


Let's see the steps in detail:
Step 1 : Download and configure ZooKeeper

1.1  Download the ZooKeeper software package (https://www.apache.org/dist/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz):
[hduser@mn1~]$ wget https://www.apache.org/dist/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
Extract the source:
[hduser@mn1~]$ tar -zxvf zookeeper-3.4.5.tar.gz

1.2  ZooKeeper-related files are located at:
Configuration files     : /home/hduser/zookeeper-3.4.5/conf
Binary executables      : /home/hduser/zookeeper-3.4.5/bin
The main configuration file is /home/hduser/zookeeper-3.4.5/conf/zoo.cfg. Create it from the shipped sample:
[hduser@mn1~]$ cd /home/hduser/zookeeper-3.4.5/conf
[hduser@mn1~]$ cp -rp zoo_sample.cfg zoo.cfg
Modify zoo.cfg as per this installation guide:
[hduser@mn1~]$ vi /home/hduser/zookeeper-3.4.5/conf/zoo.cfg
tickTime=2000
clientPort=2181
initLimit=5
syncLimit=2
dataDir=/home/hduser/zookeeper/data/
dataLogDir=/home/hduser/zookeeper/log/
server.1=mn1:2888:3888
server.2=mn2:2889:3889
server.3=dn1:2890:3890
Save & Exit!

Note :- Since each of the servers in this guide is hosted on the same physical machine as a virtual instance, every server uses a different port pair: mn1:2888:3888, mn2:2889:3889 & dn1:2890:3890.
Create the myid file in /home/hduser/zookeeper/data/ and assign each node in the cluster its id (mn1=1, mn2=2 & dn1=3).

Create the data and log directories first (refer to Step 3.3), then set each node's id:
[hduser@mn1~]$vi /home/hduser/zookeeper/data/myid
1
Save and Exit!
[hduser@mn2~]$vi /home/hduser/zookeeper/data/myid
2
Save & Exit!
[hduser@dn1~]$vi /home/hduser/zookeeper/data/myid
3
Save & Exit!
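Equivalently, the three myid files can be written in one loop from mn1. This is only a convenience sketch: it assumes passwordless SSH as hduser to all three nodes (set up in part 1) and that the data directory already exists on each node.

```shell
# Sketch: write each node's ZooKeeper id in one pass from mn1.
# Assumes passwordless SSH as hduser and existing data directories.
for entry in mn1:1 mn2:2 dn1:3; do
  host=${entry%%:*}   # part before the colon -> hostname
  id=${entry##*:}     # part after the colon  -> myid value
  ssh hduser@"$host" "echo $id > /home/hduser/zookeeper/data/myid"
done
```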

Step 2 : Hadoop configuration and high availability settings
2.1  Add or modify the following lines in the hadoop-env.sh file to apply the environment variable settings.
[hduser@mn1~]$ vi /home/hduser/2.3.0/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_45/
export HADOOP_COMMON_LIB_NATIVE_DIR=/home/hduser/2.3.0/lib/native/
export HADOOP_OPTS="-Djava.library.path=/home/hduser/2.3.0/lib/native/"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hduser/2.3.0/etc/hadoop"}

2.2  Add the following lines to the core-site.xml file, within the <configuration> tag, to configure journaling, the default FS, the temp directory and the HDFS cluster.
[hduser@mn1~]$vi /home/hduser/2.3.0/etc/hadoop/core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hduser/journal/node/local/data</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/tmp</value>
</property>

2.3  Add the following lines to the hdfs-site.xml file, within the <configuration> tag, to configure the DFS nameservice, cluster, DFS high availability, ZooKeeper quorum and failover.
[hduser@mn1~]$vi /home/hduser/2.3.0/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
<final>true</final>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>mn1,mn2</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.mn1</name>
<value>mn1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.mn2</name>
<value>mn2:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.mn1</name>
<value>mn1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.mn2</name>
<value>mn2:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://mn1:8485;dn1:8485;mn2:8485/mycluster</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>mn1:2181,mn2:2181,dn1:2181</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hduser/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>3000</value>
</property>
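As a quick sanity check (not part of the original steps), `hdfs getconf` prints the value Hadoop resolves for a configuration key, so a mistyped property name in hdfs-site.xml shows up immediately:

```shell
# Print the values Hadoop resolves for the HA keys configured above.
# A typo in a property name would make these print nothing or a default.
hdfs getconf -confKey dfs.nameservices             # should print: mycluster
hdfs getconf -confKey dfs.ha.namenodes.mycluster   # should print: mn1,mn2
```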

2.4  Add the datanodes to the slaves configuration file as shown below.
[hduser@mn1~]$vi /home/hduser/2.3.0/etc/hadoop/slaves
mn1
mn2
dn1
Save & Exit!

2.5  Add the following lines to yarn-site.xml, within the <configuration> tag, to apply the MapReduce settings.
[hduser@mn1~]$vi /home/hduser/2.3.0/etc/hadoop/yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Save & Exit!

Step 3 : Create folders for the Hadoop cluster and set file permissions
3.1  Create the folder structure for the journalnode as defined in core-site.xml; repeat the following step on all the cluster nodes (mn1, mn2 & dn1):
[hduser@mn1~]$ mkdir -p /home/hduser/journal/node/local/data

3.2  Create the temp folder for the Hadoop cluster as defined in core-site.xml; repeat the following step on all the cluster nodes (mn1, mn2 & dn1):
[hduser@mn1~]$ mkdir /home/hduser/tmp

3.3  Create the folder structure for the ZooKeeper data and logs as defined in zoo.cfg; repeat the following step on all the nodes in the cluster (mn1, mn2 & dn1). If you already created these for Step 1, this is done.
[hduser@mn1~]$ mkdir -p /home/hduser/zookeeper/data/
[hduser@mn1~]$ mkdir -p /home/hduser/zookeeper/log/
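For convenience, the directory creation in 3.1 through 3.3 can also be done in a single command per node (a sketch; the paths match core-site.xml and zoo.cfg above):

```shell
# Create every local directory the configs reference, in one pass.
# Run this on each node (mn1, mn2 and dn1); mkdir -p creates parents
# as needed and is a no-op for directories that already exist.
mkdir -p /home/hduser/journal/node/local/data \
         /home/hduser/tmp \
         /home/hduser/zookeeper/data \
         /home/hduser/zookeeper/log
```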

3.4  Copy the Hadoop source, ZooKeeper and the .bash_profile configured on the mn1 node to mn2 and dn1.
Compress using tar:
[hduser@mn1~]$ tar -zcvf hadoopmove.tgz 2.3.0 zookeeper-3.4.5 .bash_profile
Copy hadoopmove.tgz to mn2 and dn1:
[hduser@mn1~]$ scp hadoopmove.tgz hduser@mn2:
[hduser@mn1~]$ scp hadoopmove.tgz hduser@dn1:
Log in to mn2 and dn1 and extract hadoopmove.tgz:
[hduser@mn2~]$ tar -zxvf hadoopmove.tgz
[hduser@dn1~]$ tar -zxvf hadoopmove.tgz

Step 4 : HDFS service and file system format
4.1  Start the ZooKeeper service on every node used for ZooKeeper; repeat the step below, from the ZooKeeper bin directory (/home/hduser/zookeeper-3.4.5/bin), on all the cluster nodes running ZooKeeper (mn1, mn2 & dn1).
[hduser@mn1 bin]$ ./zkServer.sh start
[hduser@mn2 bin]$ ./zkServer.sh start
[hduser@dn1 bin]$ ./zkServer.sh start
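To verify the quorum actually formed (an optional check), zkServer.sh status reports each node's role; with three servers, expect one leader and two followers:

```shell
# Check this node's quorum role (run from the ZooKeeper bin directory).
# In ZooKeeper 3.4.x this prints a "Mode: leader" or "Mode: follower" line.
./zkServer.sh status
```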

4.2  Format the ZooKeeper failover controller znode from mn1:
[hduser@mn1~]$ hdfs zkfc -formatZK
Before formatting the namenode (next step), start the journalnode on all the cluster nodes (mn1, mn2 & dn1):
$ hadoop-daemon.sh start journalnode

4.3  Format the namenode on mn1:
[hduser@mn1~]$ hdfs namenode -format
4.4  Copy the namenode metadata to the standby namenode (mn2 in this guide). First make sure the namenode service is running on the master node:
$ hadoop-daemon.sh start namenode
Then run the command below on mn2 (standby):
[hduser@mn2~]$ hdfs namenode -bootstrapStandby
Start the Hadoop services:
$ cd /home/hduser/2.3.0/sbin
$ ./stop-all.sh
$ ./start-all.sh
Run jps to check the services running on mn1, mn2 and dn1.
Note: if the hostname is configured incorrectly in /etc/sysconfig/network, correct it and restart all the nodes for the change to take effect.

[hduser@mn1 sbin]$ jps
1597 QuorumPeerMain
1990 JournalNode
1835 DataNode
2358 NodeManager
2256 ResourceManager
1743 NameNode
2570 Jps
2168 DFSZKFailoverController

[hduser@mn2 bin]$ jps
1925 DFSZKFailoverController
2035 NodeManager
1833 JournalNode
1667 NameNode
2075 Jps
1573 QuorumPeerMain
1743 DataNode

[hduser@dn1 bin]$ jps
1958 Jps
1595 QuorumPeerMain
1711 JournalNode
1655 DataNode
1840 NodeManager
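Once all the services are up, the HA state of each namenode can be checked with hdfs haadmin; exactly one should report active and the other standby:

```shell
# Query the HA state of both namenodes configured in hdfs-site.xml.
for nn in mn1 mn2; do
  echo "$nn: $(hdfs haadmin -getServiceState "$nn")"
done
```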

That's it!
