http://solutionsatexperts.com/configuring-apache-hadoop-cluster-high-availability-chapter-2/
Step 1: Download and configure ZooKeeper
Step 2: Hadoop configuration and high-availability settings
Step 3: Create folders for the Hadoop cluster and set file permissions
Step 4: HDFS service and file system format
Let's see the steps in detail:
Step 1: Download and configure ZooKeeper
1.1 Download the ZooKeeper software package from https://www.apache.org/dist/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
[hduser@mn1~]$wget https://www.apache.org/dist/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
Extract the archive:
[hduser@mn1~]$tar -zxvf zookeeper-3.4.5.tar.gz
1.2 ZooKeeper-related files are located as follows:
Configuration files : /home/hduser/zookeeper-3.4.5/conf
Binary executables : /home/hduser/zookeeper-3.4.5/bin
The main configuration file is /home/hduser/zookeeper-3.4.5/conf/zoo.cfg. Create it from the bundled sample:
[hduser@mn1 conf]$cp -p zoo_sample.cfg zoo.cfg
Modify zoo.cfg as per this installation guide:
[hduser@mn1~]$vi /home/hduser/zookeeper-3.4.5/conf/zoo.cfg
tickTime=2000
clientPort=2181
initLimit=5
syncLimit=2
dataDir=/home/hduser/zookeeper/data/
dataLogDir=/home/hduser/zookeeper/log/
server.1=mn1:2888:3888
server.2=mn2:2889:3889
server.3=dn1:2890:3890
Save & Exit!
Note: Because each of the servers is hosted on the same physical machine as a virtual instance, every server uses a different pair of port numbers: mn1:2888:3888, mn2:2889:3889 & dn1:2890:3890.
Create the myid file in /home/hduser/zookeeper/data/ on each node and assign that node's id (mn1=1, mn2=2 & dn1=3). The data and log directories themselves are created in Step 3.
[hduser@mn1~]$vi /home/hduser/zookeeper/data/myid
1
Save & Exit!
[hduser@mn2~]$vi /home/hduser/zookeeper/data/myid
2
Save & Exit!
[hduser@dn1~]$vi /home/hduser/zookeeper/data/myid
3
Save & Exit!
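Instead of typing the three myid files by hand, the per-node values can be generated with a small helper. This is only a sketch; myid_for is a hypothetical function whose mapping must match the server.N entries in your zoo.cfg:

```shell
# Hypothetical helper: map each hostname to its ZooKeeper id,
# mirroring the server.1/2/3 lines in zoo.cfg above.
myid_for() {
  case "$1" in
    mn1) echo 1 ;;
    mn2) echo 2 ;;
    dn1) echo 3 ;;
    *)   echo "unknown host: $1" >&2; return 1 ;;
  esac
}

# On each node, this writes that node's id into the myid file:
#   myid_for "$(hostname -s)" > /home/hduser/zookeeper/data/myid
myid_for mn2   # prints 2
```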
Step 2: Hadoop configuration and high-availability settings
2.1 Add or modify the following lines in the hadoop-env.sh file to set the environment variables.
[hduser@mn1~]$ vi /home/hduser/2.3.0/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_45/
export HADOOP_COMMON_LIB_NATIVE_DIR=/home/hduser/2.3.0/lib/native/
export HADOOP_OPTS="-Djava.library.path=/home/hduser/2.3.0/lib/native/"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hduser/2.3.0/etc/hadoop"}
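Before going further, it is worth confirming that the paths referenced in hadoop-env.sh actually exist; a typo here only surfaces later, when daemons fail to start. A minimal sketch (the paths are the ones used in this guide; adjust them to your layout):

```shell
# Report whether each path referenced from hadoop-env.sh exists.
check_path() {
  if [ -e "$1" ]; then echo "OK: $1"; else echo "MISSING: $1"; fi
}

check_path /usr/java/jdk1.7.0_45/bin/java
check_path /home/hduser/2.3.0/lib/native
check_path /home/hduser/2.3.0/etc/hadoop
```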
2.2 Add the following lines to the core-site.xml file, within the <configuration> tag, to configure the default FS, the JournalNode edits directory & the temp directory.
[hduser@mn1~]$vi /home/hduser/2.3.0/etc/hadoop/core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hduser/journal/node/local/data</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/tmp</value>
</property>
2.3 Add the following lines to the hdfs-site.xml file, within the <configuration> tag, to configure the DFS nameservice, high availability, the ZooKeeper quorum & failover.
[hduser@mn1~]$vi /home/hduser/2.3.0/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
<final>true</final>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>mn1,mn2</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.mn1</name>
<value>mn1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.mn2</name>
<value>mn2:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.mn1</name>
<value>mn1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.mn2</name>
<value>mn2:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://mn1:8485;dn1:8485;mn2:8485/mycluster</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>mn1:2181,mn2:2181,dn1:2181</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hduser/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>3000</value>
</property>
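Optionally, check that the edited XML files are still well-formed before moving on; a stray tag in core-site.xml or hdfs-site.xml otherwise only shows up at daemon startup. This sketch assumes xmllint (shipped with libxml2) is installed:

```shell
# Validate well-formedness of the Hadoop XML config files.
for f in /home/hduser/2.3.0/etc/hadoop/core-site.xml \
         /home/hduser/2.3.0/etc/hadoop/hdfs-site.xml; do
  if xmllint --noout "$f"; then
    echo "well-formed: $f"
  else
    echo "BROKEN: $f"
  fi
done
```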
2.4 Add the datanodes to the slaves configuration file as shown below.
[hduser@mn1~]$vi /home/hduser/2.3.0/etc/hadoop/slaves
mn1
mn2
dn1
Save & Exit!
2.5 Add the following lines to yarn-site.xml, within the <configuration> tag, to apply the MapReduce shuffle settings.
[hduser@mn1~]$vi /home/hduser/2.3.0/etc/hadoop/yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Save & Exit!
Step 3: Create folders for the Hadoop cluster and set file permissions
3.1 Create the folder structure for the JournalNode as defined in core-site.xml. Repeat the following step on all the cluster nodes (mn1, mn2 & dn1):
[hduser@mn1~]$mkdir -p /home/hduser/journal/node/local/data
3.2 Create the temp folder for the Hadoop cluster as defined in core-site.xml. Repeat the following step on all the cluster nodes (mn1, mn2 & dn1):
[hduser@mn1~]$mkdir /home/hduser/tmp
3.3 Create the folder structure for ZooKeeper data and logs as defined in zoo.cfg. Repeat the following step on all the cluster nodes (mn1, mn2 & dn1); skip it if you already created these directories during Step 1:
[hduser@mn1~]$mkdir -p /home/hduser/zookeeper/data/
[hduser@mn1~]$mkdir -p /home/hduser/zookeeper/log/
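Rather than repeating the mkdir commands on each node by hand, the whole set can be scripted. A sketch that only prints the commands it would run (pipe the output to sh to actually execute it; assumes passwordless ssh for hduser between nodes, which the sshfence setting also requires):

```shell
# Directories required by core-site.xml and zoo.cfg in this guide.
HA_DIRS="/home/hduser/journal/node/local/data /home/hduser/tmp /home/hduser/zookeeper/data /home/hduser/zookeeper/log"

# Print (dry run) the ssh command that creates them on one node.
make_dirs_on() {
  echo "ssh hduser@$1 mkdir -p $HA_DIRS"
}

for node in mn1 mn2 dn1; do
  make_dirs_on "$node"
done
```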
3.4 Copy the Hadoop source, ZooKeeper and the .bash_profile configured on mn1 to mn2 and dn1.
Compress them using tar:
[hduser@mn1~]$tar -zcvf hadoopmove.tgz 2.3.0 zookeeper-3.4.5 .bash_profile
Copy hadoopmove.tgz to mn2 and dn1:
[hduser@mn1~]$scp hadoopmove.tgz hduser@mn2:~/
[hduser@mn1~]$scp hadoopmove.tgz hduser@dn1:~/
Log in to mn2 and dn1 and extract hadoopmove.tgz:
[hduser@mn2~]$tar -zxvf hadoopmove.tgz
[hduser@dn1~]$tar -zxvf hadoopmove.tgz
Step 4: HDFS service and file system format
4.1 Start the ZooKeeper service on all the cluster nodes running ZooKeeper (mn1, mn2 & dn1):
[hduser@mn1~]$/home/hduser/zookeeper-3.4.5/bin/zkServer.sh start
[hduser@mn2~]$/home/hduser/zookeeper-3.4.5/bin/zkServer.sh start
[hduser@dn1~]$/home/hduser/zookeeper-3.4.5/bin/zkServer.sh start
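After starting ZooKeeper everywhere, run zkServer.sh status on each node: exactly one node should report Mode: leader and the rest Mode: follower. A small sketch of that check, where quorum_ok is a hypothetical helper taking the collected Mode values as arguments:

```shell
# Given the "Mode:" value reported by each node, verify the
# ensemble shape: exactly one leader, all remaining nodes followers.
quorum_ok() {
  leaders=$(printf '%s\n' "$@" | grep -c '^leader$')
  followers=$(printf '%s\n' "$@" | grep -c '^follower$')
  [ "$leaders" -eq 1 ] && [ "$followers" -eq $(( $# - 1 )) ]
}

# e.g. modes gathered from mn1, mn2 and dn1:
quorum_ok leader follower follower && echo "quorum OK"
```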
4.2 Format the ZooKeeper file system on mn1. Before formatting, start the JournalNode service on all the cluster nodes (mn1, mn2 & dn1):
$hadoop-daemon.sh start journalnode
Then run the format on mn1:
[hduser@mn1~]$hdfs zkfc -formatZK
4.3 Format the namenode on mn1:
[hduser@mn1~]$hdfs namenode -format
4.4 Copy the metadata to the standby namenode (mn2 in this guide). First, make sure the namenode service is running on the master node:
$hadoop-daemon.sh start namenode
Then run the following command on mn2 (the standby):
[hduser@mn2~]$hdfs namenode -bootstrapStandby
Restart the Hadoop services (stop everything, then start again):
$cd /home/hduser/2.3.0/sbin
$./stop-all.sh
$./start-all.sh
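Once the daemons are up, you can also confirm that one NameNode is active and the other standby using hdfs haadmin with the service IDs from hdfs-site.xml (mn1 and mn2). The ha_states_ok helper below is a hypothetical sketch for checking the pair of reported states:

```shell
# One NameNode should report "active" and the other "standby":
#   hdfs haadmin -getServiceState mn1
#   hdfs haadmin -getServiceState mn2
# Check that the two reported states form a healthy HA pair.
ha_states_ok() {
  { [ "$1" = active ] && [ "$2" = standby ]; } ||
  { [ "$1" = standby ] && [ "$2" = active ]; }
}

ha_states_ok active standby && echo "HA pair healthy"
```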
Run jps to check the services running on mn1, mn2 and dn1.
Note: if the hostname is incorrectly configured in /etc/sysconfig/network, correct it and restart all nodes for the change to take effect.
[hduser@mn1 sbin]$ jps
1597 QuorumPeerMain
1990 JournalNode
1835 DataNode
2358 NodeManager
2256 ResourceManager
1743 NameNode
2570 Jps
2168 DFSZKFailoverController
[hduser@mn2 bin]$ jps
1925 DFSZKFailoverController
2035 NodeManager
1833 JournalNode
1667 NameNode
2075 Jps
1573 QuorumPeerMain
1743 DataNode
[hduser@dn1 bin]$ jps
1958 Jps
1595 QuorumPeerMain
1711 JournalNode
1655 DataNode
1840 NodeManager
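The jps output above can also be checked mechanically. A sketch where expect_daemons is a hypothetical helper; the daemon list shown matches mn1 in this guide and should be trimmed for mn2 and dn1:

```shell
# Read jps output on stdin and verify each named daemon appears.
expect_daemons() {
  out=$(cat)
  for d in "$@"; do
    echo "$out" | grep -qw "$d" || { echo "missing: $d"; return 1; }
  done
  echo "all present"
}

# On mn1:
#   jps | expect_daemons QuorumPeerMain JournalNode DataNode NameNode \
#         NodeManager ResourceManager DFSZKFailoverController
printf '1597 QuorumPeerMain\n1743 NameNode\n' | expect_daemons NameNode
# prints "all present"
```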
That's it!