Sunday, January 31, 2016

Cluster setup in HBase (zz)

Cluster setup in HBase
Before starting the HBase cluster
To configure HBase, we need a running Hadoop cluster, which provides the storage for HBase (HBase stores its data in HDFS). Please refer to Installing and configuring hadoop cluster. Also make sure that the user name and the path where HBase is installed are the same on all machines; in my case the user is hduser.
These are the steps to set up and run an HBase cluster. We build the cluster with three Ubuntu machines. A distributed HBase depends on a running ZooKeeper cluster; we use the default ZooKeeper setup, which is managed by HBase itself.
There are basically three types of node.
1. HBase Master: the HMaster is responsible for assigning regions to region servers and monitors the health of each HRegionServer.
2. ZooKeeper: for any distributed application, ZooKeeper is a centralized service for maintaining configuration information, naming, and providing distributed synchronization and group services.
3. HBase RegionServer: the HRegionServer is responsible for handling client read and write requests. It communicates with the HMaster to get a list of regions to serve and to tell the master that it is alive.
In our example, one machine in the cluster is designated as the HBase master and ZooKeeper node. The rest of the machines act as region servers.

INSTALLING AND CONFIGURING HBASE MASTER
1. Download hbase-1.1.2-bin.tar.gz from http://www.apache.org/dyn/closer.cgi/hbase/ and extract it into some directory on your computer. This path will be referred to as $HBASE_INSTALL_DIR.
2. Edit the file /etc/hosts on the master machine and add the following lines.
192.168.35.16 bsw-HbaseMaster
192.168.35.17 bsw-data1
192.168.35.25 bsw-data2
(The HBase master and the Hadoop namenode, the master machine in the Hadoop cluster, are configured on the same machine.)
Here bsw-data1 and bsw-data2 are the machines where the region servers run, and bsw-HbaseMaster is the machine where the HBase master runs.
Note: Run the command "ping bsw-HbaseMaster" to check that the hostname resolves to the machine's actual IP, not the localhost IP.
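The ping check above can be made a bit more mechanical. A minimal sketch (the helper name is mine, and it only inspects /etc/hosts, not DNS):

```shell
# Warn if the master hostname maps to a loopback address in /etc/hosts.
# A region server that registers under 127.0.x.x is unreachable from
# the other machines, a common cause of cluster startup problems.
is_loopback() {
    case "$1" in
        127.*) return 0 ;;
        *)     return 1 ;;
    esac
}

ip=$(awk '$2 == "bsw-HbaseMaster" {print $1; exit}' /etc/hosts)
if [ -n "$ip" ] && is_loopback "$ip"; then
    echo "warning: bsw-HbaseMaster resolves to loopback ($ip)"
fi
```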
3. We need to configure passwordless SSH login from bsw-HbaseMaster to all regionserver machines. Execute the following commands on the bsw-HbaseMaster machine.
 $ ssh-keygen -t rsa
 $ scp .ssh/id_rsa.pub hduser@bsw-data1:~/.ssh/authorized_keys
 $ scp .ssh/id_rsa.pub hduser@bsw-data2:~/.ssh/authorized_keys
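Note that copying over authorized_keys with scp replaces any keys already on the data nodes. An alternative sketch using ssh-copy-id, which appends the key instead (same hduser account and hostnames assumed):

```shell
# Alternative key distribution. ssh-copy-id appends to the remote
# authorized_keys instead of overwriting it, so existing keys survive.
mkdir -p "$HOME/.ssh"
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"

for host in bsw-data1 bsw-data2; do
    # keep going even if a host is down or unreachable
    ssh-copy-id "hduser@${host}" || echo "could not copy key to ${host}"
done
```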
4. Open the file $HBASE_INSTALL_DIR/conf/hbase-env.sh and set the $JAVA_HOME.
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_25

5. Open the file $HBASE_INSTALL_DIR/conf/hbase-site.xml and add the following properties.

<configuration>
        <property>
                <name>hbase.master</name>
                <value>bsw-HbaseMaster:60000</value>
                <description>The host and port that the HBase master runs at.</description>
        </property>
        <property>
                <name>hbase.rootdir</name>
                <value>hdfs://bsw-HbaseMaster:9000/hadoop-datastore</value>
                <description>The directory shared by region servers.</description>
        </property>
        <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
                <description>Possible values are
                false: standalone and pseudo-distributed setups with managed
                Zookeeper; true: fully-distributed with unmanaged Zookeeper
                quorum (see hbase-env.sh)</description>
        </property>
        <property>
                <name>hbase.zookeeper.property.clientPort</name>
                <value>2181</value>
        </property>
        <property>
                <name>hbase.zookeeper.quorum</name>
                <value>bsw-HbaseMaster</value>
        </property>
</configuration>
Note:-
In our example, ZooKeeper and the HBase master both run on the same machine.
6. Open the file $HBASE_INSTALL_DIR/conf/hbase-env.sh and uncomment the following line:
 export HBASE_MANAGES_ZK=true

7. Open the file $HBASE_INSTALL_DIR/conf/regionservers and add all the regionserver machine names.
   bsw-data1
   bsw-data2
   bsw-HbaseMaster
Note: Add bsw-HbaseMaster machine name only if you are running a regionserver on bsw-HbaseMaster machine.
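The regionservers file is just one hostname per line. A quick sanity check (a sketch; a duplicated entry would make start-hbase.sh try to start a second region server on the same host):

```shell
# Print any hostname that appears more than once in a regionservers
# file; empty output means the file is clean.
check_regionservers() {
    sort "$1" | uniq -d
}

# example with the hosts used in this post
printf 'bsw-data1\nbsw-data2\nbsw-HbaseMaster\n' > /tmp/regionservers
check_regionservers /tmp/regionservers   # expect no output
```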

    INSTALLING AND CONFIGURING HBASE REGIONSERVER

1. Download hbase-1.1.2-bin.tar.gz from http://www.apache.org/dyn/closer.cgi/hbase/ and extract it into some directory on your computer. This path will be referred to as $HBASE_INSTALL_DIR.
2. Edit the file /etc/hosts on each hbase-regionserver machine and add the following line.
 192.168.35.16 bsw-HbaseMaster
Note: In my case, bsw-HbaseMaster and the hadoop-namenode run on the same machine.
Note: Run the command "ping bsw-HbaseMaster" to check that the hostname resolves to the machine's actual IP, not the localhost IP.

3. We need to configure passwordless SSH login from bsw-data1 and bsw-data2 to the bsw-HbaseMaster machine. Execute the following commands on the bsw-data1 and bsw-data2 machines.
$ ssh-keygen -t rsa
$ scp .ssh/id_rsa.pub hduser@bsw-HbaseMaster:~/.ssh/authorized_keys2

4. Open the file $HBASE_INSTALL_DIR/conf/hbase-env.sh and set the $JAVA_HOME.
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_25
Note: If you are using OpenJDK, give the path of OpenJDK instead.

5. Open the file $HBASE_INSTALL_DIR/conf/hbase-site.xml and add the following properties.

<configuration>
        <property>
                <name>hbase.master</name>
                <value>bsw-HbaseMaster:60000</value>
                <description>The host and port that the HBase master runs at.</description>
        </property>
        <property>
                <name>hbase.rootdir</name>
                <value>hdfs://bsw-HbaseMaster:9000/hadoop-datastore</value>
                <description>The directory shared by region servers.</description>
        </property>
        <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
                <description>Possible values are
                false: standalone and pseudo-distributed setups with managed
                Zookeeper; true: fully-distributed with unmanaged Zookeeper
                quorum (see hbase-env.sh)</description>
        </property>
        <property>
                <name>hbase.zookeeper.property.clientPort</name>
                <value>2181</value>
        </property>
        <property>
                <name>hbase.zookeeper.quorum</name>
                <value>bsw-HbaseMaster</value>
        </property>
</configuration>

6. Open the file $HBASE_INSTALL_DIR/conf/hbase-env.sh and uncomment the following line:
export HBASE_MANAGES_ZK=true

Note:-
The above steps are required on all the datanodes in the hadoop cluster.

START AND STOP HBASE CLUSTER

1. Starting the HBase cluster:-
We need to start the daemons only on the bsw-HbaseMaster machine; it will start the daemons on all regionserver machines. Execute the following command to start the HBase cluster.
$HBASE_INSTALL_DIR/bin/start-hbase.sh
Note:-
           At this point, the following Java processes should be running on the hbase-master machine.
 hduser@bsw-HbaseMaster:~$ jps
               14143 Jps
               14007 HQuorumPeer
               14066 HMaster
and the following Java processes should be running on each hbase-regionserver machine.
              23026 HRegionServer
              23171 Jps
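The process checks above can be scripted for all nodes. A small helper (a sketch; the function name is mine), fed the output of jps on stdin:

```shell
# Succeed only if every expected daemon name appears in a jps listing.
daemons_present() {
    listing=$(cat)          # jps output read from stdin
    for expected in "$@"; do
        echo "$listing" | grep -qw "$expected" || return 1
    done
}

# master node (HQuorumPeer appears because HBase manages ZooKeeper here):
#   jps | daemons_present HMaster HQuorumPeer && echo "master OK"
# region server:
#   jps | daemons_present HRegionServer && echo "regionserver OK"
```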
2. Starting the hbase shell:-
$HBASE_INSTALL_DIR/bin/hbase shell
                HBase Shell; enter 'help' for list of supported commands.
                Version: 0.20.6, r965666, Mon Jul 19 16:54:48 PDT 2010
                hbase(main):001:0>
Now, create a table in hbase.
hbase(main):001:0> create 't1', 'f1'
                0 row(s) in 1.2910 seconds
                hbase(main):002:0>
Note: If the table is created successfully, then everything is running fine. (The version banner above is from an older release; yours will show the release you installed.)
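Beyond create, a put/get round trip exercises a region server end to end. A hedged sketch using the non-interactive hbase shell (the helper name is mine; table t1 with family f1 is from the example above, and nothing runs unless the hbase binary actually exists):

```shell
# Write one cell to table 't1' and read it back through the hbase shell.
hbase_smoke_test() {
    hbase_bin="$1/bin/hbase"
    if [ ! -x "$hbase_bin" ]; then
        echo "hbase binary not found under $1"
        return 1
    fi
    "$hbase_bin" shell <<'EOF'
put 't1', 'row1', 'f1:c1', 'value1'
get 't1', 'row1'
EOF
}

# usage: hbase_smoke_test "$HBASE_INSTALL_DIR"
```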

3. Stopping the HBase cluster:-
    Execute the following command on hbase-master machine to stop the hbase cluster.
   $HBASE_INSTALL_DIR/bin/stop-hbase.sh

Tuesday, January 26, 2016

build hadoop-2.7.1 from source code on Ubuntu-15.10

1. Download hadoop-2.7.1-src.tar.gz and untar it

2. Following BUILDING.txt, install the dependencies

3. But instead of installing oracle-java7-installer (whose downloads Oracle already restricts), install oracle-java8-installer

4. Do not install libprotobuf-dev and protobuf-compiler from apt-get, as that pulls in version 2.6.1, but this version of Hadoop requires 2.5.0. Instead, download protobuf-2.5.0 from the web and run protobuf_arm64_patch.sh (attached below) to patch it, then do './configure; make; make install; ldconfig'
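A quick guard against the wrong protoc sneaking back onto the PATH, since the mvn build otherwise fails well into the long compile. A sketch; the helper name is mine:

```shell
# hadoop-2.7.1 requires exactly protobuf 2.5.0; check the string that
# 'protoc --version' prints before starting the long mvn build.
require_protoc_250() {
    case "$1" in
        "libprotoc 2.5.0") return 0 ;;
        *)                 return 1 ;;
    esac
}

# usage: require_protoc_250 "$(protoc --version)" || echo "wrong protoc on PATH"
```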

5. Do 'cd hadoop-maven-plugins; mvn install' before building Hadoop. This is required for building any Hadoop module (not just Eclipse support); otherwise you will run into a Maven plugin error

6. Run 'mvn clean install -DskipTests -Pdist -Pnative' to build Hadoop; you should find the hadoop-2.7.1 directory under hadoop-dist/target

7. Then follow http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html for a single-node setup. Remember to modify hadoop-env.sh to set the JAVA_HOME variable


Contents of protobuf_arm64_patch.sh:

cd protobuf-2.5.0/
wget https://gist.github.com/BennettSmith/7111094/raw/171695f70b102de2301f5b45d9e9ab3167b4a0e8/0001-Add-generic-GCC-support-for-atomic-operations.patch -O /tmp/0001-Add-generic-GCC-support-for-atomic-operations.patch
wget https://gist.github.com/BennettSmith/7111094/raw/a4e85ffc82af00ae7984020300db51a62110db48/0001-Add-generic-gcc-header-to-Makefile.am.patch -O /tmp/0001-Add-generic-gcc-header-to-Makefile.am.patch
patch -p1 < /tmp/0001-Add-generic-GCC-support-for-atomic-operations.patch
patch -p1 < /tmp/0001-Add-generic-gcc-header-to-Makefile.am.patch
rm /tmp/0001-Add-generic-GCC-support-for-atomic-operations.patch
rm /tmp/0001-Add-generic-gcc-header-to-Makefile.am.patch