Building and using HBase cluster


HBase is an open-source, distributed, column-oriented database built on the Hadoop file system, with a scale-out architecture. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. With HBase, users can read and write big data randomly and in real time.

This article will walk you through the process of building an HBase cluster system.

Preparing the environment

Prepare three machines running Ubuntu 15.10: one as the master node of the cluster, and the other two as workers. The IPs of the three machines used in this example are listed below.

Configure the /etc/hosts file on all three machines with these entries:

192.168.13.105 master

192.168.13.52 worker1

192.168.13.199 worker2

Recommended hardware configuration

Installing JDK

The JDK needs to be installed on each machine. You can download it from this link: https://www.oracle.com/technetwork/java/javase/downloads/index.html. This article installs jdk-8u111-linux-x64.tar.gz; the steps are as follows:

  1. Unzip the archive: tar -zxvf jdk-8u111-linux-x64.tar.gz
  2. Move it to a dedicated directory (optional):

mkdir /usr/lib/jdk

mv jdk1.8.0_111  /usr/lib/jdk/jdk1.8

  3. Set environment variables:

Method 1: Modify the global configuration file, which can be applied to all users. Enter the following command to open the global configuration file:

vi /etc/profile

Enter the following content:

export JAVA_HOME=/usr/lib/jdk/jdk1.8

export JRE_HOME=${JAVA_HOME}/jre

export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib

export PATH=.:${JAVA_HOME}/bin:$PATH

Method 2: Modify the current user's configuration file, which applies only to the current user. Enter the following command to start editing:

vi ~/.bashrc

The content to enter is the same as above.

  4. Make the modified configuration take effect immediately:

source /etc/profile

or

source ~/.bashrc

  5. Run the version command to check whether the installation was successful:

java -version

Configuring SSH and password-free login

  1. Configure each node's own hostname in /etc/hostname, and add the IP and hostname of every node to each node's /etc/hosts file.
  2. Configure password-free login.

Execute the command on master and worker nodes respectively:

ssh-keygen -t rsa -P ''

Here -P specifies the passphrase; passing an empty string creates the key without a passphrase. If you omit -P, you need to press the Enter key three times to accept the defaults.

After the execution, two files will be generated in the /home/hdfs/.ssh directory: id_rsa (the private key) and id_rsa.pub (the public key).

Execute the following commands on the master and worker nodes respectively:

ssh-copy-id -i /home/hdfs/.ssh/id_rsa.pub [ip] (your own ip)

On the master node, you also need to execute:

ssh-copy-id -i /home/hdfs/.ssh/id_rsa.pub [ip] (the IP of each worker)

Verify whether the configuration was successful on the master and worker nodes respectively:

ssh worker1 (from the master), or ssh master (from a worker)

Deploying the HDFS storage system

  1. Install Hadoop on the master node. You can download Hadoop from this link: http://archive.apache.org/dist/hadoop/core/. Unzip the package; this article extracts Hadoop to /home/hbase/hadoop-2.7.3 and uses ../ below as shorthand for that directory. Adjust the path to your actual situation.
  2. Create four new folders in the hadoop-2.7.3 folder:

     ../nfs
     ../tmp
     ../nfs/name
     ../nfs/data
  3. Configure Hadoop

First enter the directory containing the configuration files, for example ../etc/hadoop, then run the ls command to view the list of files in that directory.

The configuration files in the following steps need to be edited.

    1. Configure the core-site.xml file. Enter the following command to edit the file:

vi core-site.xml

Content to be edited:

Note: The value of the hadoop.tmp.dir property must be the same as the ../tmp path created in step 2.
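The file contents are not reproduced in the original; a minimal core-site.xml consistent with this article's hostnames and paths might look like the following (the NameNode port 9000 is an assumption):

```xml
<configuration>
  <!-- HDFS entry point; "master" and port 9000 follow this article's setup -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <!-- Must match the ../tmp directory created in step 2 -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hbase/hadoop-2.7.3/tmp</value>
  </property>
</configuration>
```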

    2. Configure the hadoop-env.sh file. Enter the following command to edit the file:

vi hadoop-env.sh

Set JAVA_HOME to the JDK path configured earlier (/usr/lib/jdk/jdk1.8).

    3. Configure yarn-env.sh. Enter the following command to start editing:

vi yarn-env.sh

Modify JAVA_HOME to the same JDK path (you need to remove the leading "#" from this line).

    4. Configure hdfs-site.xml. Enter the following command to edit the file:

vi hdfs-site.xml

Add the following code to <configuration></configuration>:


Note: The values of dfs.namenode.name.dir and dfs.datanode.data.dir must match the ../nfs/name and ../nfs/data paths created in step 2. Since there are only two worker nodes, dfs.replication is set to 2.
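For reference, a sketch of the properties described in the note above, assuming the paths from step 2:

```xml
<!-- Two workers, so keep two replicas of each block -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<!-- Must match ../nfs/name created in step 2 -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/home/hbase/hadoop-2.7.3/nfs/name</value>
</property>
<!-- Must match ../nfs/data created in step 2 -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/home/hbase/hadoop-2.7.3/nfs/data</value>
</property>
```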

    5. Copy the mapred-site.xml.template file and rename it mapred-site.xml:

cp mapred-site.xml.template mapred-site.xml

    6. Edit mapred-site.xml:

vi mapred-site.xml

Add the following code in the tag <configuration>:
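The snippet itself is missing from the original; the standard setting for running MapReduce on YARN is:

```xml
<!-- Run MapReduce jobs on the YARN resource manager -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```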

    7. Configure yarn-site.xml:

vi yarn-site.xml

Add the following code to the <configuration> tag:
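The original code is not reproduced; a typical minimal yarn-site.xml for this topology (the hostname master follows this article) is:

```xml
<!-- ResourceManager runs on the master node -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>master</value>
</property>
<!-- Auxiliary shuffle service required by MapReduce -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
```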

    8. Configure the slaves file:

vi slaves

Delete the original localhost and replace it with the hostnames of the two worker nodes:

worker1

worker2

    9. Configure the masters file:

vi masters

Change its content to the hostname of the master node:

master

  4. Configure the environment variables of Hadoop, in the same way as for the JDK. First edit the configuration file:

vi /etc/profile
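The exact entries are not shown in the original; a typical addition, assuming Hadoop was extracted to /home/hbase/hadoop-2.7.3, is:

```shell
# Assumed install location from this article; adjust to your environment
export HADOOP_HOME=/home/hbase/hadoop-2.7.3
# Put the hadoop command-line tools and service scripts on the PATH
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
```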

Then type the command source /etc/profile to make the configuration take effect immediately.

  5. Next, copy the Hadoop package to the same location on the worker1 and worker2 nodes:

scp -r hadoop-2.7.3 root@worker1:/home/hbase/hadoop-2.7.3

scp -r hadoop-2.7.3 root@worker2:/home/hbase/hadoop-2.7.3

Note: root is the Ubuntu username, which was set when worker1 and worker2 were created.

After the copy has finished, configure the Hadoop environment variables on worker1 and worker2 in the same way as on the master.

  6. Initialize Hadoop:

[root@master bin]$ ./hadoop namenode -format

  7. Start Hadoop:

[root@master sbin]$ ./start-dfs.sh

[root@master sbin]$ ./start-yarn.sh

  8. Check whether the cluster was built successfully.

After typing jps on master, processes such as NameNode, SecondaryNameNode, ResourceManager, and Jps should be listed.

After typing jps on the worker1 and worker2 nodes, processes such as DataNode, NodeManager, and Jps should be listed,

which indicates that the cluster was built successfully.

At this point, you can access the service on the master node at http://192.168.13.105:50070. If the HDFS overview page appears, the Hadoop cluster has been successfully built.

 

  9. Create the HBase directory

Create a new /hbase directory in the Hadoop cluster (for building the HBase cluster):

hadoop fs -mkdir /hbase

View it by clicking Browse the file system in the HDFS web interface.

 

If the directory cannot be created, check the hdfs-site.xml and /etc/hosts files for incorrect or redundant connection paths.

Deploying the ZooKeeper cluster

ZooKeeper is a high-performance coordination service for distributed applications and an important component for both Hadoop and HBase. Its architecture is a leader/follower design:

The followers are responsible for responding to read requests, while the leader is responsible for committing write requests.

The following shows how to install ZooKeeper:

  1. Download ZooKeeper and extract it. This article extracts it to the directory /home/hbase:

tar -zxvf zookeeper-3.4.10.tar.gz

Grant read/write/execute permissions:

chmod +rwx zookeeper-3.4.10

  2. Modify the ZooKeeper configuration file and create the data directory and log directory (root directory: /home/hbase):

cd zookeeper-3.4.10

mkdir data

mkdir logs

Note: In the ../zookeeper-3.4.10/conf/ directory, rename zoo_sample.cfg to zoo.cfg (only one of the two may exist).

  3. Edit the renamed zoo.cfg file:

vi conf/zoo.cfg

The content to edit is as follows:
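The content itself is missing from the original; a typical zoo.cfg for this three-node ensemble (the 2888/3888 ports are ZooKeeper's conventional defaults) would be:

```
# Basic timing settings, in ticks of tickTime milliseconds
tickTime=2000
initLimit=10
syncLimit=5
# Data and log directories created above
dataDir=/home/hbase/zookeeper-3.4.10/data
dataLogDir=/home/hbase/zookeeper-3.4.10/logs
# Port clients connect to
clientPort=2181
# One server.N line per ensemble member; N must match each node's myid
server.1=master:2888:3888
server.2=worker1:2888:3888
server.3=worker2:2888:3888
```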

  4. Enter the data directory and edit myid:

cd data

vi myid

Enter 1, consistent with the server.1 entry in the zoo.cfg file.

  5. Copy master's zookeeper-3.4.10 to worker1 and worker2:

scp -r zookeeper-3.4.10 root@worker1:/home/hbase/zookeeper-3.4.10

scp -r zookeeper-3.4.10 root@worker2:/home/hbase/zookeeper-3.4.10

  6. Modify the value of myid on worker1 and worker2 to 2 and 3 respectively:

vi myid

  7. Start ZooKeeper on master, worker1, and worker2:

[root@master zookeeper-3.4.10]$ bin/zkServer.sh start

[root@worker1 zookeeper-3.4.10]$ bin/zkServer.sh start

[root@worker2 zookeeper-3.4.10]$ bin/zkServer.sh start

  8. View the status of ZooKeeper:

[root@master zookeeper-3.4.10]$ bin/zkServer.sh status

ZooKeeper JMX enabled by default

Using config: /home/hbase/zookeeper-3.4.10/bin/../conf/zoo.cfg

Mode: follower

[root@worker1 zookeeper-3.4.10]$ bin/zkServer.sh status

ZooKeeper JMX enabled by default

Using config: /home/hbase/zookeeper-3.4.10/bin/../conf/zoo.cfg

Mode: leader

[root@worker2 zookeeper-3.4.10]$ bin/zkServer.sh status

ZooKeeper JMX enabled by default

Using config: /home/hbase/zookeeper-3.4.10/bin/../conf/zoo.cfg

Mode: follower

  9. Verify the ZooKeeper cluster:

[root@master zookeeper-3.4.10]$ bin/zkCli.sh -server master:2181

If "Welcome to ZooKeeper!" appears, it means the zookeeper cluster is installed!

Deploying the HBase cluster

  1. Unzip the HBase package with the following command. This article extracts it to /home/hbase:

tar -zxvf hbase-1.3.1-bin.tar.gz

  2. Configure environment variables:

vi /etc/profile

Enter the following content:
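The entries are not reproduced in the original; a typical addition, assuming HBase was extracted to /home/hbase/hbase-1.3.1, is:

```shell
# Assumed install location from this article; adjust to your environment
export HBASE_HOME=/home/hbase/hbase-1.3.1
# Put the hbase command-line tools on the PATH
export PATH=${HBASE_HOME}/bin:$PATH
```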

Enter the command source /etc/profile to make it take effect immediately

  3. Enter the HBase configuration directory (../hbase-1.3.1/conf) and modify the hbase-env.sh file. Then create a new pids directory under ../hbase-1.3.1.
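The modifications to hbase-env.sh are not shown in the original; a plausible set of changes for this setup (the external ZooKeeper ensemble deployed above and the pids directory just created) is:

```shell
# JDK path configured earlier in this article
export JAVA_HOME=/usr/lib/jdk/jdk1.8
# Use the external ZooKeeper ensemble instead of the one bundled with HBase
export HBASE_MANAGES_ZK=false
# Store PID files in the pids directory created above
export HBASE_PID_DIR=/home/hbase/hbase-1.3.1/pids
```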

  4. Edit hbase-site.xml and add the configuration:

vi hbase-site.xml

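The configuration itself is not reproduced in the original; a typical hbase-site.xml for this cluster (the /hbase HDFS directory and the hostnames follow this article; the port numbers are assumptions) is:

```xml
<configuration>
  <!-- Root directory in HDFS; matches the /hbase directory created earlier -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <!-- Fully distributed mode -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- External ZooKeeper ensemble deployed above -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,worker1,worker2</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
```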
  5. Edit the regionservers file under the configuration directory with the command:

vi regionservers

Delete localhost and add content:

worker1

worker2

  6. Download geomesa-hbase-distributed-runtime_2.11-2.0.2.jar and deploy it to the ../lib directory of the HBase installation, then restart the HBase cluster. Download address:

https://mvnrepository.com/artifact/org.locationtech.geomesa/geomesa-hbase-distributed-runtime_2.11/2.0.2

  7. Copy HBase to the other machines with the following commands (in the /home/hbase directory):

scp -r hbase-1.3.1 root@worker1:/home/hbase/hbase-1.3.1

scp -r hbase-1.3.1 root@worker2:/home/hbase/hbase-1.3.1

  8. Start the HBase service on the master machine with the command:

[root@master hbase-1.3.1]$ bin/start-hbase.sh

Run bin/hbase shell on any of the master, worker1, or worker2 machines to enter the shell environment that ships with HBase, where you can view HBase information (for example with the version command) and perform operations such as creating tables.
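As a quick smoke test, a short hbase shell session might look like this (the table name demo and column family cf are made up for illustration):

```
version                              # print the HBase version
create 'demo', 'cf'                  # create table 'demo' with column family 'cf'
put 'demo', 'row1', 'cf:msg', 'hi'   # write one cell
scan 'demo'                          # read it back
disable 'demo'                       # tables must be disabled before dropping
drop 'demo'
```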

Access http://192.168.13.105:16010. If the HBase web interface appears, the HBase cluster was successfully built.

Using the HBase cluster

 

To use HBase in iServer, the hosts file of the computer where iServer is located needs to be configured. Add the IPs and hostnames of the HBase cluster nodes to the hosts file as follows:

192.168.13.105 master

192.168.13.52 worker1

192.168.13.199 worker2

After the configuration is complete, HBase can be associated with iServer via iServer's Data Registration: it can be used as the datasource of the Distributed Analysis Service, as a data storage location, or as a service source to publish its data as map services and data services.