Assignment topic

64-bit system, local compilation and installation mode

Choose two of the following:

(1)  Can a security mechanism be added to the web monitoring interface? How is it achieved? Capture the process.

(2)  Simulate a namenode crash, for example by deleting all contents of the name directory, then recover the namenode through the secondary namenode. Capture the experiment.

(3)  How can the HDFS block size be changed? Verify it experimentally and capture the process.

(4)  Separate the secondary namenode from the namenode and deploy it on a node of its own. Capture the experiment.

(5)  After the Hadoop cluster is running successfully, format the name node again. Can the datanodes still join the cluster? If not, how is it solved? Simulate the process and capture it.

(6)  How is the frequency of namenode checkpoints controlled? Simulate the process before and after a checkpoint, capture the metadata before and after for comparison, and describe the mechanism.

Compile Hadoop 2.X 64-bit

2.1  Operating environment description

2.1.1 Hardware and software environment

- Host machine: CPU main frequency 2.2 GHz, 6 GB memory

- Virtualization software: VMware Workstation 9.0.0 build-812388

- Virtual machine operating system: CentOS 6.5 64-bit, single core, 1 GB memory

- JDK: 1.7.0_55 64-bit

- Hadoop: Release 2.2.0 source code

2.1.2 Cluster network environment

The cluster contains only one node, with IP address 192.168.1.200.

2.2  Environment building

2.2.1 JDK installation and Java environment variable configuration

1.     Download the JDK 1.7 64-bit installation package

The download link for the JDK 1.7 64-bit installation package is:

http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html

After the page opens, first select Accept License Agreement, then download jdk-7u55-linux-x64.tar.gz, as shown in the figure below:

2.     Give the hadoop user read and write permission on the /usr/lib/java directory with the following command:

sudo chmod -R 777 /usr/lib/java

3.     Upload the downloaded installation package to the /usr/lib/java directory with the SSH tool, then unzip it with the following command:

tar -zxvf jdk-7u55-linux-x64.tar.gz

The directory layout after decompression is shown in the figure below:

4.     As the root user, configure /etc/profile; this setting applies to all users:

vi /etc/profile

export JAVA_HOME=/usr/lib/java/jdk1.7.0_55

export PATH=$JAVA_HOME/bin:$PATH

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

5.     Log out, log back in, and verify:

logout

java -version

2.2.2 Install and configure Maven

1.     Download the Maven installation package. Version 3.0 or above is recommended; this installation uses the maven 3.0.5 binary package. The download address is:

http://mirror.bit.edu.cn/apache/maven/maven-3/

2.     Use the SSH tool to upload the Maven package to the /home/hadoop/Downloads directory

3.     Unzip the apache-maven-3.0.5-bin.tar.gz package

tar -zxvf apache-maven-3.0.5-bin.tar.gz

4.     Move the apache-maven-3.0.5 directory to /usr/local

sudo mv apache-maven-3.0.5 /usr/local

5.     Add the following setting to the /etc/profile configuration file

export PATH=$JAVA_HOME/bin:/usr/local/apache-maven-3.0.5/bin:$PATH

6.     Apply /etc/profile and verify that the configuration succeeded:

source /etc/profile

mvn -version

2.2.3 As the root user, install svn with yum

yum install svn

2.2.4 As the root user, install autoconf, automake, libtool and cmake with yum

yum install autoconf automake libtool cmake

2.2.5 As the root user, install ncurses-devel with yum

yum install ncurses-devel

2.2.6 As the root user, install openssl-devel with yum

yum install openssl-devel

2.2.7 As the root user, install the gcc tool chain with yum

yum install gcc*

2.2.8 Install and set protobuf

Note: this package can only be installed after gcc; otherwise the build reports that the gcc compiler cannot be found.

1.     Download the protobuf installation package

The download link is: https://code.google.com/p/protobuf/downloads/list

2.     Use the SSH tool to upload the protobuf-2.5.0.tar.gz package to the /home/hadoop/Downloads directory

3.     Unzip the installation package

tar -zxvf protobuf-2.5.0.tar.gz

4.     Move the protobuf-2.5.0 directory to /usr/local

sudo mv protobuf-2.5.0 /usr/local

5.     Build and install

Enter the /usr/local/protobuf-2.5.0 directory and, as the root user, run the following commands:

./configure

make

make check

make install

6.     Verify that the installation was successful

After the commands above complete, confirm that protoc is available:

protoc
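Optionally, the installed version can be checked explicitly; for this build it should report libprotoc 2.5.0:

protoc --version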

2.3  Compile Hadoop

2.3.1 Download the Hadoop Release 2.2.0 source code

Obtain the Hadoop 2.2.0 source code through SVN. In the /home/hadoop/Downloads directory, run:

svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.2.0

The checkout takes quite a while; the source is about 75.3 MB.

2.3.2 Compile the Hadoop source code

Note: the pom.xml of hadoop 2.2.0 in svn has a problem that causes a compilation error; refer to section 5.5 for the fix. In the root of the Hadoop source tree, execute the following command:

mvn package -Pdist,native -DskipTests -Dtar

(Note: type this command by hand rather than copy-pasting it; copied text can turn the hyphen before Dtar into a dash character, which makes the build fail.)

The compilation takes a while; internet access is needed during the build to download the required artifacts.

2.3.3 Verify that the compilation was successful

Go to the hadoop-dist/target/hadoop-2.2.0/lib/native directory and check the attributes of libhadoop.so.1.0.0:

file ./libhadoop.so.1.0.0

The output should report a 64-bit (ELF 64-bit) library, confirming that the native code was compiled for 64-bit.

The hadoop-2.2.0.tar.gz produced in the hadoop-dist/target directory is used as the Hadoop 2.X 64-bit installation package.

Install Hadoop 2.X 64-bit

3.1  Operating environment description

3.1.1 Hardware and software environment

- Host machine: CPU main frequency 2.2 GHz, 6 GB memory

- Virtualization software: VMware Workstation 9.0.0 build-812388

- Virtual machine operating system: CentOS 64-bit, single core, 1 GB memory

- JDK: 1.7.0_55 64-bit

- Hadoop: 2.2.0

3.1.2 Cluster network environment

The cluster contains 1 namenode and 2 datanodes, and the nodes can ping each other. The node IP addresses and host names are assigned as follows:

No.   IP Address      Machine name   Type        User name
1     10.88.147.226   hadoop1        Name node   hadoop
2     10.88.147.227   hadoop2        Data node   hadoop
3     10.88.147.228   hadoop3        Data node   hadoop

All nodes run CentOS 6.5 64-bit with the firewall disabled. A hadoop user is created on every node, with home directory /usr/hadoop. The directory /usr/local/hadoop is created on every node with hadoop as its owner; because this directory is used to install hadoop, the user must have rwx permission on it. (The usual practice is for the root user to create the hadoop directory under /usr/local and change its owner to hadoop with chown -R hadoop /usr/local/hadoop; otherwise, copying Hadoop files to other machines over SSH reports insufficient permissions.)

3.1.3 Installation tools

3.1.3.1 Linux file transfer tool

SSH Secure File Transfer is recommended for transferring files to the Linux systems. The top of the tool holds its menus and shortcuts; in the middle, the left pane is the local file directory and the right pane is the remote directory, and files can be uploaded and downloaded by dragging; the bottom is the operation monitoring area, as shown in the figure below:

3.1.3.2 Linux command-line tools

- SSH Secure Shell

The SSH Secure Shell tool in the SSH Secure suite provides remote command execution, as shown in the figure below:

- SecureCRT

SecureCRT is a commonly used tool for executing Linux commands remotely, as shown in the figure below:

3.2  Environment building

This installation uses a three-node cluster; the nodes are configured as described in section 3.1.2. Environment setup has two parts: configuring the local (console) environment and configuring the operating system environment.

3.2.1 Configure the local environment

This part of the server configuration must be done locally at the server console. After configuring, restart the server to confirm that the settings take effect; in particular, a server that will be accessed remotely needs a fixed IP address.

3.2.1.1 Set up IP Address

1.     Click System --> Preferences --> Network Connections, as shown in the figure below:

2.     Modify or recreate the network connection, set it to manual mode, and enter the following network information:

IP address:    10.88.147.*

Subnet mask: 255.255.255.0

Gateway:     10.88.*.*

DNS:     10.**.***.** (a DNS server must be set for internet access)

Note: set the gateway and DNS according to the actual network, and set the connection to "Available to all users"; otherwise remote connections will fail to reach the server after it is restarted.

3.     On the command line, check the configured IP address with the ifconfig command. If the modified IP does not take effect, restart the machine and set it again (if the machine will be accessed remotely after setup, it is recommended to restart it and confirm that the IP is effective):
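If the graphical desktop is not available, the same static address can also be set from the command line instead of the GUI. A minimal sketch, assuming the interface is eth0 and reusing hadoop1's address from section 3.1.2; the gateway and DNS values below are placeholders to replace with your actual ones:

# /etc/sysconfig/network-scripts/ifcfg-eth0 (edit as root)
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.88.147.226
NETMASK=255.255.255.0
GATEWAY=10.88.x.x   # your actual gateway
DNS1=x.x.x.x        # your actual DNS server

# apply the change
service network restart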

3.2.1.2 Set machine name

Log in as the root user, open the configuration file with vi /etc/sysconfig/network, and set the server's machine name according to the actual situation. The new machine name takes effect after a restart.
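For reference, on CentOS 6 this file usually holds only two entries; for the name node of this cluster it would look like the sketch below (host name per section 3.1.2), and running hostname hadoop1 applies the new name immediately without waiting for a reboot:

NETWORKING=yes
HOSTNAME=hadoop1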

3.2.1.3 Set up the hosts mapping file

1.     As the root user, edit the /etc/hosts mapping file and set the mapping between IP addresses and machine names as follows:

vi /etc/hosts

10.88.147.226 hadoop1

10.88.147.227 hadoop2

10.88.147.228 hadoop3

2.     Use the following command to restart the network settings

/etc/init.d/network restart

3.     Verify that the settings are successful
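For example, from any node the new names should now resolve and respond:

ping -c 3 hadoop1
ping -c 3 hadoop2
ping -c 3 hadoop3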

3.2.1.4 Internet configuration

Step 1:    As the root user, open the configuration file with vi /etc/profile, as shown in the figure below:

Step 2:    Add the following settings to the file:

export http_proxy=proxy.*****:8080

export no_proxy="localhost,10.88.*,hadoop*"

export https_proxy=proxy.*****:8080

3.2.2 Set up the operating system environment

3.2.2.1 Turn off firewall

For Hadoop, the firewall and SELinux need to be turned off; otherwise exceptions will occur.

1.     Check the firewall status with service iptables status; the output below shows that iptables is currently enabled

2.     As the root user, disable iptables with the following command

chkconfig iptables off
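chkconfig only keeps the firewall from starting at the next boot; to stop the running service immediately as well, you can additionally run:

service iptables stop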

3.2.2.2 Disable SELinux

1.     Use the getenforce command to check whether SELinux is already disabled

2.     Edit the /etc/selinux/config file

Change SELINUX=enforcing to SELINUX=disabled, save the file, and restart the machine
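The same change can be made without opening an editor; a small sketch, run as root, where setenforce 0 switches SELinux to permissive for the current session until the reboot makes the disabled setting take effect:

sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
setenforce 0
getenforce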

3.2.2.3 JDK installation and configuration

7.     Download the JDK 1.7 64-bit installation package

The download link for the JDK 1.7 64-bit installation package is:

http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html

After the page opens, first select Accept License Agreement, then download jdk-7u55-linux-x64.tar.gz, as shown in the figure below:

8.     Give the hadoop user read and write permission on the /usr/lib/java directory with the following command:

sudo chmod -R 777 /usr/lib/java

This step may run into the problem described in section 5.2; refer to that section for the solution.

9.     Upload the downloaded installation package to the /usr/lib/java directory with the SSH tool introduced in section 3.1.3.1, then unzip it with the following command:

tar -zxvf jdk-7u55-linux-x64.tar.gz

The directory layout after decompression is shown in the figure below:

10.  As the root user, configure /etc/profile; this setting applies to all users:

vi /etc/profile

export JAVA_HOME=/usr/lib/java/jdk1.7.0_55

export PATH=$JAVA_HOME/bin:$PATH

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

11.  Log out, log back in, and verify:

logout

java -version

3.2.2.4 Update OpenSSL

The OpenSSL that ships with CentOS has a bug; if it is not updated, SSH connections to the nodes will fail during Ambari deployment. Update it with the following command:

yum update openssl

This step may run into the problem described in section 5.3; refer to that section for the solution.

3.2.2.5 SSH passwordless login configuration

1.     As the root user, open the sshd_config configuration file with vi /etc/ssh/sshd_config and enable the following three settings, as shown in the figure below:

RSAAuthentication yes

PubkeyAuthentication yes

AuthorizedKeysFile .ssh/authorized_keys

2.     Restart the sshd service after the configuration change

service sshd restart

3.     After completing the steps above, clone the virtual machine twice to create the hadoop2 and hadoop3 data nodes; set their IP addresses as described in section 3.1.2

4.     Log in to each of the three nodes as the hadoop user and generate a key pair with the following command:

ssh-keygen -t rsa

5.     On each node, enter the /home/hadoop/.ssh directory and copy the public key to a file named authorized_keys_hadoop1, authorized_keys_hadoop2 or authorized_keys_hadoop3 respectively; for example, on hadoop1:

cp id_rsa.pub authorized_keys_hadoop1

6.     Send the public keys of the two slave nodes (hadoop2 and hadoop3) to the /home/hadoop/.ssh folder on hadoop1 with the scp command:

scp authorized_keys_hadoop2 hadoop@hadoop1:/home/hadoop/.ssh

7.     Merge the public key information of the three nodes into the authorized_keys file

Use commands of the form cat authorized_keys_hadoop1 >> authorized_keys (repeat for the hadoop2 and hadoop3 files)

8.     Distribute the merged file to the other two slave nodes

Use scp authorized_keys hadoop@hadoop2:/home/hadoop/.ssh (and likewise for hadoop3) to distribute the key file

9.     On all three machines, set the permissions of authorized_keys with the following command

chmod 400 authorized_keys

10.  Test whether password-free ssh login works
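For example, from hadoop1 the following should print the remote host names without prompting for a password (and similarly when run from hadoop2 and hadoop3):

ssh hadoop2 hostname
ssh hadoop3 hostname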

3.3  Configure Hadoop

3.3.1 Download and unzip the hadoop installation package

1.     The official release is built on a 32-bit operating system, so installing it on a 64-bit server triggers the error described in section 5.4. We therefore use the hadoop-2.2.0.tar.gz file compiled in the previous chapter as the installation package (a pre-built 64-bit native folder or packaged 64-bit hadoop can also be downloaded from the internet). Upload it to the /home/hadoop/Downloads directory with the SSH tool introduced in section 3.1.3.1

2.     Decompress on the master node

cd /home/hadoop/Downloads/

tar -xzvf hadoop-2.2.0.tar.gz

3.     Move the hadoop-2.2.0 directory to /usr/local

sudo mv hadoop-2.2.0 /usr/local

cd /usr/local

ls

4.     Use chown to recursively change the owner of the hadoop-2.2.0 directory to hadoop

sudo chown -R hadoop /usr/local/hadoop-2.2.0

3.3.2 Create subdirectories under the Hadoop directory

As the hadoop user, create the tmp, name and data directories under hadoop-2.2.0 and make sure their owner is hadoop:

cd /usr/local/hadoop-2.2.0

mkdir tmp

mkdir name

mkdir data

ls

3.3.3 Configure hadoop-env.sh

1.     Open the hadoop-env.sh configuration file

cd etc/hadoop

sudo vi hadoop-env.sh

2.     Add the following configuration to set the JDK path and the hadoop/bin path

export JAVA_HOME=/usr/lib/java/jdk1.7.0_55

export PATH=$PATH:/usr/local/hadoop-2.2.0/bin

3.     Apply the hadoop-env.sh configuration file and confirm that it works

source hadoop-env.sh

hadoop version

3.3.4 Configure yarn-env.sh

1.     Open the yarn-env.sh configuration file in /usr/local/hadoop-2.2.0/etc/hadoop

cd /usr/local/hadoop-2.2.0/etc/hadoop

sudo vi yarn-env.sh

2.     Add the following configuration to set the JDK path

export JAVA_HOME=/usr/lib/java/jdk1.7.0_55

3.     Apply the yarn-env.sh configuration file and confirm that it works

source yarn-env.sh

3.3.5 Configure core-site.xml

1.     Open the core-site.xml configuration file with the following command

sudo vi core-site.xml

2.     In the configuration file, add the following configuration

<configuration>

  <property>

    <name>fs.default.name</name>

    <value>hdfs://hadoop1:9000</value>

  </property>

  <property>

    <name>fs.defaultFS</name>

    <value>hdfs://hadoop1:9000</value>

  </property>

  <property>

    <name>io.file.buffer.size</name>

    <value>131072</value>

  </property>

  <property>

    <name>hadoop.tmp.dir</name>

    <value>file:/usr/local/hadoop-2.2.0/tmp</value>

    <description>Abase for other temporary directories.</description>

  </property>

  <property>

    <name>hadoop.proxyuser.hduser.hosts</name>

    <value>*</value>

  </property>

  <property>

    <name>hadoop.proxyuser.hduser.groups</name>

    <value>*</value>

  </property>

</configuration>

3.3.6 Configure hdfs-site.xml

1.     Open the hdfs-site.xml configuration file with the following command

sudo vi hdfs-site.xml

2.     In the configuration file, add the following configuration

<configuration>

  <property>

   <name>dfs.namenode.secondary.http-address</name>

   <value>hadoop1:9001</value>

  </property>

  <property>

   <name>dfs.namenode.name.dir</name>

   <value>file:/usr/local/hadoop-2.2.0/name</value>

  </property>

  <property>

   <name>dfs.datanode.data.dir</name>

   <value>file:/usr/local/hadoop-2.2.0/data</value>

  </property>

  <property>

   <name>dfs.replication</name>

   <value>2</value>

  </property>

  <property>

   <name>dfs.webhdfs.enabled</name>

   <value>true</value>

  </property>

</configuration>

3.3.7 Configure mapred-site.xml

1.     By default mapred-site.xml does not exist; copy one from the template

cp mapred-site.xml.template mapred-site.xml

2.     Open the mapred-site.xml configuration file with the following command

sudo vi mapred-site.xml

3.     In the configuration file, add the following configuration

<configuration>

  <property>

    <name>mapreduce.framework.name</name>

    <value>yarn</value>

  </property>

  <property>

    <name>mapreduce.jobhistory.address</name>

    <value>hadoop1:10020</value>

  </property>

  <property>

    <name>mapreduce.jobhistory.webapp.address</name>

    <value>hadoop1:19888</value>

  </property>

</configuration>

3.3.8 Configure yarn-site.xml

1.     Open the yarn-site.xml configuration file with the following command

sudo vi yarn-site.xml

2.     In the configuration file, add the following configuration

<configuration>

  <property>

    <name>yarn.nodemanager.aux-services</name>

    <value>mapreduce_shuffle</value>

  </property>

  <property>

    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

    <value>org.apache.hadoop.mapred.ShuffleHandler</value>

  </property>

  <property>

    <name>yarn.resourcemanager.address</name>

    <value>hadoop1:8032</value>

  </property>

  <property>

    <name>yarn.resourcemanager.scheduler.address</name>

    <value>hadoop1:8030</value>

  </property>

  <property>

    <name>yarn.resourcemanager.resource-tracker.address</name>

    <value>hadoop1:8031</value>

  </property>

  <property>

    <name>yarn.resourcemanager.admin.address</name>

    <value>hadoop1:8033</value>

  </property>

  <property>

    <name>yarn.resourcemanager.webapp.address</name>

    <value>hadoop1:8088</value>

  </property>

</configuration>

3.3.9 Configure the slaves file

1.     Set up slave nodes

sudo vi slaves

hadoop2

hadoop3

3.3.10   Distribute the hadoop program to each node

1.     Create the /usr/local/hadoop-2.2.0 directory on the hadoop2 and hadoop3 machines, then change the directory's ownership

sudo mkdir /usr/local/hadoop-2.2.0

sudo chown -R hadoop /usr/local/hadoop-2.2.0

2.     On the hadoop1 machine, enter the /usr/local/hadoop-2.2.0 directory and copy the hadoop folder to the hadoop2 and hadoop3 machines with the following command

scp -r * hadoop@hadoop2:/usr/local/hadoop-2.2.0 (and likewise to hadoop3)

3.     Check on the slave nodes that the copy succeeded

3.3.11   Format the namenode

cd /usr/local/hadoop-2.2.0

./bin/hdfs namenode -format

3.3.12   Start hdfs

cd hadoop-2.2.0/sbin

./start-dfs.sh

If the exception described in section 5.4 appears at this point, refer to that section for the solution.

3.3.13   Verify the running processes

At this point the processes running on hadoop1 are: namenode, secondarynamenode

The process running on hadoop2 and hadoop3 is: datanode
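The running Java processes can be listed with the jps command shipped with the JDK, for example from hadoop1 (the remote path assumes the JDK location configured in section 3.2.2.3):

jps
ssh hadoop2 /usr/lib/java/jdk1.7.0_55/bin/jps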

3.3.14   Start yarn

./start-yarn.sh

3.3.15   Verify the running processes

At this point the processes running on hadoop1 are: namenode, secondarynamenode, resourcemanager

The processes running on hadoop2 and hadoop3 are: datanode, nodemanager

Experimental problem solving

4.1  Operating environment description

The experiments below are carried out in the environment built in weeks 1-2, that is, the problems are simulated and resolved on a Hadoop 1.1.2 cluster.

4.1.1 Hardware and software environment

- Host machine: CPU main frequency 2.2 GHz, 6 GB memory

- Virtualization software: VMware Workstation 9.0.0 build-812388

- Virtual machine operating system: all three nodes run CentOS 6.5 64-bit, single core, 1 GB memory

- JDK: 1.7.0_55 64-bit

- Hadoop: 1.1.2

4.1.2 Cluster network environment

The cluster contains 1 namenode and 2 datanodes, and the nodes can ping each other. The node IP addresses and host names are assigned as follows:

No.   IP Address      Machine name   Type        User name   Running processes
1     10.88.147.221   hadoop1        Name node   hadoop      NN, SNN, JobTracker
2     10.88.147.222   hadoop2        Data node   hadoop      DN, TaskTracker
3     10.88.147.223   hadoop3        Data node   hadoop      DN, TaskTracker

All nodes run CentOS with the firewall disabled. A hadoop user is created on every node, with home directory /usr/hadoop. The directory /usr/local/hadoop is created on every node, owned by the hadoop user.

4.2  Problem 1 -- add a security mechanism to the web monitoring interface

4.2.1 Modify the core-site.xml file

Add the following section to the configuration:

  <property>

    <name>hadoop.http.filter.initializers</name>

    <value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>

    <description>HTTP Authentication document in hadoop tar file</description>

  </property>

  <property>

    <name>hadoop.http.authentication.type</name>

    <value>simple</value>

    <description>authentication type for web UI</description>

  </property>

  <property>

    <name>hadoop.http.authentication.token.validity</name>

    <value>36000</value>

    <description>how long authentication token is valid before it needs to be renewed</description>

  </property>

  <property>

    <name>hadoop.http.authentication.signature.secret.file</name>

    <value>/usr/local/hadoop-1.1.2/signature-secret</value>

    <description>signature secret for signing authentication tokens</description>

  </property>

  <property>

    <name>hadoop.http.authentication.cookie.domain</name>

    <value></value>

    <description>domain to use for the http cookie that stores authentication token</description>

  </property>

  <property>

    <name>hadoop.http.authentication.simple.anonymous.allowed</name>

    <value>false</value>

    <description>anonymous web UI requests enabled or disabled</description>

  </property>

4.2.2 Manually create the signature-secret file

Create the signature-secret file in the /usr/local/hadoop-1.1.2 directory with the following command:

echo hadoop >signature-secret

4.2.3 Distribute the file to the two datanodes

The file created on the namenode needs to be distributed to every datanode; use the following command for each of hadoop2 and hadoop3:

scp signature-secret hadoop@hadoop2:/usr/local/hadoop-1.1.2

4.2.4 Restart hadoop
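A minimal restart sequence, using the cluster scripts from this document's Hadoop 1.1.2 layout:

cd /usr/local/hadoop-1.1.2/bin
./stop-all.sh
./start-all.sh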

4.2.5 Verify access

Accessing the jobtracker page at http://10.88.147.221:50030/jobtracker.jsp now produces the following error:

Adding ?user.name=hadoop to the http address allows access, but the problem is that user.name can be typed arbitrarily and is transmitted in plain text over http, so this is still not secure!
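The filter's behaviour can also be checked quickly from the command line with curl. With hadoop.http.authentication.simple.anonymous.allowed set to false, an anonymous request should be rejected while a request carrying user.name should succeed; this is a sketch and the exact responses may vary by version:

curl -i "http://10.88.147.221:50030/jobtracker.jsp"
curl -i "http://10.88.147.221:50030/jobtracker.jsp?user.name=hadoop"

The first request is expected to return 401 Unauthorized; the second should return 200 OK together with a hadoop.auth cookie.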

4.3  Problem 2 -- simulate a namenode crash and recover it

4.3.1 Delete all files in the NameNode name directory to simulate a crash

The name directory on the NameNode node is /usr/local/hadoop-1.1.2/hdfs/name. Delete all files in this folder with the following commands:

cd /usr/local/hadoop-1.1.2/hdfs/name

rm -R *

4.3.2 Restart Hadoop

Stop Hadoop with ./stop-all.sh and start it again with ./start-all.sh; the jps command then shows that the namenode process failed to start

cd /usr/local/hadoop-1.1.2/bin

./stop-all.sh

./start-all.sh

Attempting to view HDFS files with the hadoop command fails to connect, as shown in the figure below:

hadoop fs -ls

The hadoop-hadoop-jobtracker-hadoop1.log file under the logs folder shows the following error:

INFO org.apache.hadoop.mapred.JobTracker: Problem connecting to HDFS Namenode... re-trying

java.net.ConnectException: Call to hadoop1/10.88.147.221:9000 failed on connection exception: java.net.ConnectException: Connection refused

4.3.3 Format the NameNode

After stopping Hadoop, format the NameNode with the following commands:

./stop-all.sh

./hadoop namenode -format

4.3.4 Obtain the DataNode's namespaceID

Connect to the hadoop2 node via ssh, enter the folder where the DataNode stores its data, /usr/local/hadoop-1.1.2/hdfs/data/current, and check the contents of the VERSION file with the following commands:

ssh hadoop2

cd /usr/local/hadoop-1.1.2/hdfs/data/current

ls

cat VERSION

and note the namespaceID value
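To pull out just that line:

grep namespaceID VERSION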

4.3.5 Modify the NameNode's namespaceID

In the VERSION file under the NameNode directory /usr/local/hadoop-1.1.2/hdfs/name/current on hadoop1, change the namespaceID to the DataNode value obtained in the previous step. The result is shown in the figure below:

cd /usr/local/hadoop-1.1.2/hdfs/name/current

vi VERSION

4.3.6 Delete the NameNode's fsimage

Delete the fsimage in the NameNode with the following commands:

cd /usr/local/hadoop-1.1.2/hdfs/name/current

rm fsimage

4.3.7 Copy the fsimage from the SNN to the NN

On the NameNode, the SNN path is /usr/local/hadoop-1.1.2/tmp/dfs/namesecondary. Copy the fsimage under its current folder into the NN path with the following commands:

cd /usr/local/hadoop-1.1.2/tmp/dfs/namesecondary/current

cp fsimage /usr/local/hadoop-1.1.2/hdfs/name/current/

4.3.8 Restart Hadoop

Start Hadoop; the jps command shows that the namenode process has started normally

cd /usr/local/hadoop-1.1.2/bin

./start-all.sh

4.4  Problem 3 -- change the HDFS block size

4.4.1 Create the /input folder in HDFS

Use the following commands to create the /input folder in HDFS and put the sh script files from the bin directory into it:

cd /usr/local/hadoop-1.1.2/bin

./hadoop fs -mkdir /input

./hadoop fs -put *.sh /input

./hadoop fs -ls /input

4.4.2 View the current block size

Check the data block size on the hadoop2 node, as shown in the figure below:
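Besides the datanode's web interface, the block layout of the uploaded files can also be inspected from the command line; a sketch using the fsck tool from the same bin directory:

cd /usr/local/hadoop-1.1.2/bin
./hadoop fsck /input -files -blocks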

4.4.3 Modify the hdfs-site.xml configuration file

On the NameNode node hadoop1, edit hdfs-site.xml and add the following configuration:

  <property>

    <name>dfs.block.size</name>

      <value>134217728</value>

  </property>
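For reference, 134217728 bytes = 128 × 1024 × 1024, i.e. a 128 MB block size, double the 64 MB default of Hadoop 1.1.2.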

4.4.4 Restart Hadoop

Restart the Hadoop processes:

./stop-all.sh

./start-all.sh

4.4.5 Check the block size again

Use the following commands to create the /input1 folder in HDFS and copy the files into that folder, keeping them separate from the files uploaded earlier:

./hadoop fs -mkdir /input1

./hadoop fs -put *.sh /input1

Check the block size again, as shown in the figure below

4.5  Problem 4 -- separate the SNN from the NN

4.5.1 Copy virtual machine

Clone the virtual machine that hosts the NameNode to serve as the virtual machine running the SecondaryNameNode

4.5.2 Set the SNN virtual machine's IP address

Set the virtual machine's IP address to 10.88.147.224

4.5.3 Set the SNN virtual machine's name

Set the SNN machine name to hadoop4:

sudo vi /etc/sysconfig/network

4.5.4 Add the SNN's IP mapping to the hosts file on all nodes

In the /etc/hosts file on every node, add the SNN's IP address 10.88.147.224 and the corresponding name hadoop4:

sudo vi /etc/hosts

4.5.5 Add the SNN to the masters file on all nodes

Add the SNN's machine name to the masters file on every node with the following command:

sudo vi /usr/local/hadoop-1.1.2/conf/masters

Add the SNN machine name hadoop4 to the masters file

4.5.6 Modify hdfs-site.xml on all nodes

Edit the hdfs-site.xml configuration file with the following command:

sudo vi /usr/local/hadoop-1.1.2/conf/hdfs-site.xml

Add the following information to the hdfs-site.xml file:

  <property>

    <name>dfs.secondary.http.address</name>

      <value>hadoop4:50090</value>

  </property>

4.5.7 Restart all virtual machines

4.5.8 Configure passwordless ssh login

1.     On the hadoop4 (10.88.147.224) node, generate the private and public keys with ssh-keygen -t rsa;

2.     Add the public key of the hadoop4 (10.88.147.224) node to the authorized_keys file;

ll

chmod 400 -R /home/hadoop/.ssh

cat id_rsa.pub >> authorized_keys

cat authorized_keys

3.     Distribute authorized_keys to every node;

scp authorized_keys hadoop@hadoop1:/home/hadoop/.ssh

4.     Verify that each node can be logged into without a password;

4.5.9 Reformat the NameNode

In /usr/local/hadoop-1.1.2/bin, format the NameNode with the following command:

./hadoop namenode -format

4.5.10   Start Hadoop

Start Hadoop with the following commands:

cd /usr/local/hadoop-1.1.2/bin

./start-all.sh

4.5.11   Verification

1.     On hadoop1 (NN), check the processes; the NameNode and JobTracker processes have started:

2.     On hadoop2 and hadoop3, check the processes; the TaskTracker process has started:

(Note that the DataNode did not start on these nodes. The problem is caused by a namespaceID mismatch between the NameNode and the DataNodes; the solution is described in section 4.6.)

3.     On hadoop4 (SNN), check the processes; the SecondaryNameNode process has started:

4.6  Problem 5 -- format the namenode again: can the datanodes still join?

4.6.1 Stop Hadoop and format the namenode

Stop Hadoop and format the namenode with the following commands:

./stop-all.sh

./hadoop namenode -format

4.6.2 Start Hadoop and check the datanode state

Start Hadoop with ./start-all.sh:

Check the startup state on the datanodes with jps:

4.6.3 Check the datanode log

Look at the log file in the logs folder on the datanode node hadoop2:

cd /usr/local/hadoop-1.1.2/logs

cat hadoop-hadoop-datanode-hadoop2.log

The error message says that the namespaceIDs of the namenode and datanode are inconsistent:

2014-09-30 10:04:41,890 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /usr/local/hadoop-1.1.2/hdfs/data: namenode namespaceID = 87263132; datanode namespaceID = 1318122769

        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)

        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)

        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:399)

        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:309)

        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1651)

        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1590)

        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1608)

        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1734)

        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1751)

4.6.4 Solution

There are two possible solutions:

- Change the namespaceID in the /usr/local/hadoop-1.1.2/hdfs/data/current/VERSION file on every datanode to the namenode's namespaceID (the method used here)

- Delete the /usr/local/hadoop-1.1.2/hdfs/data directory on the datanodes

Log in to the hadoop1 node and get the NameNode's namespaceID:

cd /usr/local/hadoop-1.1.2/hdfs/name/current

cat VERSION

Log in to the hadoop2 and hadoop3 nodes and change the DataNode's namespaceID to the NameNode's value:

cd /usr/local/hadoop-1.1.2/hdfs/data/current

vi VERSION

4.6.5 Restart the cluster and check the datanode state

On the namenode node hadoop1, start Hadoop with ./start-all.sh:

./start-all.sh

On the datanode node hadoop2, check the startup state with jps:

The datanode process has now started

4.7  Problem 6 -- control the frequency of namenode checkpoints

4.7.1 Modify the checkpoint period in core-site.xml

The default checkpoint period is one hour (3600 seconds). To shorten it to 180 seconds, add the following configuration to the core-site.xml file on the namenode node:

  <property>

    <name>fs.checkpoint.period</name>

      <value>180</value>

  </property>

4.7.2 Restart the cluster and check the checkpoint update frequency

Checking a few minutes later shows that the namenode performs a checkpoint every 180 seconds:
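One simple way to observe this is to list the metadata files before and after a 180-second interval and compare their modification times (paths as used elsewhere in this document):

ls -l /usr/local/hadoop-1.1.2/hdfs/name/current
ls -l /usr/local/hadoop-1.1.2/tmp/dfs/namesecondary/current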

4.7.3 Observe the namenode changes before and after a checkpoint

1.     Before the checkpoint :

- The fsimage and other metadata files in the namenode directory have a modification time of 16:39

- At 16:40 an input file is added to HDFS; the edits file in the namenode records this operation, so its modification time is 16:40

2.     After the checkpoint

- The fsimage, edits, fstime and VERSION files in the namenode are all updated by the checkpoint at 16:42

4.7.4 The basic principle

When the time since the last checkpoint reaches ${fs.checkpoint.period}:

1. The SNN asks the NN to roll its edits file, so that new edit log entries go into a newly created edits file.

2. The SNN fetches the NN's fsimage and edits files via HTTP GET.

3. The SNN loads the fsimage into memory and replays every operation in the edits file, producing a new, merged fsimage file.

4. The SNN sends the newly merged fsimage back to the NN via HTTP POST.

5. The NN replaces the old fsimage with the one just received from the SNN, replaces the original edits with the new edits file created in step 1, and updates the fstime file to the time the checkpoint occurred.

In the end, the name node holds an up-to-date fsimage file and a shorter edits file (the edits file is not necessarily empty, because HDFS operations may have been recorded in it while the SNN was performing the checkpoint).

Problem solving

5.1  Installing a 64-bit CentOS virtual machine: "This host supports Intel VT-x, but Intel VT-x is disabled"

When installing a 64-bit CentOS virtual machine, the following error appears during installation:

Press the F1 key to enter the BIOS Setup Utility, use the arrow keys to find Virtualization under the Security panel, press Enter, change Intel Virtualization Technology to Enabled, press F10 to save and exit, choose Yes and press Enter, shut the machine down completely (power off), wait a few seconds, then restart the computer. Intel virtualization technology is then enabled successfully.

5.2  "*** is not in the sudoers file" -- solution

When the hadoop user needs to change permissions on a folder with sudo, the error "hadoop is not in the sudoers file. This incident will be reported" appears, as shown below:

1.     Use the su command to switch to the root user

2.     Add write permission to the sudoers file with the command: chmod u+w /etc/sudoers

3.     Edit the /etc/sudoers file: enter edit mode with "vi /etc/sudoers", find the line "root ALL=(ALL) ALL", add "hadoop ALL=(ALL) ALL" below it, then save and exit.

5.3  yum cannot download packages

1.     Add proxy=http://XX.XXX.XX:PORT to /etc/yum.conf

2.     Restart the network

3.     Run yum install ambari-server again; the packages now download normally

5.4  Installing Hadoop 2.2.0 on 64-bit CentOS raises a native library bitness exception

During the installation of hadoop 2.2.0 the following exception occurs: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

This is because the release was compiled for 32-bit and cannot adapt to the 64-bit CentOS environment.

There are two solutions:

- Recompile hadoop and then redeploy it

- As a temporary workaround, modify the configuration to ignore the problematic files

5.5  A source code problem when compiling Hadoop 2.2.0

The code in the current 2.2.0 source tarball has a bug that needs a patch before it can be compiled; otherwise, compiling hadoop-auth reports the following error:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-auth: Compilation failure: Compilation failure:

[ERROR] /home/hadoop/Downloads/release-2.2.0/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/client/AuthenticatorTestCase.java:[88,11] error: cannot access AbstractLifeCycle

[ERROR] class file for org.mortbay.component.AbstractLifeCycle not found

[ERROR] /home/hadoop/Downloads/release-2.2.0/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/client/AuthenticatorTestCase.java:[96,29] error: cannot access LifeCycle

[ERROR] class file for org.mortbay.component.LifeCycle not found

Edit hadoop-common-project/hadoop-auth/pom.xml directly; a dependency is simply missing. Add the following dependency:

<dependency>

      <groupId>org.mortbay.jetty</groupId>

      <artifactId>jetty-util</artifactId>

      <scope>test</scope>

</dependency>
