Spark currently supports several distributed deployment modes: 1. Standalone Deploy Mode; 2. Amazon EC2; 3. Apache Mesos; 4. Hadoop YARN. The first is a standalone deployment that needs no external resource manager; the other three require deploying Spark onto the corresponding resource manager.

Besides these deployment modes, newer versions of Spark support multiple Hadoop platforms: starting from version 0.8.1 there are separate builds for Hadoop 1 (HDP1, CDH3), CDH4, and Hadoop 2 (HDP2, CDH5). On Cloudera's CDH5, Spark can be selected directly as a service when installing with Cloudera Manager (CM).

At present the latest Spark release is 1.0.0.

Using version 1.0.0, let's walk through installing a distributed Spark cluster:

1. Spark 1.0.0 requires JDK 1.6 or later; we use JDK 1.6.0_31.

2. Spark 1.0.0 requires Scala 2.10 or later; we use Scala 2.10.3.
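Before going further, it is worth confirming both prerequisites on each node. A minimal check:

java -version     # should report 1.6 or later

scala -version    # should report 2.10.x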

3. Download the appropriate binary package from https://spark.apache.org/downloads.html; we choose the CDH4 build, spark-1.0.0-bin-cdh4.tgz, downloaded onto tongjihadoop165.
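If you prefer to fetch it from the command line, past releases are also kept on the Apache archive; the exact URL below is an assumption based on the archive's usual layout:

wget https://archive.apache.org/dist/spark/spark-1.0.0/spark-1.0.0-bin-cdh4.tgz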

4. Unpack the binary package: tar -zxf spark-1.0.0-bin-cdh4.tgz

5. Rename the directory: mv spark-1.0.0-bin-cdh4 spark-1.0.0-cdh4

6. cd spark-1.0.0-cdh4, then create the config file from its template:

mv ./conf/spark-env.sh.template ./conf/spark-env.sh

7. vi ./conf/spark-env.sh and add the following:

export SCALA_HOME=/usr/lib/scala-2.10.3

export JAVA_HOME=/usr/java/jdk1.6.0_31

export SPARK_MASTER_IP=10.32.21.165

export SPARK_WORKER_INSTANCES=3

export SPARK_MASTER_PORT=8070

export SPARK_MASTER_WEBUI_PORT=8090

export SPARK_WORKER_PORT=8092

export SPARK_WORKER_MEMORY=5000m

SPARK_MASTER_IP is the IP address of the master; SPARK_MASTER_PORT is the master's port; SPARK_MASTER_WEBUI_PORT is the port of the web UI used to monitor the cluster; SPARK_WORKER_PORT is the port each worker listens on; SPARK_WORKER_MEMORY is the amount of memory each worker may use for running jobs; SPARK_WORKER_INSTANCES is the number of worker instances to launch on each machine.

8. vi ./conf/slaves and list one worker per line (hostname or IP address); the contents are as follows:

10.32.21.165

10.32.21.166

10.32.21.167

9. (Optional) Set the SPARK_HOME environment variable and add SPARK_HOME/bin to PATH:

vi /etc/profile and add the following:

export SPARK_HOME=/usr/lib/spark-1.0.0-cdh4

export PATH=$SPARK_HOME/bin:$PATH
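Then reload the profile so the variables take effect in the current session:

source /etc/profile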

10. Copy the Spark directory on tongjihadoop165 to tongjihadoop166 and tongjihadoop167:

sudo scp -r hadoop@10.32.21.165:/usr/lib/spark-1.0.0-cdh4  /usr/lib

Scala can be installed the same way, by copying the files remotely and editing the environment file /etc/profile on each machine; don't forget to source it after the change.
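As a sketch, assuming Scala lives under /usr/lib/scala-2.10.3 (matching the SCALA_HOME set above), on each of the other two nodes you would run:

sudo scp -r hadoop@10.32.21.165:/usr/lib/scala-2.10.3 /usr/lib

# then add the export lines for SCALA_HOME, SPARK_HOME and PATH to /etc/profile, and reload it:

source /etc/profile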

11. Run ./sbin/start-all.sh to start the Spark cluster.

If start-all fails to bring up the related processes normally, check the error messages under the $SPARK_HOME/logs directory. As with Hadoop, you can also start the processes individually:

On the master, run: ./sbin/start-master.sh

On each worker, run: ./sbin/start-slave.sh 3 spark://10.32.21.165:8070 --webui-port 8090 (the 3 is the worker instance number)

12. Check that the processes started by running the jps command; you should see a Worker process and/or a Master process. Then visit the web UI at http://tongjihadoop165:8090/ to see all the worker nodes along with their CPU counts and memory.
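On a node running both daemons, jps output looks roughly like this (the PIDs are illustrative; jps prints each JVM's PID and main class name):

jps

21357 Master

21402 Worker

21513 Jps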

13. Local-mode demo.

For example: ./bin/run-example SparkLR 2 local or ./bin/run-example SparkPi 2 local

These are two bundled examples: the former runs an iterative logistic-regression computation; the latter estimates the value of Pi.
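SparkPi prints its estimate at the end of the run; since it samples random points, the value varies slightly between runs, along the lines of:

Pi is roughly 3.1431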

14. Start interactive mode: ./bin/spark-shell --master spark://10.32.21.165:8070. If MASTER is configured in conf/spark-env.sh (add the line export MASTER=spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT}), the shell can be launched with just ./bin/spark-shell.

spark-shell runs as an application that submits jobs to the Spark cluster; the cluster assigns them to specific workers, and a worker reads files locally while processing a job.

This shell is a modified Scala shell; while one is open, the web UI shows it as a running Application (screenshot omitted). At the bottom of that page are the completed Applications, and the workers list shows the nodes in the cluster.

We can use this shell to run some computations over data on HDFS; enter the following in turn:

A. val file = sc.textFile("hdfs://10.32.21.165:8020/1639.sta")  // load a file from HDFS

B. file.map(_.size).reduce(_+_)  // count the number of characters in the file

The run (screenshot omitted) shows the file contains 346658513 characters, computed in under 3 seconds.

Alternatively, in step B execute val count = file.flatMap(line => line.split("\t")).map(word => (word, 1)).reduceByKey(_+_) and then count.saveAsTextFile("hdfs://10.32.21.165:8020/spark") to store the result under the /spark directory on HDFS.
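Put together, the word-count variant looks like this when entered into spark-shell (a sketch reusing the paths above; it assumes fields in 1639.sta are tab-separated):

// load the file from HDFS

val file = sc.textFile("hdfs://10.32.21.165:8020/1639.sta")

// split each line on tabs, emit (word, 1) pairs, and sum the counts per word

val count = file.flatMap(line => line.split("\t")).map(word => (word, 1)).reduceByKey(_+_)

// peek at a few results before writing them out

count.take(5).foreach(println)

// write the (word, count) pairs under /spark on HDFS

count.saveAsTextFile("hdfs://10.32.21.165:8020/spark")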

You can also run ./bin/spark-shell --master local[2] to start a local shell; the [2] sets the number of threads, which defaults to 1.

Type exit to quit the shell.

15. Run ./sbin/stop-all.sh to stop the Spark cluster.

Individual processes can also be stopped with the separate stop scripts.
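In the sbin directory those are, for example:

./sbin/stop-master.sh    # stop the master on this node

./sbin/stop-slaves.sh    # stop the workers listed in conf/slaves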

Note: the Spark directory must be at the same path on all three machines, because the master logs into each worker to run commands and assumes the worker's Spark path matches its own.

