Deploying Spark in Several Common Modes

Spark can run in Local mode, in Standalone mode, and on Mesos, YARN (Yet Another Resource Negotiator), or Kubernetes.

1. Spark Local Mode

1.1 Download and extract Spark

[hadoop@hadoop1 ~/downloads]$ wget https://dlcdn.apache.org/spark/spark-3.1.2/spark-3.1.2-bin-hadoop2.7.tgz
[hadoop@hadoop1 ~]$ tar xf downloads/spark-3.1.2-bin-hadoop2.7.tgz

# Configure environment variables
[hadoop@hadoop1 ~]$ vim .bash_profile
export SPARK_HOME=/home/hadoop/spark-3.1.2-bin-hadoop2.7
export PATH=$HIVE_HOME/bin:$PATH:$STORM_HOME/bin:$SPARK_HOME/bin

[hadoop@hadoop1 ~]$ source .bash_profile

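Layout of the extracted directory

bin: executable scripts
conf: configuration files
data: data used by the example programs
examples: example programs
jars: dependency JARs
python: Python API
R: R API
sbin: cluster management scripts
yarn: files needed for YARN integration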

1.2 Start spark-shell

[hadoop@hadoop1 ~]$ ./spark-3.1.2-bin-hadoop2.7/bin/spark-shell --master local[1]
21/11/09 10:39:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop1:4040
Spark context available as 'sc' (master = local[1], app id = local-1636425555582).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
      /_/

Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_301)
Type in expressions to have them evaluated.
Type :help for more information.

spark-shell --master local[N] runs the job locally with N worker threads.
spark-shell --master local[*] uses as many worker threads as the machine has logical cores.
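
One practical detail when typing these values: square brackets and * are glob characters in most shells, so quoting the --master value is the safer habit. The sketch below is a hypothetical helper (local_master is not part of Spark) that builds the value from a thread count and falls back to local[*] when no count is given. It only prints the resulting command line.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Spark): build a Local-mode --master value.
# Usage: local_master [N]   -> local[N]; with no argument -> local[*]
local_master() {
  threads="${1:-*}"
  case "$threads" in
    '*')
      printf 'local[*]\n' ;;
    0|*[!0-9]*)
      # Reject 0 and anything containing a non-digit character.
      echo "thread count must be a positive integer or *" >&2
      return 1 ;;
    *)
      printf 'local[%s]\n' "$threads" ;;
  esac
}

# Print the spark-shell command line instead of running it.
echo "spark-shell --master \"$(local_master 2)\""
```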

1.3 Example: estimating Pi

[hadoop@hadoop1 ~]$ ./spark-3.1.2-bin-hadoop2.7/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--executor-memory 1G \
--total-executor-cores 1 \
spark-3.1.2-bin-hadoop2.7/examples/jars/spark-examples_2.12-3.1.2.jar \
100

…………
21/11/09 11:01:43 INFO TaskSetManager: Finished task 99.0 in stage 0.0 (TID 99) in 30 ms on hadoop1 (executor driver) (100/100)
21/11/09 11:01:43 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 2.820 s
21/11/09 11:01:43 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
21/11/09 11:01:43 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
21/11/09 11:01:43 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
21/11/09 11:01:43 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 2.956005 s
Pi is roughly 3.1415763141576316
21/11/09 11:01:43 INFO SparkUI: Stopped Spark web UI at http://hadoop1:4041
21/11/09 11:01:44 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/11/09 11:01:44 INFO MemoryStore: MemoryStore cleared
21/11/09 11:01:44 INFO BlockManager: BlockManager stopped
21/11/09 11:01:44 INFO BlockManagerMaster: BlockManagerMaster stopped
21/11/09 11:01:44 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/11/09 11:01:44 INFO SparkContext: Successfully stopped SparkContext
21/11/09 11:01:44 INFO ShutdownHookManager: Shutdown hook called
21/11/09 11:01:44 INFO ShutdownHookManager: Deleting directory /tmp/spark-19b0ca6d-7ff5-4a75-b8d3-df59185049fc
21/11/09 11:01:44 INFO ShutdownHookManager: Deleting directory /tmp/spark-6400e584-4c21-427a-a445-644793b60378
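
For readers wondering what the job computes: SparkPi estimates Pi with a Monte Carlo method, drawing random points in the square from -1 to 1 on both axes and counting how many fall inside the unit circle. The trailing argument (100 above) is the number of partitions the samples are split across. The following is a single-process sketch of the same idea in shell and awk, not the Spark implementation; the function name estimate_pi, the sample count, and the seed are illustrative assumptions.

```shell
#!/bin/sh
# Single-process sketch of the Monte Carlo estimate behind SparkPi.
# Usage: estimate_pi [SAMPLES] [SEED]   (both arguments are illustrative)
estimate_pi() {
  awk -v n="${1:-100000}" -v seed="${2:-42}" 'BEGIN {
    srand(seed)
    inside = 0
    for (i = 0; i < n; i++) {
      x = 2 * rand() - 1          # random point in the 2 x 2 square
      y = 2 * rand() - 1
      if (x * x + y * y <= 1) inside++
    }
    # (points inside circle) / (all points) approximates pi / 4
    printf "Pi is roughly %.4f\n", 4 * inside / n
  }'
}

estimate_pi 100000
```

Spark distributes the sampling loop across tasks and sums the per-task counts with a reduce, which is the "reduce at SparkPi.scala:38" stage visible in the log.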

2. Spark Standalone Mode

Service plan

Server    IP address   Spark role      Other dependencies
hadoop1   10.10.8.11   Master, Slave   JDK 1.8
hadoop2   10.10.8.12   Slave           JDK 1.8
hadoop3   10.10.8.13   Slave           JDK 1.8

2.1 Preparation

Passwordless SSH login between the servers

[dev@hadoop1 ~]# su - hadoop
[hadoop@hadoop1 ~]$ ssh-keygen
[hadoop@hadoop1 ~]$ ssh-copy-id hadoop1
[hadoop@hadoop1 ~]$ ssh-copy-id hadoop2
[hadoop@hadoop1 ~]$ ssh-copy-id hadoop3

Set up the JDK 1.8 environment

[hadoop@hadoop1 ~]$ tar xf downloads/jdk-8u301-linux-x64.tar.gz
[hadoop@hadoop1 ~]$ vim .bash_profile
export JAVA_HOME=/home/hadoop/jdk1.8.0_301
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

[hadoop@hadoop1 ~]$ source .bash_profile

2.2 Modify the Spark configuration

# Download Spark
[hadoop@hadoop1 ~/downloads]$ wget https://dlcdn.apache.org/spark/spark-3.1.2/spark-3.1.2-bin-hadoop2.7.tgz
[hadoop@hadoop1 ~]$ tar xf downloads/spark-3.1.2-bin-hadoop2.7.tgz
[hadoop@hadoop1 ~]$ cd spark-3.1.2-bin-hadoop2.7/conf/

Edit the spark-env.sh configuration file

[hadoop@hadoop1 ~/spark-3.1.2-bin-hadoop2.7/conf]$ cp spark-env.sh.template spark-env.sh
[hadoop@hadoop1 ~/spark-3.1.2-bin-hadoop2.7/conf]$ vim spark-env.sh
# Java environment
export JAVA_HOME=/home/hadoop/jdk1.8.0_301
# Spark home
export SPARK_HOME=/home/hadoop/spark-3.1.2-bin-hadoop2.7
# Host on which the cluster's Master process runs
export SPARK_MASTER_HOST=hadoop1
# Directory for temporary data produced while Spark runs
export SPARK_LOCAL_DIRS=$SPARK_HOME/datas

Edit the slaves configuration file

[hadoop@hadoop1 ~/spark-3.1.2-bin-hadoop2.7/conf]$ vim slaves
hadoop1
hadoop2
hadoop3
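
If the host list is long, the file can be generated rather than typed. A minimal sketch, assuming a hypothetical helper named write_workers_file; it writes one host name per line, which is the format shown above. The demo writes to a scratch file so that it is safe to run anywhere.

```shell
#!/bin/sh
# Hypothetical helper: write one worker host per line to a file.
# Usage: write_workers_file FILE HOST...
write_workers_file() {
  target="$1"
  shift
  [ "$#" -gt 0 ] || { echo "no hosts given" >&2; return 1; }
  printf '%s\n' "$@" > "$target"
}

# Demo against a scratch file; on a real node the target would be
# $SPARK_HOME/conf/slaves.
demo_file="$(mktemp)"
write_workers_file "$demo_file" hadoop1 hadoop2 hadoop3
cat "$demo_file"
```

To my knowledge, Spark 3.1 also reads the same list from conf/workers, the newer name for this file, and keeps conf/slaves as a legacy name.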

2.3 Distribute the Spark configuration

Copy the Spark directory, including its configuration, to the other servers

[hadoop@hadoop1 ~]$ rsync -av spark-3.1.2-bin-hadoop2.7 hadoop@hadoop2:~/
[hadoop@hadoop1 ~]$ rsync -av spark-3.1.2-bin-hadoop2.7 hadoop@hadoop3:~/
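
With more than a couple of nodes, the two commands above are easier to maintain as a loop. The sketch below only prints the rsync commands it would run, so nothing is copied; the function name sync_plan is hypothetical, and the remote user hadoop is taken from this walkthrough.

```shell
#!/bin/sh
# Print one rsync command per target host (dry run; nothing is copied).
# Usage: sync_plan DIR HOST...
sync_plan() {
  dir="$1"
  shift
  for host in "$@"; do
    printf 'rsync -av %s hadoop@%s:~/\n' "$dir" "$host"
  done
}

sync_plan spark-3.1.2-bin-hadoop2.7 hadoop2 hadoop3
```

Once the printed commands look right, piping the output to sh runs them.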

2.4 Start the Spark cluster

# Start the master
[hadoop@hadoop1 ~]$ ./spark-3.1.2-bin-hadoop2.7/sbin/start-master.sh

# Start the slaves
[hadoop@hadoop1 ~]$ ./spark-3.1.2-bin-hadoop2.7/sbin/start-slaves.sh

2.5 Example: estimating Pi

# Run in Standalone cluster mode
[hadoop@hadoop3 ~/spark-3.1.2-bin-hadoop2.7]$ ./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://hadoop1:7077 \
--deploy-mode cluster \
examples/jars/spark-examples_2.12-3.1.2.jar \
1000

# Run in Standalone client mode
[hadoop@hadoop3 ~/spark-3.1.2-bin-hadoop2.7]$ ./bin/spark-submit \
--master spark://hadoop1:7077 \
--class org.apache.spark.examples.SparkPi \
examples/jars/spark-examples_2.12-3.1.2.jar \
1000

3. Spark Standalone High Availability

Service plan

Server    IP address   Spark role      Other dependencies
hadoop1   10.10.8.11   Master, Slave   JDK 1.8, ZooKeeper
hadoop2   10.10.8.12   Master, Slave   JDK 1.8, ZooKeeper
hadoop3   10.10.8.13   Slave           JDK 1.8, ZooKeeper

3.1 Modify the Spark configuration

Starting from the Standalone setup above, edit spark-env.sh and add the ZooKeeper settings

[hadoop@hadoop1 ~]$ vim spark-3.1.2-bin-hadoop2.7/conf/spark-env.sh
# Java environment
export JAVA_HOME=/home/hadoop/jdk1.8.0_301
# Spark home
export SPARK_HOME=/home/hadoop/spark-3.1.2-bin-hadoop2.7
# Host on which the cluster's Master process runs
export SPARK_MASTER_HOST=hadoop1
# Directory for temporary data produced while Spark runs
export SPARK_LOCAL_DIRS=$SPARK_HOME/datas
# ZooKeeper settings; zookeeper.url takes a comma-separated list of ZooKeeper hosts
export SPARK_DAEMON_JAVA_OPTS="
-Dspark.deploy.recoveryMode=ZOOKEEPER
-Dspark.deploy.zookeeper.url=hadoop1,hadoop2,hadoop3
-Dspark.deploy.zookeeper.dir=/spark"
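
The value of spark.deploy.zookeeper.url is an ordinary ZooKeeper connection string, so each host can also carry an explicit port. A sketch under that assumption: the helper zk_daemon_opts is hypothetical, and 2181 is ZooKeeper's usual client port, which may differ in your ensemble.

```shell
#!/bin/sh
# Hypothetical helper: build SPARK_DAEMON_JAVA_OPTS for ZooKeeper recovery.
# Usage: zk_daemon_opts PORT HOST...
zk_daemon_opts() {
  port="$1"
  shift
  url=""
  for host in "$@"; do
    url="${url:+$url,}$host:$port"     # comma-join the host:port pairs
  done
  printf '%s %s %s\n' \
    "-Dspark.deploy.recoveryMode=ZOOKEEPER" \
    "-Dspark.deploy.zookeeper.url=$url" \
    "-Dspark.deploy.zookeeper.dir=/spark"
}

# Print a line that could be pasted into spark-env.sh.
echo "export SPARK_DAEMON_JAVA_OPTS=\"$(zk_daemon_opts 2181 hadoop1 hadoop2 hadoop3)\""
```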

3.2 Distribute the Spark configuration

Copy the Spark directory, including its configuration, to the other servers

[hadoop@hadoop1 ~]$ rsync -av spark-3.1.2-bin-hadoop2.7 hadoop@hadoop2:~/
[hadoop@hadoop1 ~]$ rsync -av spark-3.1.2-bin-hadoop2.7 hadoop@hadoop3:~/

3.3 Modify the master2 configuration

On the master2 machine (hadoop2), change the SPARK_MASTER_HOST parameter in spark-env.sh so that the master address points to the master2 server itself

[hadoop@hadoop2 ~]$ vim spark-3.1.2-bin-hadoop2.7/conf/spark-env.sh

export SPARK_MASTER_HOST=hadoop2

3.4 Start the Spark cluster

# Start master1
[hadoop@hadoop1 ~]$ ./spark-3.1.2-bin-hadoop2.7/sbin/start-master.sh
# Start master2
[hadoop@hadoop2 ~]$ ./spark-3.1.2-bin-hadoop2.7/sbin/start-master.sh

# Start the slaves
[hadoop@hadoop1 ~]$ ./spark-3.1.2-bin-hadoop2.7/sbin/start-slaves.sh

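If the web UI port conflicts with another service, change SPARK_MASTER_WEBUI_PORT=8080 in start-master.sh to another port (exporting the variable in spark-env.sh has the same effect).

Open the Spark web UI

Image 1

Image 2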

3.5 Verify high availability

Stop the Master service on the master1 node

[hadoop@hadoop1 ~]$ ./spark-3.1.2-bin-hadoop2.7/sbin/stop-master.sh

Access the master1 node

Image 3

Access the master2 node

Image 4
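
With two Masters, applications are usually pointed at both so that they can find whichever one is currently active; Spark accepts a comma-separated list of hosts in the master URL for this. A small sketch, assuming the Master port 7077 used in the earlier examples; ha_master_url is a hypothetical helper, and the script only prints the beginning of the command line.

```shell
#!/bin/sh
# Hypothetical helper: build a master URL that lists every Master.
# Usage: ha_master_url PORT HOST...
ha_master_url() {
  port="$1"
  shift
  list=""
  for host in "$@"; do
    list="${list:+$list,}$host:$port"  # comma-join the host:port pairs
  done
  printf 'spark://%s\n' "$list"
}

# Print the start of a spark-submit command that uses both Masters.
echo "spark-submit --master $(ha_master_url 7077 hadoop1 hadoop2)"
```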

4. Spark on YARN Mode

Service plan

Hostname  IP address   Spark role      Required services
hadoop1   10.10.8.11   Master, Slave   JDK 1.8, YARN
hadoop2   10.10.8.12   Slave           JDK 1.8, YARN
hadoop3   10.10.8.13   Slave           JDK 1.8, YARN

Prerequisites

A working Hadoop YARN cluster

4.1 Preparation

Passwordless SSH login between the servers

[dev@hadoop1 ~]# su - hadoop
[hadoop@hadoop1 ~]$ ssh-keygen
[hadoop@hadoop1 ~]$ ssh-copy-id hadoop1
[hadoop@hadoop1 ~]$ ssh-copy-id hadoop2
[hadoop@hadoop1 ~]$ ssh-copy-id hadoop3

Set up the JDK 1.8 environment

[hadoop@hadoop1 ~]$ tar xf downloads/jdk-8u301-linux-x64.tar.gz
[hadoop@hadoop1 ~]$ vim .bash_profile
export JAVA_HOME=/home/hadoop/jdk1.8.0_301
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

[hadoop@hadoop1 ~]$ source .bash_profile

4.2 Modify the Spark configuration

Edit spark-env.sh on hadoop1 and add the Hadoop YARN configuration directory

[hadoop@hadoop1 ~/spark-3.1.2-bin-hadoop2.7/conf]$ cp spark-env.sh.template spark-env.sh
[hadoop@hadoop1 ~/spark-3.1.2-bin-hadoop2.7/conf]$ vim spark-env.sh
export JAVA_HOME=/home/hadoop/jdk1.8.0_301
export SPARK_HOME=/home/hadoop/spark-3.1.2-bin-hadoop2.7
export SPARK_MASTER_HOST=hadoop1
export SPARK_LOCAL_DIRS=$SPARK_HOME/datas
export HADOOP_HOME=/home/hadoop/hadoop-2.7.2
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

Edit the slaves configuration file

[hadoop@hadoop1 ~/spark-3.1.2-bin-hadoop2.7/conf]$ vim slaves
hadoop1
hadoop2
hadoop3

4.3 Distribute the Spark configuration

Copy the Spark directory, including its configuration, to the other servers

[hadoop@hadoop1 ~]$ rsync -av spark-3.1.2-bin-hadoop2.7 hadoop@hadoop2:~/
[hadoop@hadoop1 ~]$ rsync -av spark-3.1.2-bin-hadoop2.7 hadoop@hadoop3:~/

4.4 Start the Spark cluster

# Start the master
[hadoop@hadoop1 ~]$ ./spark-3.1.2-bin-hadoop2.7/sbin/start-master.sh

# Start the slaves
[hadoop@hadoop1 ~]$ ./spark-3.1.2-bin-hadoop2.7/sbin/start-slaves.sh

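If the web UI port conflicts with another service, change SPARK_MASTER_WEBUI_PORT=8080 in start-master.sh to another port. Note that jobs submitted with --master yarn are scheduled by YARN itself and do not need these standalone daemons to be running.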

4.5 Run the examples

# Run in cluster mode
[hadoop@hadoop1 ~/spark-3.1.2-bin-hadoop2.7]$ ./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
examples/jars/spark-examples_2.12-3.1.2.jar \
10

# Run in client mode
[hadoop@hadoop1 ~/spark-3.1.2-bin-hadoop2.7]$ ./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
examples/jars/spark-examples_2.12-3.1.2.jar \
10

If client mode fails with "ERROR spark.SparkContext: Error initializing SparkContext.", add export SPARK_LOCAL_IP=<IP_Address> to spark-env.sh.
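
Before submitting with --master yarn, it can also save time to confirm that HADOOP_CONF_DIR really points at the Hadoop client configuration, since spark-submit relies on that directory to locate the cluster. A minimal preflight sketch: check_hadoop_conf is a hypothetical helper, and looking for core-site.xml and yarn-site.xml is a heuristic chosen for this example, not a check that Spark performs by name.

```shell
#!/bin/sh
# Hypothetical preflight check for the Hadoop client configuration directory.
# Usage: check_hadoop_conf [DIR]   (defaults to $HADOOP_CONF_DIR)
check_hadoop_conf() {
  dir="${1:-${HADOOP_CONF_DIR:-}}"
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "HADOOP_CONF_DIR is unset or not a directory" >&2
    return 1
  fi
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $dir/$f" >&2
      return 1
    fi
  done
  echo "ok: $dir"
}

# Demo against a scratch directory that stands in for $HADOOP_HOME/etc/hadoop.
demo_conf="$(mktemp -d)"
touch "$demo_conf/core-site.xml" "$demo_conf/yarn-site.xml"
check_hadoop_conf "$demo_conf"
```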

4.6 View the job on YARN

Image 5

5. Spark on Mesos Mode

Service plan

Hostname  IP address   Mesos role
hadoop1   10.10.8.11   Master, Agent

Prerequisites

A working Spark installation

5.1 Deploy Mesos

Build and install Mesos

[hadoop@hadoop1 ~/downloads]$ wget http://archive.apache.org/dist/mesos/1.9.0/mesos-1.9.0.tar.gz
[hadoop@hadoop1 ~/downloads]$ tar xf mesos-1.9.0.tar.gz
[hadoop@hadoop1 ~/downloads]$ sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
# Add the WANdisco SVN repo; subversion-devel is installed from it below.
[hadoop@hadoop1 ~/downloads]$ sudo yum install -y epel-release
[hadoop@hadoop1 ~/downloads]$ sudo bash -c 'cat > /etc/yum.repos.d/wandisco-svn.repo <<EOF
[WANdiscoSVN]
name=WANdisco SVN Repo 1.9
enabled=1
baseurl=http://opensource.wandisco.com/centos/7/svn-1.9/RPMS/\$basearch/
gpgcheck=1
gpgkey=http://opensource.wandisco.com/RPM-GPG-KEY-WANdisco
EOF'

[hadoop@hadoop1 ~/downloads]$ sudo yum groupinstall -y "Development Tools"
[hadoop@hadoop1 ~/downloads]$ sudo yum install -y apache-maven python-devel python-six python-virtualenv zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel

[hadoop@hadoop1 ~/downloads]$ cd mesos-1.9.0
[hadoop@hadoop1 ~/downloads/mesos-1.9.0]$ ./bootstrap
[hadoop@hadoop1 ~/downloads/mesos-1.9.0]$ mkdir build
[hadoop@hadoop1 ~/downloads/mesos-1.9.0]$ cd build
[hadoop@hadoop1 ~/downloads/mesos-1.9.0/build]$ ../configure
[hadoop@hadoop1 ~/downloads/mesos-1.9.0/build]$ make
[hadoop@hadoop1 ~/downloads/mesos-1.9.0/build]$ make check
[hadoop@hadoop1 ~/downloads/mesos-1.9.0/build]$ sudo make install

[hadoop@hadoop1 ~]$ ln -s downloads/mesos-1.9.0/build mesos
[hadoop@hadoop1 ~]$ cd mesos

Start Mesos

# Start the Mesos master
[hadoop@hadoop1 ~/mesos]$ ./bin/mesos-master.sh --ip=10.10.8.11 --work_dir=/home/hadoop/mesos/mesos_master

# Start the Mesos agent
[hadoop@hadoop1 ~/mesos]$ ./bin/mesos-agent.sh --master=hadoop1:5050 --work_dir=/home/hadoop/mesos/mesos_agent

Open the Mesos web UI

Image 6

5.2 Spark configuration

Upload the Spark binary package currently in use to HDFS

[hadoop@hadoop1 ~]$ hdfs dfs -put ~/downloads/spark-3.1.2-bin-hadoop2.7.tgz spark-3.1.2-bin-hadoop2.7.tgz

Edit spark-env.sh and add the Mesos settings

[hadoop@hadoop1 ~/spark-3.1.2-bin-hadoop2.7]$ vim conf/spark-env.sh
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=hdfs://hadoop2:8020/user/hadoop/spark-3.1.2-bin-hadoop2.7.tgz

5.3 Run the example

[hadoop@hadoop1 ~/spark-3.1.2-bin-hadoop2.7]$ ./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master mesos://hadoop1:5050 \
examples/jars/spark-examples_2.12-3.1.2.jar \
1000

Or use spark-shell

# Upload the sample file to HDFS
[hadoop@hadoop1 ~/spark-3.1.2-bin-hadoop2.7]$ hdfs dfs -put README.md

# Count the lines in the file
[hadoop@hadoop1 ~/spark-3.1.2-bin-hadoop2.7]$ ./bin/spark-shell --master mesos://hadoop1:5050

scala> val lines = sc.textFile("README.md")
lines: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:24

scala> lines.count()
res0: Long = 108

Meanwhile, open the Mesos web UI to see the running task

Image 7
