[Spark 94] Using the spark-sql tool

Programming / Posted by houtizong 3 years ago

spark-sql is an executable script in Spark's bin directory. Its purpose is to run Hive commands through this script: instructions that used to be entered at the hive> prompt can now be entered at the spark-sql> prompt.

spark-sql can use a built-in Hive metastore, or it can use the metastore of an independently installed Hive.
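To point spark-sql at an independently installed Hive metastore, the hive-site.xml copied in the configuration steps below typically carries the metastore's JDBC connection settings. A minimal sketch, assuming a MySQL-backed metastore; the host name, database name, and credentials here are placeholders, not values from this post:

```xml
<configuration>
  <!-- JDBC connection to the metastore database (MySQL in this post) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-host:3306/hive_metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive_password</value>
  </property>
</configuration>
```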

 

About Hive built into Spark

1.

Spark SQL can be built with or without Apache Hive, the Hadoop SQL engine. Spark SQL with Hive support allows us to access Hive tables, UDFs (user-defined functions), SerDes (serialization and deserialization formats), and the Hive query language (HiveQL). It is important to note that including the Hive libraries does not require an existing Hive installation.
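For reference, a Hive-enabled Spark 1.x build is produced by enabling the Hive Maven profiles. A sketch following the Spark 1.2 build documentation; the Hadoop version here is an assumption:

```shell
# Build Spark with Hive (and the Thrift JDBC server) support;
# run from a Spark 1.2 source checkout
mvn -Phive -Phive-thriftserver -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
```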

2. On the relationship between Hive and Spark SQL, see: http://bit1129.iteye.com/blog/2192739

 

3.

Note that if you don't have an existing Hive installation, Spark SQL will create its own Hive metastore (metadata DB) in your program's work directory, called metastore_db. In addition, if you attempt to create tables using HiveQL's CREATE TABLE statement (not CREATE EXTERNAL TABLE), they will be placed in the /user/hive/warehouse directory on your default filesystem (either your local filesystem, or HDFS if you have a hdfs-site.xml on your classpath).

 

 

 

Configuration steps:

 

1. Copy hive-site.xml from Hive's conf directory into Spark's conf directory.

2. In hive-site.xml, remove all time-unit suffixes (such as ms and s) from time-related configuration values, since Spark's bundled Hive cannot parse them.
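Step 2 can be done with a sed one-liner. A sketch that strips s and ms suffixes from numeric value elements, assuming the suffixes appear only in time-related values, as is typical for a hive-site.xml written by a newer Hive:

```shell
# A sample hive-site.xml fragment with a time-unit suffix
cat > hive-site.xml <<'EOF'
<configuration>
  <property>
    <name>hive.metastore.client.connect.retry.delay</name>
    <value>5s</value>
  </property>
</configuration>
EOF

# Strip "s"/"ms" suffixes so Spark's bundled Hive 0.13.1 can parse the values
sed -i -E 's#<value>([0-9]+)(s|ms)</value>#<value>\1</value>#g' hive-site.xml
grep '<value>' hive-site.xml
```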

3. Add the MySQL JDBC driver to Spark's classpath:

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/hadoop/software/spark-1.2.0-bin-hadoop2.4/lib/mysql-connector-java-5.1.34.jar

4. Start Spark SQL

[hadoop@hadoop bin]$ ./spark-sql
Spark assembly has been built with Hive, including Datanucleus jars on classpath
SET spark.sql.hive.version=0.13.1

 

5. List all databases

 

spark-sql> show databases;
OK
default
financials
financials2
sales
Time taken: 18.67 seconds

 

6. List tables

 

spark-sql> use default;
OK
Time taken: 0.657 seconds
spark-sql> show tables;
OK
abc
avro_table
employees
invites
my_word
mytable1
parquet_table
table1
word
word3
word4
word5
Time taken: 1.011 seconds

 

7. Query

spark-sql> select * from word;
1	MSN
10	QQ
100	Gtalk
1000	Skype
NULL	NULL
Time taken: 39.852 seconds

 

8. Create a table and load data

spark-sql> create table word6 (id int, word string) row format delimited fields terminated by ',' stored as textfile;
OK
Time taken: 10.852 seconds

 

spark-sql> load data local inpath '/home/hadoop/word.txt' into table word6;
Copying data from file:/home/hadoop/word.txt
Copying file: file:/home/hadoop/word.txt
Loading data to table default.word6
Table default.word6 stats: [numFiles=1, numRows=0, totalSize=31, rawDataSize=0]
OK
Time taken: 2.307 seconds
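The load above expects comma-delimited lines matching the (id int, word string) schema. A file consistent with the query output in step 7 might be created like this; the contents are hypothetical, since the original word.txt is not shown in the post:

```shell
# Create a sample comma-delimited data file for the word6 table
# (hypothetical contents modeled on the step-7 query output)
printf '1,MSN\n10,QQ\n100,Gtalk\n1000,Skype\n' > word.txt
cat word.txt
```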

 

As the operations above show, working in spark-sql is effectively the same as working in Hive; that is, spark-sql provides capabilities equivalent to Hive's.
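Beyond the interactive shell, the spark-sql script also accepts the Hive CLI-style -e and -f options for non-interactive use. A sketch, assuming the same installation as above; the script file path is hypothetical:

```shell
# Run a single statement and exit
./spark-sql -e 'select count(*) from word6'

# Run a file of HiveQL statements (hypothetical path)
./spark-sql -f /home/hadoop/queries.sql
```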

 

 

 

 

 
