[Spark 81] Hive in the Spark assembly

Programming / posted by houtizong 3 years ago

Spark SQL supports most commonly used features of HiveQL. However, different HiveQL statements are executed in different manners:

1. DDL statements (e.g. CREATE TABLE, DROP TABLE, etc.) and commands (e.g. SET <key> = <value>, ADD FILE, ADD JAR, etc.)

   In most cases, Spark SQL simply delegates these statements to Hive, as they don't need to issue any distributed jobs and don't rely on the computation engine (Spark, MR, or Tez).

2. SELECT queries, CREATE TABLE ... AS SELECT ... statements and insertions

   These statements are executed using Spark as the execution engine.
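
To make the two categories concrete, here is a minimal toy sketch in Python. It is not how Spark SQL is implemented; the `classify` function, the two label constants, and the regular expressions are hypothetical names invented for this illustration. It only mirrors the rule described above: DDL statements and commands go to Hive, while SELECT, CREATE TABLE ... AS SELECT ..., and insertions run on Spark.

```python
import re

# Toy illustration only: this does NOT reproduce Spark SQL's real planner.
# All names below are hypothetical and exist only for this sketch.
DELEGATED_TO_HIVE = "delegated to Hive"
EXECUTED_BY_SPARK = "executed by Spark"

# CREATE TABLE ... AS SELECT ... is checked first, because it also
# starts with CREATE TABLE but belongs to the Spark-executed category.
_CTAS = re.compile(r"^CREATE\s+TABLE\b.*\bAS\s+SELECT\b", re.IGNORECASE | re.DOTALL)
_SPARK = re.compile(r"^(SELECT|INSERT)\b", re.IGNORECASE)
_HIVE = re.compile(
    r"^(CREATE\s+TABLE|DROP\s+TABLE|SET|ADD\s+FILE|ADD\s+JAR)\b", re.IGNORECASE
)


def classify(statement: str) -> str:
    """Return which of the two categories a HiveQL statement falls into."""
    text = statement.strip()
    if _CTAS.match(text) or _SPARK.match(text):
        return EXECUTED_BY_SPARK
    if _HIVE.match(text):
        return DELEGATED_TO_HIVE
    raise ValueError(f"statement not covered by this sketch: {statement!r}")


if __name__ == "__main__":
    for stmt in (
        "CREATE TABLE t (a INT)",
        "CREATE TABLE t2 AS SELECT * FROM t",
        "ADD JAR /tmp/udfs.jar",
        "SELECT count(*) FROM t",
    ):
        print(f"{stmt} -> {classify(stmt)}")
```

The sketch covers only the statement kinds named in the list; anything else raises an error rather than guessing a category.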

The Hive classes packaged in the assembly jar are used to provide entry points to Hive features, for example:

1. HiveQL parser
2. Talking to the Hive metastore to execute DDL statements
3. Accessing UDFs/UDAFs/UDTFs

As for the differences between Hive on Spark and Spark SQL’s Hive support, please refer to this article by Reynold: https://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html


