Posted to dev@spark.apache.org by The Watcher <wa...@gmail.com> on 2015/02/15 12:03:41 UTC

Spark & Hive

I'm a little confused about how Hive and Spark relate. Can someone shed some
light?

Using Spark, I can access the Hive metastore and run Hive queries. Since I
am able to do this in standalone mode, it can't be using MapReduce to run
the Hive queries, so I suppose Spark is building a query plan and executing
it entirely by itself.

So, is this the same as Hive on Spark, as described here?
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
If not, why not, and aren't they likely to merge at some point?

If Spark really builds its own query plan, joins, etc. without relying on
Hive's, then is everything that requires special SQL syntax in Hive
supported: window functions, cubes, rollups, skewed tables, etc.?

Thanks

Re: Spark & Hive

Posted by Reynold Xin <rx...@databricks.com>.

Spark SQL is not the same as Hive on Spark.

Spark SQL is a query engine that is designed from the ground up for Spark,
without the historic baggage of Hive. It also does more than SQL now -- it
is meant for structured data processing (e.g. the new DataFrame API) as well
as SQL. Spark SQL is mostly compatible with Hive, but 100% compatibility is
not a goal (nor desired, since Hive has accumulated a lot of weird SQL
semantics in the course of its evolution).

Hive on Spark is meant to replace Hive's MapReduce runtime with Spark's.

For more information, see this blog post:
https://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html
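
To make the distinction above concrete, here is a minimal sketch of what
"Hive on Spark" looks like from the user's side, assuming a Hive build with
Spark support and a Spark installation that Hive can reach. The statements
are run from a Hive session, not from Spark, and the table name `src` is
hypothetical. This is an illustration only, not part of the original reply.

```sql
-- Run inside a Hive session (Hive CLI or Beeline), not inside Spark.
-- Assumes Hive was built with Spark support and can reach a Spark installation.
set hive.execution.engine=spark;

-- Hive still parses and plans the query itself; only the execution
-- backend changes from MapReduce to Spark.
-- `src` is a hypothetical table used for illustration.
SELECT count(*) FROM src;
```

With Spark SQL, by contrast, the query is submitted to Spark directly and
Spark SQL does its own planning; Hive contributes mainly the metastore and
table formats, which matches the behaviour described in the question.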
