Posted to user@hive.apache.org by "guoqing0629@yahoo.com.hk" <gu...@yahoo.com.hk> on 2015/05/20 07:38:56 UTC

Hive on Spark VS Spark SQL

Which is the better choice, Hive on Spark or SparkSQL? What are the key characteristics, advantages, and disadvantages of each?



guoqing0629@yahoo.com.hk

Re: Hive on Spark VS Spark SQL

Posted by Edward Capriolo <ed...@gmail.com>.
What about outer lateral view?
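
For anyone following along, the distinction being asked about is that a plain LATERAL VIEW drops input rows for which the table-generating function produces nothing (for example an empty array), while LATERAL VIEW OUTER keeps them and fills the generated columns with NULL. A minimal sketch for trying both forms through HiveContext follows. The table `orders(order_id INT, items ARRAY<STRING>)` and the object name are made up for illustration, and whether a given SparkSQL release accepts the OUTER form is exactly the open question here, so treat this as something to run and observe rather than a claim that it works.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object LateralViewSketch {
  def main(args: Array[String]): Unit = {
    // Master URL and other settings are expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("lateral-view-sketch"))
    val hiveContext = new HiveContext(sc)

    // Hypothetical table: orders(order_id INT, items ARRAY<STRING>)
    // Plain form: orders whose items array is empty produce no output rows.
    val plain = hiveContext.sql(
      """SELECT order_id, item
        |FROM orders
        |LATERAL VIEW explode(items) exploded AS item""".stripMargin)

    // OUTER form (HiveQL): such orders are kept, with item = NULL.
    val outer = hiveContext.sql(
      """SELECT order_id, item
        |FROM orders
        |LATERAL VIEW OUTER explode(items) exploded AS item""".stripMargin)

    plain.collect().foreach(println)
    outer.collect().foreach(println)
    sc.stop()
  }
}
```

If the second query fails to parse on your SparkSQL version, that is the functional gap in question; the same statement can be run in Hive itself for comparison.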

On Wed, May 20, 2015 at 11:28 AM, matshyeq <ma...@gmail.com> wrote:

> From my experience SparkSQL is still way faster than tez.
> Also, SparkSQL (even 1.2.1 which I'm on) supports *lateral view*

Re: Hive on Spark VS Spark SQL

Posted by matshyeq <ma...@gmail.com>.
From my experience SparkSQL is still way faster than Tez.
Also, SparkSQL (even 1.2.1, which I'm on) supports *lateral view*.

On Wed, May 20, 2015 at 3:41 PM, Edward Capriolo <ed...@gmail.com>
wrote:

> Beyond window queries, hive still has concepts like cube or lateral view
> that many "better than hive" systems don't have.
>
> Also, many people went around broadcasting that SparkSQL was/is
> better/faster than Hive, but now that Tez has "whooped" them in a benchmark
> they are very quiet.
>
>
> http://www.quora.com/What-do-the-people-who-answered-Quora-questions-about-Spark-being-faster-than-Hive-say-now-that-Hortonworks-claims-that-Hive-on-Tez-is-faster-than-Spark

Re: Hive on Spark VS Spark SQL

Posted by Edward Capriolo <ed...@gmail.com>.
Beyond window queries, Hive still has concepts like cube or lateral view
that many "better than Hive" systems don't have.

Also, many people went around broadcasting that SparkSQL was/is
better/faster than Hive, but now that Tez has "whooped" them in a benchmark
they are very quiet.

http://www.quora.com/What-do-the-people-who-answered-Quora-questions-about-Spark-being-faster-than-Hive-say-now-that-Hortonworks-claims-that-Hive-on-Tez-is-faster-than-Spark





RE: Hive on Spark VS Spark SQL

Posted by "Dragga, Christopher" <Ch...@netapp.com>.
While I've not experimented with the most recent versions of SparkSQL, earlier releases could not cope with intermediate result sets that exceeded the available memory; Hive handles this sort of situation much more gracefully.  If you have a smallish cluster and large data, this could pose a problem.  Still, it's worth looking into SparkSQL to see if this is still an issue.

-Chris Dragga


Re: Hive on Spark VS Spark SQL

Posted by Uli Bethke <ul...@sonra.io>.
Interesting question and one that I have asked myself. If you are
already heavily invested in the Hive ecosystem in terms of code and
skills I would look at Hive on Spark as my engine. In theory swapping
out engines (MR, Tez, Spark) should be easy, even though the devil is in
the detail.
SparkSQL supports a broad subset of HiveQL (some esoteric features are
not supported). Crucially, in my opinion, SparkSQL 1.4 will also introduce
windowing functions. If starting out on a greenfield site I would
exclusively look at SparkSQL.
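
To make the windowing point concrete, here is a rough sketch of the kind of query this refers to, written in HiveQL and submitted through HiveContext. It assumes a Spark build with Hive support and a release that actually implements window functions. The table `employees(dept, name, salary)` and the object name are hypothetical, and the assumption that window functions are reached through HiveContext rather than the plain SQLContext should be checked against the release you use.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object WindowFunctionSketch {
  def main(args: Array[String]): Unit = {
    // Master URL and other settings are expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("window-function-sketch"))
    val hiveContext = new HiveContext(sc)

    // Hypothetical table: employees(dept STRING, name STRING, salary DOUBLE)
    // Rank employees by salary within each department.
    val ranked = hiveContext.sql(
      """SELECT dept, name, salary,
        |       rank() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank
        |FROM employees""".stripMargin)

    ranked.collect().foreach(println)
    sc.stop()
  }
}
```

The same SELECT statement can be run unchanged in Hive, which is a quick way to compare behaviour between the two engines.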



-- 
___________________________
Uli Bethke
Co-founder Sonra
p: +353 86 32 83 040
w: www.sonra.io
l: linkedin.com/in/ulibethke
t: twitter.com/ubethke

Chair Hadoop User Group Ireland:
http://www.meetup.com/hadoop-user-group-ireland/


Re: Hive on Spark VS Spark SQL

Posted by Alexander Pivovarov <ap...@gmail.com>.
Thank you Xuefu!

Excellent explanation and comparison!
We should put it on the Hive on Spark wiki.
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark



Re: Pointing SparkSQL to existing Hive Metadata with data file locations in HDFS

Posted by Xuefu Zhang <xz...@cloudera.com>.
I'm afraid you're in the wrong community. You might have a better chance of
getting an answer in the Spark community.

Thanks,
Xuefu

On Wed, May 27, 2015 at 5:44 PM, Sanjay Subramanian <
sanjaysubramanian@yahoo.com> wrote:

> hey guys
>
> On the Hive/Hadoop ecosystem we have (using the Cloudera distribution,
> CDH 5.2.x) there are about 300+ Hive tables.
> The data is stored as text (moving slowly to Parquet) on HDFS.
> I want to use SparkSQL, point it at the Hive metadata, and be able to
> define JOINs etc. using a programming structure like this:
>
> import org.apache.spark.sql.hive.HiveContext
> val sqlContext = new HiveContext(sc)
> val schemaRdd = sqlContext.sql("some complex SQL")
>
>
> Is that the way to go ? Some guidance will be great.
>
> thanks
>
> sanjay
>
>
>
>

Pointing SparkSQL to existing Hive Metadata with data file locations in HDFS

Posted by Sanjay Subramanian <sa...@yahoo.com>.
hey guys
On the Hive/Hadoop ecosystem we have (using the Cloudera distribution, CDH 5.2.x) there are about 300+ Hive tables.
The data is stored as text (moving slowly to Parquet) on HDFS.
I want to use SparkSQL, point it at the Hive metadata, and be able to define JOINs etc. using a programming structure like this:

import org.apache.spark.sql.hive.HiveContext
val sqlContext = new HiveContext(sc)
val schemaRdd = sqlContext.sql("some complex SQL")

Is that the way to go ? Some guidance will be great.
thanks
sanjay
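
A slightly fuller sketch of the structure in the question, for reference. It assumes that a hive-site.xml pointing at the existing metastore is visible to Spark (typically by placing it in Spark's conf directory) and that the Spark build includes Hive support. The tables `customers` and `orders`, their columns, and the object name are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveMetastoreJoinSketch {
  def main(args: Array[String]): Unit = {
    // Master URL and other settings are expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("hive-metastore-join-sketch"))
    val sqlContext = new HiveContext(sc)

    // Sanity check: the existing Hive tables should be listed here
    // if the metastore configuration was picked up.
    sqlContext.sql("SHOW TABLES").collect().foreach(println)

    // Hypothetical tables: customers(customer_id, name), orders(order_id, customer_id)
    val joined = sqlContext.sql(
      """SELECT c.customer_id, c.name, count(*) AS n_orders
        |FROM customers c
        |JOIN orders o ON c.customer_id = o.customer_id
        |GROUP BY c.customer_id, c.name""".stripMargin)

    joined.take(20).foreach(println)
    sc.stop()
  }
}
```

If SHOW TABLES comes back empty, the usual suspect is that Spark did not find the metastore configuration and created a local one instead.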



Re: Hive on Spark VS Spark SQL

Posted by Xuefu Zhang <xz...@cloudera.com>.
Hi Cheolsoo,

Thanks for the correction. I took that for granted and didn't actually
check the code to verify. Yes, from the Spark version (1.2), I did see
their parser etc. Below is a portion of the README from Spark's sql package
for reference.

Thanks,
Xuefu

Spark SQL is broken up into four subprojects:
 - Catalyst (sql/catalyst) - An implementation-agnostic framework for
manipulating trees of relational operators and expressions.
 - Execution (sql/core) - A query planner / execution engine for
translating Catalyst’s logical query plans into Spark RDDs.  This component
also includes a new public interface, SQLContext, that allows users to
execute SQL or LINQ statements against existing RDDs and Parquet files.
 - Hive Support (sql/hive) - Includes an extension of SQLContext called
HiveContext that allows users to write queries using *a subset of HiveQL*
and access data from a Hive Metastore using Hive SerDes.  There are also
wrappers that allows users to run queries that include Hive UDFs, UDAFs,
and UDTFs.
 - HiveServer and CLI support (sql/hive-thriftserver) - Includes support
for the SQL CLI (bin/spark-sql) and a HiveServer2 (for JDBC/ODBC)
compatible server.


On Thu, May 21, 2015 at 10:31 PM, Cheolsoo Park <pi...@gmail.com>
wrote:

> Hi Xuefu,
>
> Thanks for the good comparison. I agree with most points, but #1 isn't
> true.
>
> SparkSQL has its own parser (implemented with Scala parser combinator
> library), analyzer, and optimizer although they're not as mature as Hive.
> What it depends on Hive for is Metastore, CliDriver, DDL parser, etc.
>
> Cheolsoo

Re: Hive on Spark VS Spark SQL

Posted by Cheolsoo Park <pi...@gmail.com>.
Hi Xuefu,

Thanks for the good comparison. I agree with most points, but #1 isn't true.

SparkSQL has its own parser (implemented with Scala parser combinator
library), analyzer, and optimizer although they're not as mature as Hive.
What it depends on Hive for is Metastore, CliDriver, DDL parser, etc.

Cheolsoo


Re: Hive on Spark VS Spark SQL

Posted by Xuefu Zhang <xz...@cloudera.com>.
I have been working on Hive on Spark, and know a little about SparkSQL.
Here are a few factors to be considered:

1. SparkSQL is similar to Shark (discontinued) in that it clones Hive's
front end (parser and semantic analyzer) and metastore, and injects in
between a layer where Hive's operator tree is reinterpreted in Spark's
constructs (transformations and actions). Thus, it's tied to a specific
version of Hive, which is always behind official Hive releases.
2. Because of the reinterpretation, many features (window functions,
lateral views, etc.) from Hive need to be reimplemented in the Spark world.
If an implementation hasn't been done, you see a gap. That's why you would
expect functional disparity, not to mention future Hive features.
3. SparkSQL is far from production ready.
4. On the other hand, Hive on Spark is native in Hive, embracing all Hive
features and growing with Hive. Hive's operators are honored without
re-interpretation. The integration is done at the execution layer, where
Spark is nothing but an advanced MapReduce engine.
5. Hive is aiming at enterprise use cases, where concerns such as security
matter more than purely whether it works or runs fast. Hive on Spark
certainly makes the query run faster, but still keeps the same
enterprise-readiness.
6. SparkSQL is a good fit if you're a heavy Spark user who occasionally
needs to run some SQL. Or you're a casual SQL user and like to try
something new.
7. If you haven't touched either Spark or Hive, I'd suggest you start with
Hive, especially for an enterprise.
8. If you're an existing Hive user considering taking advantage of Spark,
consider Hive on Spark.
9. It's strongly discouraged to mix Hive and SparkSQL in your deployment.
SparkSQL includes a version of Hive, which is very likely a different
version from the Hive that you have (even if you don't use Hive on Spark).
Library conflicts can put you in a nightmare.
10. I haven't benchmarked SparkSQL myself, but I have heard several reports
that SparkSQL, when tried at scale, either runs fast or fails your queries.
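
To illustrate point 4: because the integration sits at the execution layer, choosing Spark as Hive's engine is a session setting rather than a query rewrite. A minimal sketch over HiveServer2 JDBC follows, assuming a Hive build with Spark support, the hive-jdbc driver on the classpath, and a server at the placeholder address below. The host, user, object name and table name (`some_table`) are all made up for illustration.

```scala
import java.sql.DriverManager

object EngineSwitchSketch {
  def main(args: Array[String]): Unit = {
    // Hive's JDBC driver; requires the hive-jdbc jar on the classpath.
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    // Placeholder connection details.
    val conn = DriverManager.getConnection(
      "jdbc:hive2://localhost:10000/default", "hive", "")
    try {
      val stmt = conn.createStatement()

      // Same HiveQL, different engine: other values are "mr" and "tez".
      stmt.execute("SET hive.execution.engine=spark")

      // Hypothetical table name.
      val rs = stmt.executeQuery("SELECT count(*) FROM some_table")
      while (rs.next()) println(rs.getLong(1))
    } finally {
      conn.close()
    }
  }
}
```

The query text itself does not change between engines, which is the practical meaning of "Hive's operators are honored without re-interpretation".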

Hope this helps.

Thanks,

