Posted to user@hive.apache.org by "guoqing0629@yahoo.com.hk" <gu...@yahoo.com.hk> on 2015/05/20 07:38:56 UTC
Hive on Spark VS Spark SQL
Which is better, Hive on Spark or SparkSQL? What are the key characteristics, advantages, and disadvantages of each?
guoqing0629@yahoo.com.hk
Re: Hive on Spark VS Spark SQL
Posted by Edward Capriolo <ed...@gmail.com>.
What about outer lateral view?
On Wed, May 20, 2015 at 11:28 AM, matshyeq <ma...@gmail.com> wrote:
> From my experience SparkSQL is still way faster than tez.
> Also, SparkSQL (even 1.2.1 which I'm on) supports *lateral view*
>
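For readers who have not used the feature in question: in HiveQL, LATERAL VIEW drops a source row when the table-generating function (for example explode) returns nothing for it, while LATERAL VIEW OUTER keeps the row and fills the generated column with NULL. A minimal Python sketch of that difference follows. It only illustrates the semantics and is not Hive or Spark code; the function name lateral_view_explode and the sample rows are made up for the example.

```python
def lateral_view_explode(rows, array_col, out_col, outer=False):
    """Emulate FROM t LATERAL VIEW [OUTER] explode(array_col) x AS out_col.

    For brevity the array column itself is left out of the output rows.
    """
    result = []
    for row in rows:
        items = row.get(array_col) or []  # treat NULL like an empty array
        base = {k: v for k, v in row.items() if k != array_col}
        if items:
            for item in items:
                result.append({**base, out_col: item})
        elif outer:
            # OUTER keeps the source row and reports NULL for the generated column
            result.append({**base, out_col: None})
        # without OUTER, a row with an empty array produces no output at all
    return result


pages = [
    {"page": "home", "ads": [1, 2]},
    {"page": "about", "ads": []},
]
inner = lateral_view_explode(pages, "ads", "ad_id")
outer = lateral_view_explode(pages, "ads", "ad_id", outer=True)
```

Here inner contains only the two "home" rows, while outer additionally keeps "about" with ad_id set to None.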
Re: Hive on Spark VS Spark SQL
Posted by matshyeq <ma...@gmail.com>.
From my experience SparkSQL is still way faster than Tez.
Also, SparkSQL (even 1.2.1 which I'm on) supports *lateral view*
On Wed, May 20, 2015 at 3:41 PM, Edward Capriolo <ed...@gmail.com>
wrote:
> Beyond window queries, hive still has concepts like cube or lateral view
> that many "better than hive" systems don't have.
>
> Also now many people went around broadcasting SparkSQL/SparkSQL was/is
> better/faster than hive but now that tez has "whooped" them in a benchmark
> they are very quite.
>
>
> http://www.quora.com/What-do-the-people-who-answered-Quora-questions-about-Spark-being-faster-than-Hive-say-now-that-Hortonworks-claims-that-Hive-on-Tez-is-faster-than-Spark
Re: Hive on Spark VS Spark SQL
Posted by Edward Capriolo <ed...@gmail.com>.
Beyond window queries, Hive still has concepts like cube or lateral view
that many "better than Hive" systems don't have.
Also, many people went around broadcasting that SparkSQL was/is
better/faster than Hive, but now that Tez has "whooped" them in a benchmark
they are very quiet.
http://www.quora.com/What-do-the-people-who-answered-Quora-questions-about-Spark-being-faster-than-Hive-say-now-that-Hortonworks-claims-that-Hive-on-Tez-is-faster-than-Spark
On Wed, May 20, 2015 at 9:50 AM, Dragga, Christopher <
Chris.Dragga@netapp.com> wrote:
> While I’ve not experimented with the most recent versions of SparkSQL,
> earlier releases could not cope with intermediate result sets that exceeded
> the available memory; Hive handles this sort of situation much more
> gracefully. If you have a smallish cluster and large data, this could pose
> a problem. Still, it’s worth looking into SparkSQL to see if this is still
> an issue.
>
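As background on the cube feature mentioned above: GROUP BY ... WITH CUBE aggregates over every subset of the grouping columns, reporting NULL for the columns that a given subset leaves out. The Python sketch below illustrates that behaviour only. It is not how Hive implements it, and the function name cube_sum and the sample data are made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def cube_sum(rows, dims, measure):
    """Emulate SELECT dims..., SUM(measure) FROM rows GROUP BY dims WITH CUBE."""
    totals = defaultdict(int)
    # every subset of the grouping columns is one grouping set
    grouping_sets = [
        set(combo)
        for size in range(len(dims) + 1)
        for combo in combinations(dims, size)
    ]
    for row in rows:
        for kept in grouping_sets:
            # columns outside the grouping set are reported as NULL (None)
            key = tuple(row[d] if d in kept else None for d in dims)
            totals[key] += row[measure]
    return dict(totals)


sales = [
    {"region": "EU", "product": "a", "amount": 10},
    {"region": "EU", "product": "b", "amount": 5},
    {"region": "US", "product": "a", "amount": 7},
]
result = cube_sum(sales, ["region", "product"], "amount")
```

The key (None, None) holds the grand total, ("EU", None) the subtotal for one region, and (None, "a") the subtotal for one product across all regions.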
RE: Hive on Spark VS Spark SQL
Posted by "Dragga, Christopher" <Ch...@netapp.com>.
While I've not experimented with the most recent versions of SparkSQL, earlier releases could not cope with intermediate result sets that exceeded the available memory; Hive handles this sort of situation much more gracefully. If you have a smallish cluster and large data, this could pose a problem. Still, it's worth looking into SparkSQL to see if this is still an issue.
-Chris Dragga
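One common way for an engine to cope with data that does not fit in memory is to spill sorted runs to disk and merge them afterwards. The sketch below shows that general idea in plain Python. It is a generic illustration, not the actual Hive, MapReduce, or Spark implementation, and the name external_sort and the max_in_memory parameter are assumptions made for the example.

```python
import heapq
import os
import tempfile


def external_sort(values, max_in_memory):
    """Yield ints in sorted order while buffering at most max_in_memory of them."""
    run_paths, buffer = [], []

    def spill():
        # write the sorted buffer to a temporary file as one sorted "run"
        fd, path = tempfile.mkstemp(text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in sorted(buffer))
        run_paths.append(path)
        buffer.clear()

    for v in values:
        buffer.append(v)
        if len(buffer) >= max_in_memory:
            spill()
    if buffer:
        spill()

    files = [open(p) for p in run_paths]
    try:
        # heapq.merge streams the runs lazily instead of loading them whole
        yield from heapq.merge(*((int(line) for line in f) for f in files))
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)


sorted_values = list(external_sort([5, 3, 9, 1, 8, 2, 7], max_in_memory=3))
```

With max_in_memory=3 the seven inputs are written as three runs and then merged, trading disk I/O for a bounded buffer.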
From: Uli Bethke [mailto:uli.bethke@sonra.io]
Sent: Wednesday, May 20, 2015 7:04 AM
To: user@hive.apache.org
Subject: Re: Hive on Spark VS Spark SQL
Interesting question and one that I have asked myself. If you are already heavily invested in the Hive ecosystem in terms of code and skills I would look at Hive on Spark as my engine. In theory swapping out engines (MR, TEZ, Spark) should be easy. Even though the devil is in the detail.
SparkSQL supports a broad subset of HiveQL (some esoteric features are not supported). Crucially in my opinion SparkSQL 1.4 will also introduce windowing functions. If starting out on a greenfield site I would exclusively look at SparkSQL.
On 20/05/2015 06:38, guoqing0629@yahoo.com.hk wrote:
Hive on Spark and SparkSQL which should be better , and what are the key characteristics and the advantages and the disadvantages between ?
Re: Hive on Spark VS Spark SQL
Posted by Uli Bethke <ul...@sonra.io>.
Interesting question, and one that I have asked myself. If you are
already heavily invested in the Hive ecosystem in terms of code and
skills, I would look at Hive on Spark as my engine. In theory, swapping
out engines (MR, Tez, Spark) should be easy, even though the devil is in
the detail.
SparkSQL supports a broad subset of HiveQL (some esoteric features are
not supported). Crucially, in my opinion, SparkSQL 1.4 will also introduce
windowing functions. If starting out on a greenfield site, I would
look exclusively at SparkSQL.
On 20/05/2015 06:38, guoqing0629@yahoo.com.hk wrote:
> Hive on Spark and SparkSQL which should be better , and what are the
> key characteristics and the advantages and the disadvantages between ?
>
> ------------------------------------------------------------------------
> guoqing0629@yahoo.com.hk
--
___________________________
Uli Bethke
Co-founder Sonra
p: +353 86 32 83 040
w: www.sonra.io
l: linkedin.com/in/ulibethke
t: twitter.com/ubethke
Chair Hadoop User Group Ireland:
http://www.meetup.com/hadoop-user-group-ireland/
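To make the windowing functions mentioned above concrete: unlike GROUP BY, a window function keeps every input row and adds a value computed over the row's partition. The Python sketch below emulates ROW_NUMBER() and a running SUM with a ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW frame. It illustrates the semantics only and is not Hive or SparkSQL code; the function name with_window and the sample data are made up for the example.

```python
from itertools import groupby
from operator import itemgetter


def with_window(rows, partition_by, order_by, value):
    """Add ROW_NUMBER() and a running SUM(value) OVER (PARTITION BY ... ORDER BY ...)."""
    out = []
    # sort once by partition key, then by the ordering column
    ordered = sorted(rows, key=itemgetter(partition_by, order_by))
    for _, group in groupby(ordered, key=itemgetter(partition_by)):
        running = 0
        for n, row in enumerate(group, start=1):
            running += row[value]
            # every input row survives, with the window values attached
            out.append({**row, "row_number": n, "running_sum": running})
    return out


orders = [
    {"customer": "ann", "day": 2, "amount": 30},
    {"customer": "bob", "day": 1, "amount": 5},
    {"customer": "ann", "day": 1, "amount": 20},
]
windowed = with_window(orders, "customer", "day", "amount")
```

All three input rows remain in windowed, each with its own position and running total inside its customer partition.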
Re: Hive on Spark VS Spark SQL
Posted by Alexander Pivovarov <ap...@gmail.com>.
Thank you Xuefu!
Excellent explanation and comparison!
We should put it on the Hive on Spark wiki.
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark
Re: Pointing SparkSQL to existing Hive Metadata with data file
locations in HDFS
Posted by Xuefu Zhang <xz...@cloudera.com>.
I'm afraid you're in the wrong community. You might have a better chance of
getting an answer in the Spark community.
Thanks,
Xuefu
On Wed, May 27, 2015 at 5:44 PM, Sanjay Subramanian <
sanjaysubramanian@yahoo.com> wrote:
> hey guys
>
> On the Hive/Hadoop ecosystem we have using Cloudera distribution CDH 5.2.x
> , there are about 300+ hive tables.
> The data is stored an text (moving slowly to Parquet) on HDFS.
> I want to use SparkSQL and point to the Hive metadata and be able to
> define JOINS etc using a programming structure like this
>
> import org.apache.spark.sql.hive.HiveContext
> val sqlContext = new HiveContext(sc)
> val schemaRdd = sqlContext.sql("some complex SQL")
>
>
> Is that the way to go ? Some guidance will be great.
>
> thanks
>
> sanjay
>
>
>
>
Pointing SparkSQL to existing Hive Metadata with data file
locations in HDFS
Posted by Sanjay Subramanian <sa...@yahoo.com>.
hey guys
On the Hive/Hadoop ecosystem we are using (Cloudera distribution CDH 5.2.x), there are about 300+ Hive tables. The data is stored as text (moving slowly to Parquet) on HDFS. I want to use SparkSQL, point it at the Hive metadata, and be able to define JOINs etc. using a programming structure like this:
import org.apache.spark.sql.hive.HiveContext
val sqlContext = new HiveContext(sc)
val schemaRdd = sqlContext.sql("some complex SQL")
Is that the way to go ? Some guidance will be great.
thanks
sanjay
Re: Hive on Spark VS Spark SQL
Posted by Xuefu Zhang <xz...@cloudera.com>.
Hi Cheolsoo,
Thanks for the correction. I took that for granted and didn't actually
check the code to verify. Yes, from the Spark version (1.2), I did see
their parser etc. Below is a portion of the README from Spark's sql package
for reference.
Thanks,
Xuefu
Spark SQL is broken up into four subprojects:
- Catalyst (sql/catalyst) - An implementation-agnostic framework for
manipulating trees of relational operators and expressions.
- Execution (sql/core) - A query planner / execution engine for
translating Catalyst’s logical query plans into Spark RDDs. This component
also includes a new public interface, SQLContext, that allows users to
execute SQL or LINQ statements against existing RDDs and Parquet files.
- Hive Support (sql/hive) - Includes an extension of SQLContext called
HiveContext that allows users to write queries using *a subset of HiveQL*
and access data from a Hive Metastore using Hive SerDes. There are also
wrappers that allows users to run queries that include Hive UDFs, UDAFs,
and UDTFs.
- HiveServer and CLI support (sql/hive-thriftserver) - Includes support
for the SQL CLI (bin/spark-sql) and a HiveServer2 (for JDBC/ODBC)
compatible server.
On Thu, May 21, 2015 at 10:31 PM, Cheolsoo Park <pi...@gmail.com>
wrote:
> Hi Xuefu,
>
> Thanks for the good comparison. I agree with most points, but #1 isn't
> true.
>
> SparkSQL has its own parser (implemented with Scala parser combinator
> library), analyzer, and optimizer although they're not as mature as Hive.
> What it depends on Hive for is Metastore, CliDriver, DDL parser, etc.
>
> Cheolsoo
Re: Hive on Spark VS Spark SQL
Posted by Cheolsoo Park <pi...@gmail.com>.
Hi Xuefu,
Thanks for the good comparison. I agree with most points, but #1 isn't true.
SparkSQL has its own parser (implemented with Scala parser combinator
library), analyzer, and optimizer although they're not as mature as Hive.
What it depends on Hive for is Metastore, CliDriver, DDL parser, etc.
Cheolsoo
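For readers unfamiliar with the term, a parser combinator library builds a parser by composing small parsing functions into larger ones. The Python sketch below illustrates the general technique on a toy SELECT statement. It is not Spark's grammar and not the Scala library's API; every name in it (token, seq, sep_by, parse_select) is made up for the example.

```python
import re


def token(pattern):
    """Parser that skips whitespace, matches a regex, and returns the matched text."""
    regex = re.compile(r"\s*(" + pattern + ")", re.IGNORECASE)

    def parse(text, pos):
        m = regex.match(text, pos)
        return (m.group(1), m.end()) if m else None

    return parse


def seq(*parsers):
    """Run parsers one after another; succeed only if all of them succeed."""
    def parse(text, pos):
        values = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            value, pos = r
            values.append(value)
        return values, pos

    return parse


def sep_by(item, sep):
    """One or more items separated by sep; returns the list of items."""
    def parse(text, pos):
        r = item(text, pos)
        if r is None:
            return None
        value, pos = r
        values = [value]
        while True:
            r = seq(sep, item)(text, pos)
            if r is None:
                return values, pos
            (_, value), pos = r
            values.append(value)

    return parse


ident = token(r"[A-Za-z_]\w*")
# the whole statement is just a composition of the small parsers above
select_stmt = seq(token("SELECT"), sep_by(ident, token(",")), token("FROM"), ident)


def parse_select(sql):
    """Parse 'SELECT a, b FROM t'; trailing input is not checked in this toy."""
    r = select_stmt(sql, 0)
    if r is None:
        return None
    (_, columns, _, table), _ = r
    return {"columns": columns, "table": table}


parsed = parse_select("select id, name from users")
```

The appeal of the approach is that the grammar reads almost like its definition, at the cost of the error reporting and performance tuning that a generated parser can offer.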
Re: Hive on Spark VS Spark SQL
Posted by Xuefu Zhang <xz...@cloudera.com>.
I have been working on Hive on Spark, and know a little about SparkSQL.
Here are a few factors to be considered:
1. SparkSQL is similar to Shark (discontinued) in that it clones Hive's
front end (parser and semantic analyzer) and metastore, and injects in
between a layer where Hive's operator tree is reinterpreted in Spark's
constructs (transformations and actions). Thus, it's tied to a specific
version of Hive, which is always behind official Hive releases.
2. Because of the reinterpretation, many features (window functions,
lateral views, etc.) from Hive need to be reimplemented in the Spark world. If
an implementation hasn't been done, you see a gap. That's why you would
expect functional disparity, not to mention future Hive features.
3. SparkSQL is far from production ready.
4. On the other hand, Hive on Spark is native in Hive, embracing all Hive
features and growing with Hive. Hive's operators are honored without
re-interpretation. The integration is done at the execution layer, where
Spark is nothing but an advanced MapReduce engine.
5. Hive is aiming at enterprise use cases, where concerns such as security
matter more than purely whether it works or runs fast. Hive
on Spark certainly makes the query run faster, but still keeps the same
enterprise-readiness.
6. SparkSQL is a good fit if you're a heavy Spark user who occasionally
needs to run some SQL, or if you're a casual SQL user who likes to try
something new.
7. If you haven't touched either Spark or Hive, I'd suggest you start with
Hive, especially for an enterprise.
8. If you're an existing Hive user and are considering taking advantage of Spark,
consider Hive on Spark.
9. It's strongly discouraged to mix Hive and SparkSQL in your deployment.
SparkSQL includes a version of Hive, which is very likely a different
version from the Hive that you have (even if you don't use Hive on Spark).
Library conflicts can put you in a nightmare.
10. I haven't benchmarked SparkSQL myself, but I have heard several reports that
SparkSQL, when tried at scale, either runs fast or fails your queries.
Hope this helps.
Thanks,
On Tue, May 19, 2015 at 10:38 PM, guoqing0629@yahoo.com.hk <
guoqing0629@yahoo.com.hk> wrote:
> Hive on Spark and SparkSQL which should be better , and what are the key
> characteristics and the advantages and the disadvantages between ?
>
> ------------------------------
> guoqing0629@yahoo.com.hk
>