Posted to user@hive.apache.org by 理 <ww...@126.com> on 2016/08/08 00:27:42 UTC

hive will die or not?

Hi,
Spark SQL is improving so fast, and Hive and Spark SQL are similar. So will Hive lose out, or not?

Regards

Re: Re: hive will die or not?

Posted by Mich Talebzadeh <mi...@gmail.com>.

I am afraid your points (addressed to the original thread owner) are manifestly
misleading or at best half-baked. Given a set of parameters, one can argue
from any angle. Why use Spark and not Flink? Why use this and not that?
These are circular arguments.


   - Hive can use Spark as its execution engine with excellent results
   compared with MapReduce. That does not mean MapReduce is out of the
   picture. Hive can also use Tez+LLAP as its execution engine. I think this
   shows how versatile Hive is.
   - Transactional support was added to Hive for ORC tables.
   - There is no transactional support in Spark SQL on ORC tables yet, or on
   any other DB.
   - Locking and concurrency (as used by Hive) with a Spark app running a
   HiveContext: I am not convinced this works with Spark SQL.
   - Spark does not yet have a Cost Based Optimizer (CBO).
   - Spark has a complete fork of Hive inside it. *Spark SQL is a subset
   of Hive SQL*
   - Hive was billed as a Data Warehouse (DW) on HDFS.
   - Hive is the most versatile and capable of the many SQL or SQL-like
   ways of accessing data on Hadoop.
   - You can set up a copy of your RDBMS table in Hive in no time and use
   Sqoop to get the table data into the Hive table practically in one
   command. For many, this is the great attraction of Hive.
   - The ability to do real-time analytics on Hive by sending real-time
   transactional movements from RDBMS tables to Hive via the existing
   replication technologies. This is very handy. Today, organizations are
   struggling to achieve real-time integration between RDBMS silos and Big
   Data. Fast decision-making depends on real-time data movement that
   allows businesses to gather data from multiple locations into Big Data as
   well as conventional data warehouses.
   - The Spark Thrift Server is basically the Hive Thrift server; without
   Hive it would not exist.
   - Without Hive and HiveContext there would be no Spark SQL.


I am a fan of Spark and use it extensively. However, you have to consider
the use case when talking about a product.

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 8 August 2016 at 02:49, 理 <ww...@126.com> wrote:

> In my opinion, multiple engines are not an advantage but the reverse: they
> disperse the development effort.
> Consider the activity: Spark SQL supports all of TPC-DS without modifying
> the syntax! Hive cannot.
> Consider the technology: DAG, vectorization, etc.; Spark SQL has these
> too, and its code seems more efficient.
>
> Regards

Re: hive will die or not?

Posted by Mich Talebzadeh <mi...@gmail.com>.

If LLAP is integrated well into Hive, effectively making it an in-memory
offering, and Hive uses either Spark or Tez+LLAP as its execution engine,
then I think Hive will be a great integrated platform in the ecosystem.

Spark is a very good query tool, but it operates in a different space from
Hive, which is a Data Warehouse.

HTH

On 13 August 2016 at 16:09, Thejas Nair <th...@gmail.com> wrote:

> In addition, there is a lot more coming with LLAP.
>
> http://www.slideshare.net/HadoopSummit/llap-subsecond-analytical-queries-in-hive
>
> There is also no fine-grained access control natively in Spark.
> LLAP would help with that as well:
> http://www.slideshare.net/HadoopSummit/finegrained-security-for-spark-and-hive

Re: hive will die or not?

Posted by Thejas Nair <th...@gmail.com>.

In addition, there is a lot more coming with LLAP.

http://www.slideshare.net/HadoopSummit/llap-subsecond-analytical-queries-in-hive

There is also no fine-grained access control natively in Spark.
LLAP would help with that as well:
http://www.slideshare.net/HadoopSummit/finegrained-security-for-spark-and-hive



On Sun, Aug 7, 2016 at 7:24 PM, Marcin Tustin <mt...@handybook.com> wrote:

> I think that's right. My testing (not very scientific) puts it on par with
> Redshift for the datasets I use.

Re: hive will die or not?

Posted by Marcin Tustin <mt...@handybook.com>.

I think that's right. My testing (not very scientific) puts it on par with
Redshift for the datasets I use.

On Sunday, August 7, 2016, Edward Capriolo <ed...@gmail.com> wrote:

> A few entities were going to "kill/take out/be better than" Hive.
> I seem to remember HadoopDB, Impala, Redshift, VoltDB...
>
> But apparently Hive is still around, and probably faster:
> http://www.slideshare.net/hortonworks/hive-on-spark-is-blazing-fast-or-is-it-final

-- 
Want to work at Handy? Check out our culture deck and open roles 
<http://www.handy.com/careers>
Latest news <http://www.handy.com/press> at Handy
Handy just raised $50m 
<http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/> led 
by Fidelity

Re: Re: Re: hive will die or not?

Posted by 理 <ww...@126.com>.

Regards, wenli
On 08/08/2016 10:16, Edward Capriolo wrote:
A few entities were going to "kill/take out/be better than" Hive.
I seem to remember HadoopDB, Impala, Redshift, VoltDB...

But apparently Hive is still around, and probably faster:
http://www.slideshare.net/hortonworks/hive-on-spark-is-blazing-fast-or-is-it-final

Re: Re: hive will die or not?

Posted by Edward Capriolo <ed...@gmail.com>.

A few entities were going to "kill/take out/be better than" Hive.
I seem to remember HadoopDB, Impala, Redshift, VoltDB...

But apparently Hive is still around, and probably faster:
http://www.slideshare.net/hortonworks/hive-on-spark-is-blazing-fast-or-is-it-final




On Sun, Aug 7, 2016 at 9:49 PM, 理 <ww...@126.com> wrote:

> In my opinion, multiple engines are not an advantage but the reverse: they
> disperse the development effort.
> Consider the activity: Spark SQL supports all of TPC-DS without modifying
> the syntax! Hive cannot.
> Consider the technology: DAG, vectorization, etc.; Spark SQL has these
> too, and its code seems more efficient.
>
> Regards

Re: hive will die or not?

Posted by Mich Talebzadeh <mi...@gmail.com>.

...I do not agree with you...

Yeah, right. I am so upset. I was waiting for your nod.

LOL

On 14 August 2016 at 12:40, Jörn Franke <jo...@gmail.com> wrote:

> I do not agree with you. Competition is needed for innovation. I would be
> more concerned if there were no innovation; that is really bad for companies.
>
> Besides, as others have pointed out, Spark and Hive have different use cases.
> There are use cases where Spark is better and others where Hive is better.
> There are also a lot of use cases where both are useless.

Re: hive will die or not?

Posted by Jörn Franke <jo...@gmail.com>.

I do not agree with you. Competition is needed for innovation. I would be more concerned if there were no innovation; that is really bad for companies.

Besides, as others have pointed out, Spark and Hive have different use cases. There are use cases where Spark is better and others where Hive is better. There are also a lot of use cases where both are useless.

> On 08 Aug 2016, at 03:49, 理 <ww...@126.com> wrote:
> 
> In my opinion, multiple engines are not an advantage but the reverse: they disperse the development effort.
> Consider the activity: Spark SQL supports all of TPC-DS without modifying the syntax! Hive cannot.
> Consider the technology: DAG, vectorization, etc.; Spark SQL has these too, and its code seems more efficient.
>
> Regards

Re: Re: hive will die or not?

Posted by 理 <ww...@126.com>.

In my opinion, multiple engines are not an advantage but the reverse: they disperse the development effort.
Consider the activity: Spark SQL supports all of TPC-DS without modifying the syntax! Hive cannot.
Consider the technology: DAG, vectorization, etc.; Spark SQL has these too, and its code seems more efficient.

Regards
On 08/08/2016 08:48, Will Du wrote:
First, Hive supports different engines. Look forward to its dynamic engine switching.
Second, look forward to third-generation Hadoop; in-memory MapReduce will fill the gap.

Thanks,
Will

Re: hive will die or not?

Posted by Will Du <wi...@gmail.com>.

First, Hive supports different engines. Look forward to its dynamic engine switching.
Second, look forward to third-generation Hadoop; in-memory MapReduce will fill the gap.

Thanks,
Will

> On 7 Aug 2016, at 20:27, 理 <ww...@126.com> wrote:
>
> Hi,
> Spark SQL is improving so fast, and Hive and Spark SQL are similar. So will Hive lose out, or not?
>
> Regards