Posted to user@hive.apache.org by 理 <ww...@126.com> on 2016/08/08 00:27:42 UTC
hive will die or not?
hi,
Spark SQL is improving so fast, and Hive and Spark SQL are similar, so will Hive lose out or not?
regards
Re: Re: hive will die or not?
Posted by Mich Talebzadeh <mi...@gmail.com>.
I am afraid your points (the original thread owner) are manifestly
misleading or at best half-baked. Given a set of parameters one can argue
from any angle. Why use Spark but not Flink? Why use this and not that?
These are circular arguments.
- Hive can use Spark as its execution engine with excellent results
compared to MapReduce. That does not mean MapReduce is out of the picture.
Hive can also use Tez+LLAP as its execution engine. I think this shows how
versatile Hive is.
- Transactional support was added to Hive for ORC tables.
- There is no transactional support in Spark SQL yet, on ORC tables or on
any other DB.
- Locking and concurrency (as used by Hive) with a Spark app running a
Hive context: I am not convinced this works with Spark SQL.
- Spark does not yet have a Cost Based Optimizer (CBO).
- Spark has a complete fork of Hive inside it. *Spark SQL is a subset
of Hive SQL.*
- Hive was billed as a Data Warehouse (DW) on HDFS.
- Hive is the most versatile and capable of the many SQL or SQL-like
ways of accessing data on Hadoop.
- You can set up a copy of your RDBMS table in Hive in no time and
use Sqoop to get the table data into the Hive table practically in one
command. For many this is the great attraction of Hive.
- You can do real-time analytics on Hive by sending real-time
transactional movements from RDBMS tables to Hive via the existing
replication technologies. This is very handy. Today, organizations are
struggling to achieve real-time integration between RDBMS silos and Big
Data. Fast decision-making depends on real-time data movement that
allows businesses to gather data from multiple locations into Big Data as
well as conventional data warehouses.
- Spark Thrift Server is basically the Hive Thrift server and would not
exist without it.
- Without Hive and HiveContext there would not be Spark SQL.
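As a rough illustration of two points in the list above (choosing an execution engine, and the one-command Sqoop load), here is a minimal Python sketch. It only builds the HiveQL statement and the Sqoop command line as text; it does not run Hive or Sqoop. The host, user and table names are made-up placeholders, and the helper function names are hypothetical. `hive.execution.engine` and the Sqoop flags shown are the commonly documented ones, but check them against your installed versions.

```python
import shlex

# The three execution engines named in the list above.
HIVE_ENGINES = ("mr", "tez", "spark")


def engine_statement(engine):
    """Return the HiveQL SET statement that selects an execution engine."""
    if engine not in HIVE_ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    return f"SET hive.execution.engine={engine};"


def sqoop_import_command(jdbc_url, username, source_table, hive_table):
    """Build the argument list for a one-shot Sqoop import into Hive.

    All four arguments are placeholders supplied by the caller.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                  # prompt for the password instead of embedding it
        "--table", source_table,
        "--hive-import",       # load the imported rows into a Hive table
        "--hive-table", hive_table,
    ]


if __name__ == "__main__":
    print(engine_statement("spark"))
    cmd = sqoop_import_command(
        "jdbc:oracle:thin:@//dbhost:1521/ORCL",  # hypothetical source database
        "scott",                                 # hypothetical user
        "SALES",                                 # hypothetical RDBMS table
        "staging.sales",                         # hypothetical Hive table
    )
    print(shlex.join(cmd))
```

The same Hive session can issue a different SET statement before each query, which is what makes it practical to compare engines on the same table.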
I am a fan of Spark and use it extensively. However, you have to consider
the use case when talking about a product.
HTH
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
On 8 August 2016 at 02:49, 理 <ww...@126.com> wrote:
> in my opinion, multiple engine is not advantage, but reverse. it
> disperse the dev energy.
> consider the activity ,sparksql support all tpc ds without modify
> syntax! but hive cannot.
> consider the tech, dag, vectorization, etc sparksql also has, seems
> the code is more efficiently.
>
>
> regards
> On 08/08/2016 08:48, Will Du <wi...@gmail.com> wrote:
>
> First, hive supports different engines. Look forward it's dynamic engine
> switch
> Second, look forward hadoop 3rd gen and map reduce on memory will fill the
> gap
>
> Thanks,
> Will
>
> On August 7, 2016, at 20:27, 理 <ww...@126.com> wrote:
>
> hi,
> sparksql improve so fast, both hive and sparksql are similar, so
> hive will lost or not?
>
> regards
>
>
>
>
>
>
Re: hive will die or not?
Posted by Mich Talebzadeh <mi...@gmail.com>.
If LLAP is integrated well into Hive, making it something of an in-memory
offering, and Hive can use either Spark or Tez+LLAP as its execution engine,
then I think Hive will be a great integrated platform in the ecosystem.
Spark is a very good query tool but operates in a different space from
Hive, which is a Data Warehouse.
HTH
Dr Mich Talebzadeh
On 13 August 2016 at 16:09, Thejas Nair <th...@gmail.com> wrote:
> In addition, there is also lot more coming with LLAP.
>
> http://www.slideshare.net/HadoopSummit/llap-subsecond-analytical-queries-in-hive
>
> There is also no fine grained access control natively in Spark.
> LLAP would help with that as well -
> http://www.slideshare.net/HadoopSummit/finegrained-security-for-spark-and-hive
>
>
>
> On Sun, Aug 7, 2016 at 7:24 PM, Marcin Tustin <mt...@handybook.com>
> wrote:
>
>> I think that's right. My testing (not very scientific) puts it on par for
>> redshift for the datasets I use.
>>
>>
>> On Sunday, August 7, 2016, Edward Capriolo <ed...@gmail.com> wrote:
>>
>>> A few entities going to "kill/take out/better than hive"
>>> I seem to remember HadoopDb, Impala, RedShift , voltdb...
>>>
>>> But apparent hive is still around and probably faster
>>> http://www.slideshare.net/hortonworks/hive-on-spark-is-blazing-fast-or-is-it-final
>>>
>>>
>>>
>>>
>>> On Sun, Aug 7, 2016 at 9:49 PM, 理 <ww...@126.com> wrote:
>>>
>>>> in my opinion, multiple engine is not advantage, but reverse. it
>>>> disperse the dev energy.
>>>> consider the activity ,sparksql support all tpc ds without modify
>>>> syntax! but hive cannot.
>>>> consider the tech, dag, vectorization, etc sparksql also has,
>>>> seems the code is more efficiently.
>>>>
>>>>
>>>> regards
>>>> On 08/08/2016 08:48, Will Du wrote:
>>>>
>>>> First, hive supports different engines. Look forward it's dynamic
>>>> engine switch
>>>> Second, look forward hadoop 3rd gen and map reduce on memory will fill
>>>> the gap
>>>>
>>>> Thanks,
>>>> Will
>>>>
>>>> On August 7, 2016, at 20:27, 理 <ww...@126.com> wrote:
>>>>
>>>> hi,
>>>> sparksql improve so fast, both hive and sparksql are similar,
>>>> so hive will lost or not?
>>>>
>>>> regards
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>> Want to work at Handy? Check out our culture deck and open roles
>> <http://www.handy.com/careers>
>> Latest news <http://www.handy.com/press> at Handy
>> Handy just raised $50m
>> <http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/> led
>> by Fidelity
>>
>>
>
Re: hive will die or not?
Posted by Thejas Nair <th...@gmail.com>.
In addition, there is also a lot more coming with LLAP.
http://www.slideshare.net/HadoopSummit/llap-subsecond-analytical-queries-in-hive
There is also no fine-grained access control natively in Spark.
LLAP would help with that as well -
http://www.slideshare.net/HadoopSummit/finegrained-security-for-spark-and-hive
On Sun, Aug 7, 2016 at 7:24 PM, Marcin Tustin <mt...@handybook.com> wrote:
> I think that's right. My testing (not very scientific) puts it on par for
> redshift for the datasets I use.
>
>
> On Sunday, August 7, 2016, Edward Capriolo <ed...@gmail.com> wrote:
>
>> A few entities going to "kill/take out/better than hive"
>> I seem to remember HadoopDb, Impala, RedShift , voltdb...
>>
>> But apparent hive is still around and probably faster
>> http://www.slideshare.net/hortonworks/hive-on-spark-is-blazing-fast-or-is-it-final
>>
>>
>>
>>
>> On Sun, Aug 7, 2016 at 9:49 PM, 理 <ww...@126.com> wrote:
>>
>>> in my opinion, multiple engine is not advantage, but reverse. it
>>> disperse the dev energy.
>>> consider the activity ,sparksql support all tpc ds without modify
>>> syntax! but hive cannot.
>>> consider the tech, dag, vectorization, etc sparksql also has,
>>> seems the code is more efficiently.
>>>
>>>
>>> regards
>>> On 08/08/2016 08:48, Will Du wrote:
>>>
>>> First, hive supports different engines. Look forward it's dynamic engine
>>> switch
>>> Second, look forward hadoop 3rd gen and map reduce on memory will fill
>>> the gap
>>>
>>> Thanks,
>>> Will
>>>
>>> On August 7, 2016, at 20:27, 理 <ww...@126.com> wrote:
>>>
>>> hi,
>>> sparksql improve so fast, both hive and sparksql are similar, so
>>> hive will lost or not?
>>>
>>> regards
>>>
>>>
>>>
>>>
>>>
>>>
>>
Re: hive will die or not?
Posted by Marcin Tustin <mt...@handybook.com>.
I think that's right. My testing (not very scientific) puts it on par with
Redshift for the datasets I use.
On Sunday, August 7, 2016, Edward Capriolo <ed...@gmail.com> wrote:
> A few entities going to "kill/take out/better than hive"
> I seem to remember HadoopDb, Impala, RedShift , voltdb...
>
> But apparent hive is still around and probably faster
> http://www.slideshare.net/hortonworks/hive-on-spark-is-blazing-fast-or-is-it-final
>
>
>
>
> On Sun, Aug 7, 2016 at 9:49 PM, 理 <wwli05@126.com> wrote:
>
>> in my opinion, multiple engine is not advantage, but reverse. it
>> disperse the dev energy.
>> consider the activity ,sparksql support all tpc ds without modify
>> syntax! but hive cannot.
>> consider the tech, dag, vectorization, etc sparksql also has, seems
>> the code is more efficiently.
>>
>>
>> regards
>> On 08/08/2016 08:48, Will Du <willddy@gmail.com> wrote:
>>
>> First, hive supports different engines. Look forward it's dynamic engine
>> switch
>> Second, look forward hadoop 3rd gen and map reduce on memory will fill
>> the gap
>>
>> Thanks,
>> Will
>>
>> On August 7, 2016, at 20:27, 理 <wwli05@126.com> wrote:
>>
>> hi,
>> sparksql improve so fast, both hive and sparksql are similar, so
>> hive will lost or not?
>>
>> regards
>>
>>
>>
>>
>>
>>
>
Re: Re: Re: hive will die or not?
Posted by 理 <ww...@126.com>.
regards wenli
On 08/08/2016 10:16, Edward Capriolo wrote:
A few entities going to "kill/take out/better than hive"
I seem to remember HadoopDb, Impala, RedShift , voltdb...
But apparent hive is still around and probably faster
http://www.slideshare.net/hortonworks/hive-on-spark-is-blazing-fast-or-is-it-final
On Sun, Aug 7, 2016 at 9:49 PM, 理 <ww...@126.com> wrote:
in my opinion, multiple engine is not advantage, but reverse. it disperse the dev energy.
consider the activity ,sparksql support all tpc ds without modify syntax! but hive cannot.
consider the tech, dag, vectorization, etc sparksql also has, seems the code is more efficiently.
regards
On 08/08/2016 08:48, Will Du wrote:
First, hive supports different engines. Look forward it's dynamic engine switch
Second, look forward hadoop 3rd gen and map reduce on memory will fill the gap
Thanks,
Will
On August 7, 2016, at 20:27, 理 <ww...@126.com> wrote:
hi,
sparksql improve so fast, both hive and sparksql are similar, so hive will lost or not?
regards
Re: Re: hive will die or not?
Posted by Edward Capriolo <ed...@gmail.com>.
A few entities were going to "kill/take out/be better than" Hive.
I seem to remember HadoopDB, Impala, Redshift, VoltDB...
But apparently Hive is still around and probably faster
http://www.slideshare.net/hortonworks/hive-on-spark-is-blazing-fast-or-is-it-final
On Sun, Aug 7, 2016 at 9:49 PM, 理 <ww...@126.com> wrote:
> in my opinion, multiple engine is not advantage, but reverse. it
> disperse the dev energy.
> consider the activity ,sparksql support all tpc ds without modify
> syntax! but hive cannot.
> consider the tech, dag, vectorization, etc sparksql also has, seems
> the code is more efficiently.
>
>
> regards
> On 08/08/2016 08:48, Will Du <wi...@gmail.com> wrote:
>
> First, hive supports different engines. Look forward it's dynamic engine
> switch
> Second, look forward hadoop 3rd gen and map reduce on memory will fill the
> gap
>
> Thanks,
> Will
>
> On August 7, 2016, at 20:27, 理 <ww...@126.com> wrote:
>
> hi,
> sparksql improve so fast, both hive and sparksql are similar, so
> hive will lost or not?
>
> regards
>
>
>
>
>
>
Re: hive will die or not?
Posted by Mich Talebzadeh <mi...@gmail.com>.
...I do not agree with you...
Yeah right. I am so upset. Was waiting for your nod
LOL
Dr Mich Talebzadeh
On 14 August 2016 at 12:40, Jörn Franke <jo...@gmail.com> wrote:
> I do not agree with you . Competition is needed for innovation. I would be
> more concerned if there is no innovation. This is really bad for companies.
>
> Besides, as others pointed out Spark and Hive have different use cases.
> There are use cases where Spark is better and other where Hive is better.
> There are also a lot of use cases where both are useless.
>
> On 08 Aug 2016, at 03:49, 理 <ww...@126.com> wrote:
>
> in my opinion, multiple engine is not advantage, but reverse. it
> disperse the dev energy.
> consider the activity ,sparksql support all tpc ds without modify
> syntax! but hive cannot.
> consider the tech, dag, vectorization, etc sparksql also has, seems
> the code is more efficiently.
>
>
> regards
> On 08/08/2016 08:48, Will Du <wi...@gmail.com> wrote:
>
> First, hive supports different engines. Look forward it's dynamic engine
> switch
> Second, look forward hadoop 3rd gen and map reduce on memory will fill the
> gap
>
> Thanks,
> Will
>
> On August 7, 2016, at 20:27, 理 <ww...@126.com> wrote:
>
> hi,
> sparksql improve so fast, both hive and sparksql are similar, so
> hive will lost or not?
>
> regards
>
>
>
>
>
>
Re: hive will die or not?
Posted by Jörn Franke <jo...@gmail.com>.
I do not agree with you. Competition is needed for innovation. I would be more concerned if there were no innovation; that would be really bad for companies.
Besides, as others pointed out, Spark and Hive have different use cases. There are use cases where Spark is better and others where Hive is better. There are also a lot of use cases where both are useless.
> On 08 Aug 2016, at 03:49, 理 <ww...@126.com> wrote:
>
> in my opinion, multiple engine is not advantage, but reverse. it disperse the dev energy.
> consider the activity ,sparksql support all tpc ds without modify syntax! but hive cannot.
> consider the tech, dag, vectorization, etc sparksql also has, seems the code is more efficiently.
>
>
> regards
> On 08/08/2016 08:48, Will Du wrote:
> First, hive supports different engines. Look forward it's dynamic engine switch
> Second, look forward hadoop 3rd gen and map reduce on memory will fill the gap
>
> Thanks,
> Will
>
>> On August 7, 2016, at 20:27, 理 <ww...@126.com> wrote:
>>
>> hi,
>> sparksql improve so fast, both hive and sparksql are similar, so hive will lost or not?
>>
>> regards
>
>
Re: Re: hive will die or not?
Posted by 理 <ww...@126.com>.
In my opinion, multiple engines are not an advantage but the reverse: they disperse development effort.
Consider activity: Spark SQL supports all of TPC-DS without modifying the syntax, but Hive cannot.
Consider the technology: DAG execution, vectorization and so on. Spark SQL has these too, and its code seems more efficient.
regards
On 08/08/2016 08:48, Will Du wrote:
First, hive supports different engines. Look forward it's dynamic engine switch
Second, look forward hadoop 3rd gen and map reduce on memory will fill the gap
Thanks,
Will
On August 7, 2016, at 20:27, 理 <ww...@126.com> wrote:
hi,
sparksql improve so fast, both hive and sparksql are similar, so hive will lost or not?
regards
Re: hive will die or not?
Posted by Will Du <wi...@gmail.com>.
First, Hive supports different engines. I look forward to its dynamic engine switching.
Second, I look forward to third-generation Hadoop; in-memory MapReduce will fill the gap.
Thanks,
Will
> On August 7, 2016, at 20:27, 理 <ww...@126.com> wrote:
>
> hi,
> sparksql improve so fast, both hive and sparksql are similar, so hive will lost or not?
>
> regards
>
>
>