Posted to user@hive.apache.org by RD <rd...@gmail.com> on 2016/09/14 20:36:16 UTC

Hive 2.x usage

Hi Folks,
  We (at my org) are currently planning our move to Hive-2.x. As part of
this, I wanted to get a sense of how stable the Hive-2.x release is. I
thought it would be good to conduct a brief survey on this. I've added a
few questions below. It would be a great help if folks could
provide their feedback.

* Are you using Hive-2.x at your org and at what scale?
* Is the release stable enough? Did you notice any correctness issues?
* MR is deprecated in Hive-2.x (though almost all the qtests still use MR).
Are you still using MR with Hive-2.x?
* Are you using the Apache release or HDP?

-Best,

Re: Hive 2.x usage

Posted by Mich Talebzadeh <mi...@gmail.com>.
Yep, I agree with what Stephen said. I use Hive 2.0.1 and have not seen an
issue so far. We also use the Hive on Spark engine, and of course we can
switch to MR with one command within the script.
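
As a rough illustration of the one-command switch mentioned above: Hive selects its engine through the hive.execution.engine property, which accepts mr, tez, or spark. The sketch below is an assumption about how one might wrap that in a script generator; the with_engine helper and the sales table are hypothetical and not from this thread.

```python
# Hypothetical helper (not part of Hive): it only edits script text.
# hive.execution.engine is the Hive property that selects the engine.
VALID_ENGINES = {"mr", "tez", "spark"}


def with_engine(script: str, engine: str) -> str:
    """Prefix a HiveQL script with a SET statement that selects the engine."""
    if engine not in VALID_ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    return f"SET hive.execution.engine={engine};\n{script}"


# Placeholder query; "sales" is an invented table name.
query = "SELECT COUNT(*) FROM sales;"
print(with_engine(query, "mr"))
```

The same SET line can simply be typed at the top of a script by hand; the helper only makes the choice explicit and rejects misspelled engine names.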

I do not subscribe to the idea of using open source and then running for
cover if things don't work. If you are the nuts-and-bolts type, you can
probably sort it out yourself. Loads of times I have seen people waiting for
a vendor's fix for something that could have been sorted out in a fraction of
the time, because they could not be bothered to DIY.

We are vendor agnostic and so far so good.


HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

On 15 September 2016 at 06:00, Stephen Sprague <sp...@gmail.com> wrote:

> > * Are you using Hive-2.x at your org and at what scale?
>
> yes. we're using 2.1.0.  1.5PB.  30 node cluster.  ~1000 jobs a day.
> And yeah hive 2.1.0 has some issues and can require some finesse wrt the
> hive-site.xml settings.
>
> > * Is the release stable enough? Did you notice any correctness issues?
>
> yes.  we did notice correctness issues.    "NOT IN" against a partition
> key silently fails.  and trouble with UNION statements also got us.
>
> > * MR is deprecated in Hive-2.x (Though almost all the qtests still use
> > MR). Are you still using MR with Hive-2.x?
>
> we still use MR on YARN and have no issues with it.
>
> > * Are you using the apache release or HDP ?
>
> we're using Apache/Hive 2.1.0 on a CDH distro.  does cloudera certify it?
> probably not.  Do i care? no. :)
>
> bottom line IMHO is Hive 2.1.0 is still in an early adopter phase.
> however, if you're the do-it-yourself kind of guy then i'm sure you can
> adapt - and reap the rewards of your know-how.  If you're the type who
> needs somebody else to go to when it fails then i wouldn't jump on it just
> yet.
>
> just one opinion!  :o
>
> Cheers,
> Stephen.

Re: Hive 2.x usage

Posted by Stephen Sprague <sp...@gmail.com>.
> * Are you using Hive-2.x at your org and at what scale?

yes. we're using 2.1.0.  1.5PB.  30 node cluster.  ~1000 jobs a day.    And
yeah hive 2.1.0 has some issues and can require some finesse wrt the
hive-site.xml settings.

> * Is the release stable enough? Did you notice any correctness issues?

yes.  we did notice correctness issues.    "NOT IN" against a partition key
silently fails.  and trouble with UNION statements also got us.
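
One hedged way to catch this kind of silent wrong answer is to run a query in two logically equivalent forms and compare the results. The sketch below is not from the thread and does not reproduce the Hive bug: it uses Python's built-in sqlite3 as a stand-in engine, and the events table, its dt column, and the cross_check helper are all invented for illustration.

```python
import sqlite3


def cross_check(conn, query_a, query_b):
    """Return True when both queries yield the same rows (order ignored)."""
    rows_a = sorted(conn.execute(query_a).fetchall())
    rows_b = sorted(conn.execute(query_b).fetchall())
    return rows_a == rows_b


# SQLite stands in for Hive here; "dt" plays the role of a partition key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (dt TEXT NOT NULL, n INTEGER);
    INSERT INTO events VALUES
        ('2016-09-12', 1), ('2016-09-13', 2), ('2016-09-14', 3);
""")

# Two formulations that should agree when dt is never NULL.
not_in = "SELECT dt, n FROM events WHERE dt NOT IN ('2016-09-13')"
rewritten = "SELECT dt, n FROM events WHERE dt <> '2016-09-13'"

same = cross_check(conn, not_in, rewritten)
print(same)
```

Against a real cluster the two query strings would be sent to Hive instead; a mismatch between them is a signal to investigate before trusting either result.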

> * MR is deprecated in Hive-2.x (Though almost all the qtests still use
> MR). Are you still using MR with Hive-2.x?

we still use MR on YARN and have no issues with it.

> * Are you using the apache release or HDP ?

we're using Apache/Hive 2.1.0 on a CDH distro.  does cloudera certify it?
probably not.  Do i care? no. :)

bottom line IMHO is Hive 2.1.0 is still in an early adopter phase.
however, if you're the do-it-yourself kind of guy then i'm sure you can
adapt - and reap the rewards of your know-how.  If you're the type who
needs somebody else to go to when it fails then i wouldn't jump on it just
yet.

just one opinion!  :o

Cheers,
Stephen.


On Wed, Sep 14, 2016 at 3:53 PM, Jörn Franke <jo...@gmail.com> wrote:

> If you are using a distribution (which you should if you go to production;
> Apache releases should not be used there because of their maintainability,
> complexity, and interaction with other components such as Hadoop), then wait
> until a distribution with 2.x is out. As far as I am aware, there is
> currently no such distribution. As far as I know, Hortonworks and probably
> also Cloudera test their distributions on large-scale real production
> systems beforehand.
>
> I would not use MR even with 1.x; I would go for Tez (unless you are using
> some very specific outdated functionality). Spark is another option, but I
> do not see Hive on Spark as stable, and it has less functionality; this may
> change in the future.

Re: Hive 2.x usage

Posted by Jörn Franke <jo...@gmail.com>.
If you are using a distribution (which you should if you go to production; Apache releases should not be used there because of their maintainability, complexity, and interaction with other components such as Hadoop), then wait until a distribution with 2.x is out. As far as I am aware, there is currently no such distribution. As far as I know, Hortonworks and probably also Cloudera test their distributions on large-scale real production systems beforehand.

I would not use MR even with 1.x; I would go for Tez (unless you are using some very specific outdated functionality). Spark is another option, but I do not see Hive on Spark as stable, and it has less functionality; this may change in the future.
