Posted to dev@hive.apache.org by Sergey Shelukhin <se...@hortonworks.com> on 2013/12/12 01:33:16 UTC

adding ANSI flag for hive

Hi.

There's recently been some discussion about data type changes in Hive
(double to decimal), and result changes for special cases like division by
zero, etc., to bring it into compliance with MySQL (that's what the JIRAs use
as an example; I am assuming ANSI SQL is meant).
The latter are non-controversial (I guess), but for the former, performance
may suffer and/or backward compat may be broken if Hive is brought into
compliance.
If fuller ANSI compat is sought in the future, there may be some even
hairier issues such as double-quoted identifiers.

In light of that, and also following MySQL, I wonder if we should add a
flag, or set of flags, to Hive to be able to force ANSI compliance.
When the flag (or flags) is not set, for example, int/int division could
return double for backward compat/perf, and vectorization could skip the
special case handling for division by zero, etc.
Wdyt?
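
To make the trade-off concrete, here is a minimal sketch of how a single
switch could select between two division behaviours. It is illustrative
only: the `divide` helper and its `ansi_mode` parameter are hypothetical
names, not existing Hive settings or APIs, and the results chosen for each
mode (double plus a NULL-like value when the flag is off, exact decimal plus
an error when it is on) are assumptions made for the example, not a
statement of what Hive or the SQL standard specifies.

```python
from decimal import Decimal
from typing import Optional, Union

Number = Union[float, Decimal]


def divide(a: int, b: int, ansi_mode: bool = False) -> Optional[Number]:
    """Divide two integers under one of two hypothetical semantics.

    ansi_mode=False: assumed legacy behaviour; the result is a double
                     (float) and division by zero yields None, standing
                     in for NULL.
    ansi_mode=True:  assumed strict behaviour; the result is an exact
                     decimal and division by zero raises an error.
    """
    if ansi_mode:
        if b == 0:
            raise ZeroDivisionError("division by zero")
        return Decimal(a) / Decimal(b)
    if b == 0:
        return None
    return a / b


if __name__ == "__main__":
    print(divide(1, 4))                  # float result
    print(divide(1, 4, ansi_mode=True))  # Decimal result
    print(divide(1, 0))                  # None in the assumed legacy mode
```

With one flag, every call site has exactly two behaviours to reason about;
the strict path carries the extra zero check and the decimal arithmetic.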

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: adding ANSI flag for hive

Posted by Sergey Shelukhin <se...@hortonworks.com>.
Agree on both points. For now, what I had in mind was double vs decimal,
and other such backward compat vs SQL compat and potentially perf vs SQL
compat cases.


I think one flag would not be so bad...
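
As a small illustration of why double vs decimal can change results, the
sketch below uses Python's float (a binary double) and its decimal module
as stand-ins. It is only an analogy: it says nothing about Hive's own
decimal implementation or its performance.

```python
from decimal import Decimal

# Binary doubles cannot represent 0.1 or 0.2 exactly, so the sum is slightly off.
double_sum = 0.1 + 0.2
# Decimal arithmetic on the same literals is exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(double_sum == 0.3)              # False
print(decimal_sum == Decimal("0.3"))  # True
```

A query whose output relied on the double result would see different values
after a switch to decimal, which is the backward compat concern here.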


On Mon, Dec 16, 2013 at 8:29 AM, Alan Gates <ga...@hortonworks.com> wrote:

> A couple of thoughts on this:
>
> 1) If we did this I think we should have one flag, not many.  As Thejas
> points out, your test matrix goes insane when you have too many flags and
> hence things don't get properly tested.
>
> 2) We could do this in an incremental way, where we create this new ANSI
> flag and are clear with users that for a while this will be evolving.  That
> is, as we find new issues with data types, semantics, whatever, we will
> continue to change the behavior of this flag.  At some point in the future
> (as Thejas suggests, at a 1.0 release) we could make this the default
> behavior.  This avoids having to do a full sweep now to find everything
> that we want to change and make ANSI compliant, and then living with
> whatever we miss.
>
> Alan.


Re: adding ANSI flag for hive

Posted by Alan Gates <ga...@hortonworks.com>.
A couple of thoughts on this:

1) If we did this I think we should have one flag, not many.  As Thejas points out, your test matrix goes insane when you have too many flags and hence things don't get properly tested.

2) We could do this in an incremental way, where we create this new ANSI flag and are clear with users that for a while this will be evolving.  That is, as we find new issues with data types, semantics, whatever, we will continue to change the behavior of this flag.  At some point in the future (as Thejas suggests, at a 1.0 release) we could make this the default behavior.  This avoids having to do a full sweep now to find everything that we want to change and make ANSI compliant, and then living with whatever we miss.

Alan.

On Dec 11, 2013, at 5:14 PM, Thejas Nair wrote:

> Having too many configs complicates things for the user, and also
> complicates the code, and you also end up having many untested
> combinations of config flags.
> I think we should identify a bunch of incompatible changes that we
> think are important, fix them in a branch and make a major version
> release (say 1.x).
> 
> This is also related to HIVE-5875, where there is a discussion on
> switching the defaults for some of the configs to more desirable,
> but non-backward-compatible, values.



Re: adding ANSI flag for hive

Posted by Thejas Nair <th...@hortonworks.com>.
Having too many configs complicates things for the user, and also
complicates the code, and you also end up having many untested
combinations of config flags.
I think we should identify a bunch of incompatible changes that we
think are important, fix them in a branch and make a major version
release (say 1.x).

This is also related to HIVE-5875, where there is a discussion on
switching the defaults for some of the configs to more desirable,
but non-backward-compatible, values.
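
The point about untested combinations can be sketched with a quick count:
each independent boolean flag doubles the number of configurations that
would need coverage. The flag names below are hypothetical and exist only
so there is something to enumerate.

```python
from itertools import product


def flag_combinations(flags):
    """Return every on/off assignment for the given boolean flag names."""
    return [dict(zip(flags, values))
            for values in product([False, True], repeat=len(flags))]


# Hypothetical flag names, used only to count configurations.
one_flag = flag_combinations(["ansi"])
four_flags = flag_combinations(
    ["ansi_division", "ansi_decimal", "ansi_quoted_ids", "ansi_div_by_zero"])

print(len(one_flag))    # 2
print(len(four_flags))  # 16
```

A single flag keeps the matrix at two configurations, while four separate
flags already give sixteen.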

