Posted to dev@cassandra.apache.org by Benedict Elliott Smith <be...@apache.org> on 2018/10/02 11:30:42 UTC

Implicit Casts for Arithmetic Operators

CASSANDRA-11935 introduced arithmetic operators, and alongside these came implicit casts for their operands.  There is a semantic decision to be made, and I think the project would do well to explicitly raise this kind of question for wider input before release, since the project is bound by them forever more.

In this case, the choice is between lossy and lossless casts for operations involving integers and floating point numbers.  In essence, should:

(1) float + int = float, double + bigint = double; or
(2) float + int = double, double + bigint = decimal; or
(3) float + int = decimal, double + bigint = decimal

Option 1 performs a lossy implicit cast from int -> float, or bigint -> double.  Simply casting between these types changes the value.  This is what MS SQL Server does.
Options 2 and 3 cast without loss of precision, and 3 (or thereabouts) is what PostgreSQL does.
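
To make the lossiness concrete, here is a minimal sketch in Java (Cassandra's implementation language); 2^24 + 1 and 2^53 + 1 are the smallest positive integers that float and double, respectively, cannot represent:

    int i = 16_777_217;                // 2^24 + 1: fits in an int, but not in a float's 24-bit significand
    float f = i;                       // Java applies this widening conversion implicitly, and it is lossy
    System.out.println((int) f);       // prints 16777216: the value changed

    long big = 9_007_199_254_740_993L; // 2^53 + 1: fits in a bigint, but not in a double's 53-bit significand
    double d = big;
    System.out.println((long) d);      // prints 9007199254740992: the value changed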

The question I’m interested in is not just which is the right decision, but how the right decision should be arrived at.  My view is that we should primarily aim for least surprise to the user, but I’m keen to hear from others.


Re: Implicit Casts for Arithmetic Operators

Posted by Rahul Singh <ra...@gmail.com>.
+1 on the Postgres approach. In the last 5 years I’ve seen people move from Oracle and SQL Server to some variant of Cassandra or Postgres, and other new tech (CockroachDB, for example) is also more likely to support the Postgres behaviour.

I don’t care either way. It really depends on what you are storing.

Rahul Singh
Chief Executive Officer
m 202.905.2818

Anant Corporation
1010 Wisconsin Ave NW, Suite 250
Washington, D.C. 20007

We build and manage digital business technology platforms.

Re: Implicit Casts for Arithmetic Operators

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Thanks for bringing this up, it definitely needs to be discussed.

Least surprise is difficult here, since all major databases have their own
way of doing things, and people will just assume that their way is the right
way.  On that note, some people will be surprised no matter what we do.

I'd rather avoid the pitfalls of returning incorrect results, so either
option 2 or 3 sounds reasonable, but I'm leaning towards the Postgres approach
of always returning a decimal for those cases.

Jon



On Tue, Oct 2, 2018 at 10:54 AM Benedict Elliott Smith <be...@apache.org>
wrote:

> I agree, in broad strokes at least.  Interested to hear others’ positions.
>
>
>
> > On 2 Oct 2018, at 16:44, Ariel Weisberg <ar...@weisberg.ws> wrote:
> >
> > Hi,
> >
> > I think overflow and the role of widening conversions are pretty linked,
> > so I'll continue to inject that into this discussion. Also, overflow is much
> > worse, since most applications won't be impacted by a loss of precision when
> > an expression involves an int and a float, but will care quite a bit if they
> > get some nonsense wrapped number in an integer-only expression.
> >
> > With VoltDB, in practice we didn't run into issues with applications failing
> > to make progress due to exceptions on real data, thanks to the widening
> > conversions. The ranges of double and long are pretty big, and that hides
> > wrap-around/infinity.
> >
> > I think the proposal of having all operations return a decimal is
> > attractive in that these expressions always result in a consistent type.
> > Two pain points might be whether client languages have decimal support, and
> > whether there is a performance issue. The nice thing about always returning
> > decimal is that we can sidestep the issue of overflow.
> >
> > I would start by seeing if that's acceptable, and if it isn't, then look at
> > other approaches, like returning a variety of types: for example, returning
> > a bigint for int + int, or a double for int + float.
> >
> > If we take an approach that allows overflow, the ideal end state IMO would
> > be to get all users to run Cassandra in a way that overflow results in an
> > error, even in the context of aggregation. The road to get there is tricky,
> > but maybe start by having it as an opt-in tunable in cassandra.yaml. I don't
> > know how/when we could ever change that as a default, and it's unfortunate
> > having an option like this that 99% of users won't know they should flip.
> >
> > It seems like having the default throw on overflow is not as bad as it
> > sounds if you do the widening conversions, since most people won't run into
> > them. The change in the column types of result sets actually sounds worse
> > if we want to also improve aggregations. Many applications won't notice if
> > the client library abstracts that away, but I think there are still cases
> > where people would notice the type changing.
> >
> > Ariel
> >
> >> On Tue, Oct 2, 2018, at 11:09 AM, Benedict Elliott Smith wrote:
> >> This (overflow) is an excellent point, but it also affects
> >> aggregations, which were introduced a long time ago.  They already
> >> inherit Java semantics for all of the relevant types (silent
> >> wrap-around).  We probably want to be consistent, meaning either changing
> >> aggregations (which incurs a cost for changing the API) or continuing the
> >> Java semantics here.
> >>
> >> This is why having these discussions explicitly in the community before
> >> a release is so critical, in my view.  It’s very easy for these semantic
> >> changes to go unnoticed on a JIRA, and then ossify.
> >>
> >>
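For concreteness, a minimal Java sketch of the two overflow behaviours under discussion: the silent wrap-around that aggregations currently inherit from Java's + operator, and a throwing alternative (Math.addExact is one way to get it):

    long max = Long.MAX_VALUE;
    System.out.println(max + 1);                 // prints -9223372036854775808: silent wrap-around
    System.out.println(Math.addExact(max, 1L));  // throws ArithmeticException: long overflow
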
> >>> On 2 Oct 2018, at 15:48, Ariel Weisberg <ar...@weisberg.ws> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I think we should decide based on what is least surprising, as you
> >>> mention, where that isn't overridden by some other concern.
> >>>
> >>> It seems to me the priorities are
> >>>
> >>> * Correctness
> >>> * Performance
> >>> * User visible complexity
> >>> * Developer visible complexity
> >>>
> >>> Defaulting to silent implicit data loss is not ideal from a
> >>> correctness standpoint.
> >>>
> >>> Doing something better, like using wider types, doesn't seem like a
> >>> performance issue.
> >>>
> >>> From a user standpoint, doing something less lossy doesn't look more
> >>> complex as long as it's consistent and documented, and doesn't change from
> >>> version to version.
> >>>
> >>> There is some developer complexity, but this is a public API and we
> >>> only get one shot at this.
> >>>
> >>> I wonder about how overflow is handled as well. In VoltDB I think we
> >>> threw on overflow, and tended to just do widening conversions to make that
> >>> less common. We didn't imitate another database (as far as I know); we just
> >>> went with what was least likely to silently corrupt data.
> >>>
> >>> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213
> >>> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764
> >>>
> >>> Ariel
> >>>
> >>>> On Tue, Oct 2, 2018, at 7:30 AM, Benedict Elliott Smith wrote:
> >>>> CASSANDRA-11935 introduced arithmetic operators, and alongside these
> >>>> came implicit casts for their operands.  There is a semantic decision to
> >>>> be made, and I think the project would do well to explicitly raise this
> >>>> kind of question for wider input before release, since the project is
> >>>> bound by them forever more.
> >>>>
> >>>> In this case, the choice is between lossy and lossless casts for
> >>>> operations involving integers and floating point numbers.  In essence,
> >>>> should:
> >>>>
> >>>> (1) float + int = float, double + bigint = double; or
> >>>> (2) float + int = double, double + bigint = decimal; or
> >>>> (3) float + int = decimal, double + bigint = decimal
> >>>>
> >>>> Option 1 performs a lossy implicit cast from int -> float, or bigint ->
> >>>> double.  Simply casting between these types changes the value.  This is
> >>>> what MS SQL Server does.
> >>>> Options 2 and 3 cast without loss of precision, and 3 (or thereabouts)
> >>>> is what PostgreSQL does.
> >>>>
> >>>> The question I’m interested in is not just which is the right decision,
> >>>> but how the right decision should be arrived at.  My view is that we
> >>>> should primarily aim for least surprise to the user, but I’m keen to
> >>>> hear from others.

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

Re: Implicit Casts for Arithmetic Operators

Posted by Benedict Elliott Smith <be...@apache.org>.
I think we do, implicitly, support precision and scale - only dynamically.  The precision and scale are defined by the value on insertion, i.e. those necessary to represent it exactly.  During arithmetic operations we currently truncate to decimal128, but we can (and probably should) change this.   Ideally, we would support explicit precision/scale in the declared type, but our current behaviour is not inconsistent with introducing this later.

FTR, I wasn’t suggesting the spec required the most approximate type, but that the most consistent rule to describe this behaviour is that the approximate type always wins.  Somebody earlier justified this by the fact that one operand is already truncated to this level of approximation, so why would you want more accuracy in the result type?

I would be comfortable with either, fwiw, and they are both consistent with the spec.  It’s great if we can have a consistent idea behind why we do things though, so it seems at least worth briefly discussing this extra weirdness.
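
As a concrete illustration of the above, a sketch using java.math.BigDecimal (the representation behind CQL's decimal); this is illustrative only, not the server's actual code path:

    import java.math.BigDecimal;
    import java.math.MathContext;

    BigDecimal v = new BigDecimal("676543.21");
    System.out.println(v.precision() + "/" + v.scale());  // prints 8/2: taken from the inserted value

    // arithmetic truncated to 34 significant digits, i.e. MathContext.DECIMAL128:
    System.out.println(new BigDecimal(1).divide(new BigDecimal(3), MathContext.DECIMAL128));
    // prints 0.3333333333333333333333333333333333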





> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote:
> 
> Hi,
> 
> From reading the spec: precision is always implementation-defined. The spec specifies scale in several cases, but never precision, for any type or operation (addition/subtraction, multiplication, division).
> 
> So we don't implement anything remotely approaching the spec's precision and scale in CQL when it comes to numbers, I think? So we aren't going to follow the spec for scale. We are already pretty far down that road, so I would leave it alone.
> 
> I don't think the spec is asking for the most approximate type. It's just saying the result is approximate, and the precision is implementation defined. We could return either float or double. I think if one of the operands is a double we should return a double because clearly the schema thought a double was required to represent that number. I would also be in favor of returning a double all the time so that people can expect a consistent type from expressions involving approximate numbers.
> 
> I am a big fan of widening for arithmetic expressions in a database to avoid having to error on overflow. You can go to the trouble of only widening the minimum amount, but I think it's simpler if we always widen to bigint and double. This would be something the spec allows.
> 
> Definitely, if we can make overflow not occur we should, and the spec allows that. We should also not return different types for the same operand types just to work around overflow if we detect we need more precision.
> 
> Ariel
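A sketch of the "always widen" rule Ariel describes, using hypothetical helpers (not Cassandra's API): each operation is evaluated in the widest type of its family before a result is produced:

    // int + int evaluated in long (bigint) space: cannot overflow, since each operand fits in 32 bits
    static long add(int a, int b) { return (long) a + (long) b; }

    // mixed float/int evaluated in double space: double represents every int and every float value exactly
    static double add(float a, int b) { return (double) a + (double) b; }
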
> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging this 
>> out (and Mike for getting some empirical examples).
>> 
>> We still have to decide on the approximate data type to return; right 
>> now, we have float+bigint=double, but float+int=float.  I think this is 
>> fairly inconsistent, and either the approximate type should always win, 
>> or we should always upgrade to double for mixed operands.
>> 
>> The quoted spec also suggests that decimal+float=float, and
>> decimal+double=double, whereas we currently have decimal+float=decimal, and
>> decimal+double=decimal.
>> 
>> If we’re going to go with an approximate operand implying an approximate 
>> result, I think we should do it consistently (and consistent with the 
>> SQL92 spec), and have the type of the approximate operand always be the 
>> return type.
>> 
>> This would still leave a decision for float+double, though.  The most 
>> consistent behaviour with that stated above would be to always take the 
>> most approximate type to return (i.e. float), but this would seem to me 
>> to be fairly unexpected for the user.
>> 
>> 
>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>> 
>>> Hi,
>>> 
>>> I agree with what's been said about expectations regarding expressions involving floating point numbers. I think that if one of the inputs is approximate then the result should be approximate.
>>> 
>>> One thing we could look at for inspiration is the SQL spec, not necessarily to follow dogmatically.
>>> 
>>> From the SQL 92 spec regarding assignment http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
>>> "
>>>        Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
>>>        FLOAT, REAL, and DOUBLE PRECISION are numbers and are all mutually
>>>        comparable and mutually assignable. If an assignment would result
>>>        in a loss of the most significant digits, an exception condition
>>>        is raised. If least significant digits are lost, implementation-
>>>        defined rounding or truncating occurs with no exception condition
>>>        being raised. The rules for arithmetic are generally governed by
>>>        Subclause 6.12, "<numeric value expression>".
>>> "
>>> 
>>> Section 6.12 numeric value expressions:
>>> "
>>>        1) If the data type of both operands of a dyadic arithmetic opera-
>>>           tor is exact numeric, then the data type of the result is exact
>>>           numeric, with precision and scale determined as follows:
>>> ...
>>>        2) If the data type of either operand of a dyadic arithmetic op-
>>>           erator is approximate numeric, then the data type of the re-
>>>           sult is approximate numeric. The precision of the result is
>>>           implementation-defined.
>>> "
>>> 
>>> And this makes sense to me. I think we should only return an exact result if both of the inputs are exact.
>>> 
>>> I think we might want to look closely at the SQL spec and especially when the spec requires an error to be generated. Those are sometimes in the spec to prevent subtle paths to wrong answers. Any time we deviate from the spec, we should be asking why it is in the spec and why we are deviating.
>>> 
>>> Another issue besides overflow handling is how we determine precision and scale for expressions involving two exact types.
>>> 
>>> Ariel
>>> 
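The rule Ariel quotes from Subclause 6.12 reduces to a one-line type function; a sketch in Java with hypothetical names:

    enum Kind { EXACT, APPROXIMATE }

    // SQL92 6.12: the result is approximate if either operand is approximate, exact otherwise;
    // the precision of an approximate result is implementation-defined.
    static Kind resultKind(Kind a, Kind b) {
        return (a == Kind.APPROXIMATE || b == Kind.APPROXIMATE) ? Kind.APPROXIMATE : Kind.EXACT;
    }
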
>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
>>>> Hi,
>>>> 
>>>> I'm not sure if I would prefer the Postgres way of doing things, which is
>>>> returning just about any type depending on the order of operators,
>>>> especially considering its docs actually mention that using numeric/decimal
>>>> is slow, and say multiple times that floating points are inexact. So, doing
>>>> some math with Postgres (9.6.5):
>>>> 
>>>> SELECT 2147483647::bigint*1.0::double precision returns double
>>>> precision 2147483647
>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
>>>> SELECT 2147483647::bigint*1.0::real returns double
>>>> SELECT 2147483647::double precision*1::bigint returns double 2147483647
>>>> SELECT 2147483647::double precision*1.0::bigint returns double 2147483647
>>>> 
>>>> With + and - we can get the same mixture of returned types. There's
>>>> no difference in those calculations, just some casting. To me,
>>>> floating-point math indicates inexactness and has errors, and whoever mixes
>>>> up two different types should understand that. If one didn't want an exact
>>>> numeric type, why would the server return one? The floating point value
>>>> itself could be wrong already before the calculation - trying to say we do
>>>> it losslessly is just wrong.
>>>> 
>>>> Fun with 2.65:
>>>> 
>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
>>>> SELECT 2.65::double precision * 1::int returns double 2.65
>>>> 
>>>> SELECT round(2.65) returns numeric 3
>>>> SELECT round(2.65::double precision) returns double 3
>>>> 
>>>> SELECT 2.65 * 1 returns double 2.65
>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
>>>> SELECT 2.65 * 1.0 returns numeric 2.650
>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
>>>> 
>>>> SELECT round(2.65) * 1 returns numeric 3
>>>> SELECT round(2.65) * round(1) returns double 3
>>>> 
>>>> So as we're going to have silly values in any case, why pretend otherwise?
>>>> Also, exact calculations are slow if we crunch large amounts of numbers. I
>>>> guess I slightly deviated towards Postgres' implementation here, but I wish
>>>> it wasn't used as the benchmark in this case. And most importantly, I would
>>>> definitely want the exact same type returned each time I do a calculation.
>>>> 
>>>> - Micke
>>>> 
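The 2.65000009536743 artifact above is simply the nearest float to 2.65, widened to double; one line of Java reproduces it:

    System.out.println((double) 2.65f);  // prints 2.6500000953674316
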
>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <be...@apache.org>
>>>> wrote:
>>>> 
>>>>> As far as I can tell we reached a relatively strong consensus that we
>>>>> should implement lossless casts by default?  Does anyone have anything more
>>>>> to add?
>>>>> 
>>>>> Looking at the emails, everyone who participated and expressed a
>>>>> preference was in favour of the “Postgres approach” of upcasting to decimal
>>>>> for mixed float/int operands?
>>>>> 
>>>>> I’d like to get a clear-cut decision on this, so we know what we’re doing
>>>>> for 4.0.  Then hopefully we can move on to a collective decision on Ariel’s
>>>>> concerns about overflow, which I think are also pressing - particularly for
>>>>> tinyint and smallint.  This does also impact implicit casts for mixed
>>>>> integer type operations, but an approach for these will probably fall out
>>>>> of any decision on overflow.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <mu...@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>> I think you're conflating two things here. There's the loss resulting from
>>>>>> using some operators, and loss involved in casting. Dividing an integer by
>>>>>> another integer to obtain an integer result can result in loss, but there's
>>>>>> no implicit casting there and no loss due to casting.  Casting an integer
>>>>>> to a float can also result in loss. So dividing an integer by a float, for
>>>>>> example, with an implicit cast has an additional avenue for loss: the
>>>>>> implicit cast for the operands so that they're of the same type. I believe
>>>>>> this discussion so far has been about the latter, not the loss from the
>>>>>> operations themselves.
>>>>>> 
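Murukesh's distinction, illustrated in Java: the first loss comes from the operator itself, the second from the implicit cast applied to an operand:

    System.out.println(7 / 2);              // prints 3: integer division discards the remainder; no cast involved
    System.out.println(16_777_217 / 1.0f);  // prints 1.6777216E7: the int operand is implicitly cast to float
                                            // (losing precision) before the division runs
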
>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <be...@datastax.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I would like to try to clarify things a bit, to help people understand
>>>>>>> the true complexity of the problem.
>>>>>>> 
>>>>>>> The *float* and *double* types are inexact numeric types, not only at
>>>>>>> the operation level.
>>>>>>> 
>>>>>>> If you insert 676543.21 in a *float* column and then read it, you will
>>>>>>> realize that the value has been truncated to 676543.2.
>>>>>>> 
>>>>>>> If you want accuracy, the only way is to avoid those inexact types.
>>>>>>> Using *decimals* during operations will mitigate the problem but will
>>>>>>> not remove it.
>>>>>>> 
>>>>>>> 
>>>>>>> I do not recall PostgreSQL behaving as described. If I am not mistaken,
>>>>>>> in PostgreSQL *SELECT 3/2* will return *1*, which is similar to what MS
>>>>>>> SQL Server and Oracle do. So all those databases will lose precision if
>>>>>>> you are not careful.
>>>>>>> 
>>>>>>> If you truly need precision you can have it by using exact numeric types
>>>>>>> for your data types. Of course it has a cost in performance, memory and
>>>>>>> disk usage.
>>>>>>> 
>>>>>>> The advantage of the current approach is that it gives you the choice.
>>>>>>> It is up to you to decide what you need for your application. It is also
>>>>>>> in line with the way CQL behaves everywhere else.
>>>>>>> 
>>>>>> --
>>>>>> 
>>>>>> Muru
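Lerer's float example above is easy to check in Java (CQL's float is a 32-bit IEEE 754 value, like Java's):

    System.out.println(676543.21f);  // prints 676543.2: the nearest representable float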


Re: RES: RES: Implicit Casts for Arithmetic Operators

Posted by "dinesh.joshi@yahoo.com.INVALID" <di...@yahoo.com.INVALID>.
To clarify: send an empty email (no subject or body) to dev-unsubscribe@cassandra.apache.org
You will then get a confirmation email with a link. Click that.
Dinesh 


Re: RES: RES: Implicit Casts for Arithmetic Operators

Posted by Michael Shuler <mi...@pbandjelly.org>.
On 11/20/18 10:15 AM, Versátil wrote:
> 
> I already requested as you said and it did not help. And I NEVER asked to enter into this discussion. Please request to withdraw my email ....

 | | | | | | | | | |
\/\/\/\/\/\/\/\/\/\/

> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
> 




RES: RES: Implicit Casts for Arithmetic Operators

Posted by Versátil <ve...@versatilengenharia.com.br>.
I already requested as you said and it did not help. And I NEVER asked to enter into this discussion. Please request to withdraw my email ....

-----Original Message-----
From: Michael Shuler [mailto:mshuler@pbandjelly.org] On behalf of Michael Shuler
Sent: Tuesday, 20 November 2018 14:12
To: dev@cassandra.apache.org
Cc: dev-unsubscribe-versatil=versatilengenharia.com.br@cassandra.apache.org
Subject: Re: RES: Implicit Casts for Arithmetic Operators



Re: RES: Implicit Casts for Arithmetic Operators

Posted by Michael Shuler <mi...@pbandjelly.org>.
On 11/20/18 9:53 AM, Versátil wrote:
> 
> PLEASE TAKE MY EMAIL FROM THIS SHIT !!

FYI, mailing list subscriptions (and unsubscriptions) are self-serve. In
general, you subscribed yourself, so you are responsible for unsubscribing.
The email address to do so is appended to every plain text message to
the list.

You will still have to follow the approval link you'll get from the list
moderation removal CC here, sent on your behalf...

-- 
Kind regards,
Michael

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


RES: Implicit Casts for Arithmetic Operators

Posted by Versátil <ve...@versatilengenharia.com.br>.
PLEASE TAKE MY EMAIL FROM THIS SHIT !!


-----Original Message-----
From: Michael Burman [mailto:yak@iki.fi]
Sent: Tuesday, 20 November 2018 13:51
To: dev@cassandra.apache.org
Subject: Re: Implicit Casts for Arithmetic Operators

Yep, that's a good approach.

  - Micke



Re: Implicit Casts for Arithmetic Operators

Posted by Michael Burman <ya...@iki.fi>.
Yep, that's a good approach.

  - Micke

On Tue, Nov 20, 2018 at 5:12 PM Ariel Weisberg <ad...@fastmail.fm> wrote:

> Hi,
>
> +1
>
> This is a public API so we will be much better off if we get it right the
> first time.
>
> Ariel
>
> > On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <jo...@jonhaddad.com> wrote:
> >
> > Sounds good to me.
> >
> > On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <benedict@apache.org>
> > wrote:
> >
> >> So, this thread somewhat petered out.
> >>
> >> There are still a number of unresolved issues, but to make progress I
> >> wonder if it would first be helpful to have a vote on ensuring we are ANSI
> >> SQL 92 compliant for our arithmetic?  This seems like a sensible baseline,
> >> since we will hopefully minimise surprise to operators this way.
> >>
> >> If people largely agree, I will call a vote, and we can pick up a couple
> >> of more focused discussions afterwards on how we interpret the leeway it
> >> gives.
> >>
> >>
> >>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote:
> >>>
> >>> Hi,
> >>>
> >>> From reading the spec. Precision is always implementation defined. The
> >> spec specifies scale in several cases, but never precision for any type
> or
> >> operation (addition/subtraction, multiplication, division).
> >>>
> >>> So we don't implement anything remotely approaching precision and scale
> >> in CQL when it comes to numbers I think? So we aren't going to follow
> the
> >> spec for scale. We are already pretty far down that road so I would
> leave
> >> it alone.
> >>>
> >>> I don't think the spec is asking for the most approximate type. It's
> >> just saying the result is approximate, and the precision is
> implementation
> >> defined. We could return either float or double. I think if one of the
> >> operands is a double we should return a double because clearly the
> schema
> >> thought a double was required to represent that number. I would also be
> in
> >> favor of returning a double all the time so that people can expect a
> >> consistent type from expressions involving approximate numbers.
> >>>
> >>> I am a big fan of widening for arithmetic expressions in a database to
> >> avoid having to error on overflow. You can go to the trouble of only
> >> widening the minimum amount, but I think it's simpler if we always
> widen to
> >> bigint and double. This would be something the spec allows.
> >>>
> >>> Definitely if we can make overflow not occur we should and the spec
> >> allows that. We should also not return different types for the same
> operand
> >> types just to work around overflow if we detect we need more precision.
> >>>
> >>> Ariel
> >>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
> >>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging
> this
> >>>> out (and Mike for getting some empirical examples).
> >>>>
> >>>> We still have to decide on the approximate data type to return; right
> >>>> now, we have float+bigint=double, but float+int=float.  I think this
> is
> >>>> fairly inconsistent, and either the approximate type should always
> win,
> >>>> or we should always upgrade to double for mixed operands.
> >>>>
> >>>> The quoted spec also suggests that decimal+float=float, and decimal
> >>>> +double=double, whereas we currently have decimal+float=decimal, and
> >>>> decimal+double=decimal
> >>>>
> >>>> If we’re going to go with an approximate operand implying an
> >> approximate
> >>>> result, I think we should do it consistently (and consistent with the
> >>>> SQL92 spec), and have the type of the approximate operand always be
> the
> >>>> return type.
> >>>>
> >>>> This would still leave a decision for float+double, though.  The most
> >>>> consistent behaviour with that stated above would be to always take
> the
> >>>> most approximate type to return (i.e. float), but this would seem to
> me
> >>>> to be fairly unexpected for the user.
> >>>>
> >>>>
> >>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> I agree with what's been said about expectations regarding
> expressions
> >> involving floating point numbers. I think that if one of the inputs is
> >> approximate then the result should be approximate.
> >>>>>
> >>>>> One thing we could look at for inspiration is the SQL spec. Not to
> >> follow dogmatically necessarily.
> >>>>>
> >>>>> From the SQL 92 spec regarding assignment
> >> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
> >>>>> "
> >>>>>       Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
> >>>>>       FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
> >> mutually
> >>>>>       comparable and mutually assignable. If an assignment would
> >> result
> >>>>>       in a loss of the most significant digits, an exception
> condition
> >>>>>       is raised. If least significant digits are lost,
> implementation-
> >>>>>       defined rounding or truncating occurs with no exception
> >> condition
> >>>>>       being raised. The rules for arithmetic are generally governed
> by
> >>>>>       Subclause 6.12, "<numeric value expression>".
> >>>>> "
> >>>>>
> >>>>> Section 6.12 numeric value expressions:
> >>>>> "
> >>>>>       1) If the data type of both operands of a dyadic arithmetic
> >> opera-
> >>>>>          tor is exact numeric, then the data type of the result is
> >> exact
> >>>>>          numeric, with precision and scale determined as follows:
> >>>>> ...
> >>>>>       2) If the data type of either operand of a dyadic arithmetic
> op-
> >>>>>          erator is approximate numeric, then the data type of the re-
> >>>>>          sult is approximate numeric. The precision of the result is
> >>>>>          implementation-defined.
> >>>>> "
> >>>>>
> >>>>> And this makes sense to me. I think we should only return an exact
> >> result if both of the inputs are exact.
> >>>>>
> >>>>> I think we might want to look closely at the SQL spec and especially
> >> when the spec requires an error to be generated. Those are sometimes in
> the
> >> spec to prevent subtle paths to wrong answers. Any time we deviate from
> the
> >> spec we should be asking why is it in the spec and why are we deviating.
> >>>>>
> >>>>> Another issue besides overflow handling is how we determine precision
> >> and scale for expressions involving two exact types.
> >>>>>
> >>>>> Ariel
> >>>>>
> >>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I'm not sure if I would prefer the Postgres way of doing things,
> >> which is
> >>>>>> returning just about any type depending on the order of operators.
> >>>>>> Considering it actually mentions in the docs that using
> >> numeric/decimal is
> >>>>>> slow and also multiple times that floating points are inexact. So
> >> doing
> >>>>>> some math with Postgres (9.6.5):
> >>>>>>
> >>>>>> SELECT 2147483647::bigint*1.0::double precision returns double
> >>>>>> precision 2147483647
> >>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
> >>>>>> SELECT 2147483647::bigint*1.0::real returns double
> >>>>>> SELECT 2147483647::double precision*1::bigint returns double 2147483647
> >>>>>> SELECT 2147483647::double precision*1.0::bigint returns double 2147483647
> >>>>>>
> >>>>>> With + - we can get the same amount of mixture of returned types.
> >> There's
> >>>>>> no difference in those calculations, just some casting. To me
> >>>>>> floating-point math indicates inexactness and has errors and whoever
> >> mixes
> >>>>>> up two different types should understand that. If one didn't want
> >> exact
> >>>>>> numeric type, why would the server return such? The floating point
> >> value
> >>>>>> itself could be wrong already before the calculation - trying to say
> >> we do
> >>>>>> it lossless is just wrong.
> >>>>>>
> >>>>>> Fun with 2.65:
> >>>>>>
> >>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
> >>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
> >>>>>>
> >>>>>> SELECT round(2.65) returns numeric 4
> >>>>>> SELECT round(2.65::double precision) returns double 4
> >>>>>>
> >>>>>> SELECT 2.65 * 1 returns double 2.65
> >>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
> >>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
> >>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
> >>>>>>
> >>>>>> SELECT round(2.65) * 1 returns numeric 3
> >>>>>> SELECT round(2.65) * round(1) returns double 3
> >>>>>>
> >>>>>> So as we're going to have silly values in any case, why pretend
> >> something
> >>>>>> else? Also, exact calculations are slow if we crunch large amounts of
> >>>>>> numbers. I guess I slightly deviated towards Postgres' implementation
> >> in this
> >>>>>> case, but I wish it wasn't used as a benchmark here. And
> most
> >>>>>> importantly, I would definitely want the exact same type returned
> >> each time
> >>>>>> I do a calculation.
> >>>>>>
> >>>>>> - Micke
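The real -> double artefact shown above is easy to reproduce outside
Postgres; a small, illustrative Java check (JDK only, nothing assumed
beyond it):

    public class FloatWideningDemo
    {
        public static void main(String[] args)
        {
            float f = 2.65f;
            System.out.println(f);           // 2.65 (shortest round-trip string)
            System.out.println((double) f);  // 2.6500000953674316
            System.out.println(2.65d);       // 2.65 (a different approximation)
        }
    }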
> >>>>>>
> >>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
> >> benedict@apache.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> As far as I can tell we reached a relatively strong consensus that
> we
> >>>>>>> should implement lossless casts by default?  Does anyone have
> >> anything more
> >>>>>>> to add?
> >>>>>>>
> >>>>>>> Looking at the emails, everyone who participated and expressed a
> >>>>>>> preference was in favour of the “Postgres approach” of upcasting to
> >> decimal
> >>>>>>> for mixed float/int operands?
> >>>>>>>
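To make that concrete, one possible shape of a lossless upcast for mixed
float/int operands, sketched in Java with BigDecimal (illustrative only, not
the committed design):

    import java.math.BigDecimal;

    public class LosslessUpcast
    {
        // Both conversions below preserve the operand's exact value.
        static BigDecimal add(float f, int i)
        {
            BigDecimal exactFloat = new BigDecimal((double) f); // exact binary value
            BigDecimal exactInt = BigDecimal.valueOf(i);        // exact
            return exactFloat.add(exactInt);
        }

        public static void main(String[] args)
        {
            System.out.println(add(2.65f, 1)); // 3.650000095367431640625
        }
    }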
> >>>>>>> I’d like to get a clear-cut decision on this, so we know what we’re
> >> doing
> >>>>>>> for 4.0.  Then hopefully we can move on to a collective decision on
> >> Ariel’s
> >>>>>>> concerns about overflow, which I think are also pressing -
> >> particularly for
> >>>>>>> tinyint and smallint.  This does also impact implicit casts for
> mixed
> >>>>>>> integer type operations, but an approach for these will probably
> >> fall out
> >>>>>>> of any decision on overflow.
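To illustrate why overflow is pressing for the narrow types, with Java's
byte standing in for tinyint (a sketch, not Cassandra code):

    public class OverflowDemo
    {
        public static void main(String[] args)
        {
            byte a = 100, b = 100;                // tinyint-sized operands
            System.out.println((byte) (a + b));   // -56: silent wraparound
            System.out.println(a + b);            // 200: byte + byte widens to int
            // Where no wider type exists, checked arithmetic can error instead:
            Math.addExact(Long.MAX_VALUE, 1L);    // throws ArithmeticException
        }
    }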
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
> >> murukesh.mohanan@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> I think you're conflating two things here. There's the loss
> >> resulting
> >>>>>>> from
> >>>>>>>> using some operators, and loss involved in casting. Dividing an
> >> integer
> >>>>>>> by
> >>>>>>>> another integer to obtain an integer result can result in loss,
> but
> >>>>>>> there's
> >>>>>>>> no implicit casting there and no loss due to casting.  Casting an
> >> integer
> >>>>>>>> to a float can also result in loss. So dividing an integer by a
> >> float,
> >>>>>>> for
> >>>>>>>> example, with an implicit cast has an additional avenue for loss:
> >> the
> >>>>>>>> implicit cast for the operands so that they're of the same type. I
> >>>>>>> believe
> >>>>>>>> this discussion so far has been about the latter, not the loss
> from
> >> the
> >>>>>>>> operations themselves.
> >>>>>>>>
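A small Java illustration of the two kinds of loss distinguished above
(values checkable in jshell):

    public class LossDemo
    {
        public static void main(String[] args)
        {
            System.out.println(3 / 2);            // 1: loss from the operator itself
            System.out.println((float) 16777217); // 1.6777216E7: loss from the cast;
                                                  // 2^24 + 1 exceeds float's 24-bit
                                                  // significand
        }
    }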
> >>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
> >> benjamin.lerer@datastax.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I would like to try to clarify things a bit to help people to
> >> understand
> >>>>>>>>> the true complexity of the problem.
> >>>>>>>>>
> >>>>>>>>> The *float* and *double* types are inexact numeric types. Not
> only
> >> at
> >>>>>>> the
> >>>>>>>>> operation level.
> >>>>>>>>>
> >>>>>>>>> If you insert 676543.21 in a *float* column and then read it, you
> >> will
> >>>>>>>>> realize that the value has been truncated to 676543.2.
> >>>>>>>>>
> >>>>>>>>> If you want accuracy the only way is to avoid those inexact
> types.
> >>>>>>>>> Using *decimals*
> >>>>>>>>> during operations will mitigate the problem but will not remove
> >> it.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I do not recall PostgreSQL behaving as described. If I am not
> >> mistaken
> >>>>>>> in
> >>>>>>>>> PostgreSQL *SELECT 3/2* will return *1*, which is similar to what
> >> MS SQL
> >>>>>>>>> Server and Oracle do. So all those databases will lose precision
> >> if you
> >>>>>>>>> are not careful.
> >>>>>>>>>
> >>>>>>>>> If you truly need precision you can have it by using exact
> numeric
> >> types
> >>>>>>>>> for your data types. Of course it has a cost on performance,
> >> memory and
> >>>>>>>>> disk usage.
> >>>>>>>>>
> >>>>>>>>> The advantage of the current approach is that it gives you the
> >> choice.
> >>>>>>> It is
> >>>>>>>>> up to you to decide what you need for your application. It is
> also
> >> in
> >>>>>>> line
> >>>>>>>>> with the way CQL behaves everywhere else.
> >>>>>>>>>
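The truncation example is reproducible in Java, which also reveals the value
actually stored (illustrative):

    public class TruncationDemo
    {
        public static void main(String[] args)
        {
            float stored = 676543.21f;           // value as written to a float column
            System.out.println(stored);          // 676543.2
            System.out.println((double) stored); // 676543.1875: what is really stored
        }
    }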
> >>>>>>>> --
> >>>>>>>>
> >>>>>>>> Muru
> >>>>>>>
> >>>>>>>
> >>>>>>>
> ---------------------------------------------------------------------
> >>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>
> >>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>
> >> --
> > Jon Haddad
> > http://www.rustyrazorblade.com
> > twitter: rustyrazorblade
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: Implicit Casts for Arithmetic Operators

Posted by Benedict Elliott Smith <be...@apache.org>.
Thanks for laying this out, this really helps me respond to at least some of your concerns.

Firstly, I’d like to clarify that my position is only a personal one - I don’t expect the project, or you, to necessarily have the same viewpoint.  But it does mean that I don’t have to engage too closely with the points I personally consider to be harmful to the best outcome.  If, in doing so, you manage to convince more people, more power to you.

It doesn’t sound like we’re that far apart, though.  A lot of these are practical implementation concerns that I tried to suggest we punt on - including a timeline.  My goal here was to decide our ideal goal state, not how or when* we reach it.

Still, I will engage with a couple of your practical concerns:

I agree that it would be great to land this in 4.0.  But if released features go unmolested until the next release, I don’t think it will be a tragedy.
As for complexity, the only toggle-impacted feature I can think of is built-in aggregates.  In the absolute worst case, it would be OK to maintain two copies of these, IMO.

FWIW, I’d be happy picking MS SQL Server as our baseline for implicit type conversions, as this looks to be ANSI SQL 92 compliant, from my brief analysis.  We do not presently behave the same way: https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-type-precedence-transact-sql?view=sql-server-2017
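For the sake of discussion, the precedence idea from that table reduces to
something like the following Java sketch (the ordering is abbreviated and
illustrative, not the full MS SQL list):

    enum CqlType { TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE }

    class Precedence
    {
        // The operand of lower precedence is implicitly cast to the type of
        // the operand of higher precedence (enum order = precedence here).
        static CqlType promote(CqlType a, CqlType b)
        {
            return a.ordinal() >= b.ordinal() ? a : b;
        }
    }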

As far as competing standards are concerned, I’m genuinely not aware of any.  Do you have any to consider?  That would be great.

* Even so far as ‘ever’, should cold reality bite - but why presuppose this?  You can never say until you go to implement anyway; we only risk giving up before we start.


> On 23 Nov 2018, at 11:09, Sylvain Lebresne <le...@gmail.com> wrote:
> 
>> Anyway, I think we’ve been arguing very unnecessarily about this
> ideological
>> point, given that I’ve already suggested a toggle to permit users to
> continue
>> with present-day semantics should they choose.  Surely this resolves your
>> concerns, unless you think this is intractable?
> 
> Not really :). My beef is really against the idea of deciding upfront about
> doing this when we don't even know what implementing and maintaining such
> a toggle even implies (or even whether we'll need one at all). I don't know
> if it's tractable or not, but what's wrong with figuring that out first?
> 
> 
> Anyway, I'll try to lay out my reasoning on the general issue, for the
> sake of the general conversation here.
> 
> I agree following a standard for our arithmetic would be nice. I have, at
> this point, no opinion on which standard would be best (I haven't looked,
> and no one shared any analysis), but checking ANSI SQL 92 sure doesn't seem
> crazy to me on principle. In my perfect world, "we" would do a short
> "competitive" analysis of reasonable options (where _one_ of the criteria
> would be "what changes does it require to our existing code?"), and I'd
> *love* to see that, but I admit I don't have time to do it myself any time
> soon, so I can't, in good conscience, strongly object to narrowing it down
> to ANSI SQL 92 without checking too many alternatives.
> 
> So I'm all in favor of *looking at* making our arithmetic ANSI SQL 92
> compliant. I have, however, no clue what that entails in practice (I haven't
> looked at all) and if someone knows, they haven't shared that knowledge so
> far.
> 
> I do disagree however that this adherence to a standard should be decided
> in a vacuum. No decision should; this is bad project management in my book,
> and that is kind of why I want to insist on that point. We should be open
> to all relevant context. We can debate how to weigh each part of the
> context, sure, but disregarding the context _by design_, I will have to
> strongly disagree with that.
> 
> And to me, the relevant context is (likely forgetting stuff):
> 1) Adhering to a standard, as Benedict mentioned, brings 2 nice benefits:
>   1) it gives us confidence we haven't screwed up something badly and 2)
>   it ensures familiarity for people and tools (at least those familiar
>   with that particular standard).
> 2) I haven't seen much evidence so far that we screwed up things badly or
> that
>   things are super unfamiliar. The cast thing which started this thread is
>   certainly worth discussing, but if I understood correctly, we do the
> same as
>   MS SQL Server so far, so it's not exactly unheard of. To be extra clear,
> I'm
>   not trying to imply that this renders the previous point moot, *it does
>   not* imo, but I do think it is relevant context nonetheless, to be
>   weighed in.
> 3) This _could_ create backward incompatibility (or it may not, I don't
>   consider changes to behavior introduced in 4.0 backward incompatibility
> in
>   particular). If so, we should be careful with this (it impacts users in
>   unpleasant ways). Yes, flags might be an option here to lessen the
>   burden on users (not that I love adding more flags by itself btw), but
>   depending on what changes we're talking about, said flag _could_ bring
>   non-negligible complexity (for the code) that should be factored in.
> 4) I believe everyone more or less agrees that if we do this, we should
>   do this in 4.0, so this _could_ create substantial delay for 4.0 (again,
>   or not.  Since we don't know what it involves, we simply don't know). As
>   I've expressed some months ago when I pushed for an early freeze, I
>   genuinely believe delays to 4.0 are bad for the project at this point.
>   I'm *not* saying it is the end of the conversation, I absolutely agree
>   the release quality is an important aspect as well for instance, but
>   my point is that it should _all_ be factored in.
> 
> Currently, we have very little information on how bad 3) and 4) are. So my
> current personal opinion is that 1) does justify looking into this much
> more closely, and that if 3) and 4) aren't too bad, that's a good deal for
> the
> project. But in light of 2), I also think there is a "level of badness" for
> 3)
> and 4) at which point it'd become a net negative for the project.
> 
> --
> Sylvain
> 
> 
> On Fri, Nov 23, 2018 at 1:07 AM Benedict Elliott Smith <be...@apache.org>
> wrote:
> 
>> This was a terribly unclear email, sorry.  I was just trying to find new
>> and interesting ways to say the same thing (that we should form our goal
>> state from first principles only).
>> 
>> Anyway, I think we’ve been arguing very unnecessarily about this
>> ideological point, given that I’ve already suggested a toggle to permit
>> users to continue with present-day semantics should they choose.  Surely
>> this resolves your concerns, unless you think this is intractable?
>> 
>> 
>> 
>> 
>> 
>>> On 22 Nov 2018, at 12:13, Benedict Elliott Smith <be...@apache.org>
>> wrote:
>>> 
>>> This is why I said the decision is ideological.  We fundamentally
>> disagree with each other, on points of principle.
>>> 
>>> This also feels like it’s becoming antagonistic, perhaps through
>> misinterpreting each other, which was far from my intent.  So I will limit
>> my reply to the only point of interpretation of my position.
>>> 
>>> Given that I personally consider this to be an ideological or
>> project-axiomatic decision, I therefore only consider other ideological or
>> axiomatic facts to be relevant to a decision like this. So:
>>> 
>>> 1) By “where appropriate” I mean, for instance, that this project will
>> likely never support ANSI SQL in toto, by virtue of the fundamental nature
>> of the project.
>>> 2) I agree that which standard we choose to follow, and why we follow
>> it, are both relevant questions
>>> 
>>> 
>>> 
>>>> On 22 Nov 2018, at 11:56, Sylvain Lebresne <le...@gmail.com> wrote:
>>>> 
>>>> On Thu, Nov 22, 2018 at 11:51 AM Benedict Elliott Smith <
>> benedict@apache.org>
>>>> wrote:
>>>> 
> >>>>> We’re not presently voting*; we’re only discussing whether we should
>> base
>>>>> our behaviour on a widely agreed upon standard.
>>>>> 
>>>> 
> >>>> Well, you *explicitly* asked if people thought we should do a vote, and
>> I
>>>> responded to that part. Let's not pretend I'm interpreting stuff, it's
>>>> insulting.
>>>> 
>>>> 
>>>>> I think perhaps the nub of our disagreement is that, in my view, this
>> is
>>>>> the only relevant fact to decide. There is no data to base this
>> decision
>>>>> upon.  It’s axiomatic, or ideological; procedural, not technical:  Do
>> we
>>>>> think we should try to hew to standards (where appropriate), or do we
>> think
>>>>> we should stick with what we arrived at in an adhoc manner?
>>>> 
>>>> 
>>>> Yes, that is probably the nub of our disagreement. I disagree that
>> hewing
>>>> to standards is something we should agree on absolutely, with no other
>>>> consideration in the balance. Hell, I read your "where appropriate" as
>> an
>>>> admission that you don't even truly think that. I think this is always a
>>>> pros versus cons analysis. Adhering to standards is certainly a pro.
>>>> 
> >>>> *If* we were starting from scratch, I might maybe agree there isn't much
>>>> "cons" in the balance (there is always _some_ consideration though;
>>>> adhering to standard might force you into complexity that might not be
>>>> justified; not saying it's our case here, just pointing again that I
>> don't
>>>> adhere to the absolutist view), making it an easy decision. So that I'm
>> not
>>>> sure we'd even need a vote to agree that "we should try to hew to
>> standards
>>>> (where appropriate)", even if we'd still want to discuss 1) if it is
>>>> appropriate in that case and 2) which standard, so it wouldn't even be a
>>>> "no data involved" decision.
>>>> 
>>>> But we're not starting from scratch. You explicitly say yourself that it
>>>> "extends to any features we have already released". So backward
>>>> compatibility is a parameter we imo *must* take into account. Again,
>>>> doesn't mean we don't end up breaking backward compatibility, just that
>> it
>>>> is a non negligible downside, so we better make sure the "pros" of
>> adhering
>>>> to a standard makes up for it.
>>>> 
>>>> So yes, I do pretty strongly disagree that adhering to a standard is
>>>> something that should be decided absolutely, with no other consideration
>>>> taken into account.
>>>> 
>>>> 
>>>>> and how meandering the discussion was with no clear consensus, it
>> seemed
>>>>> to need a vote in the near future.
>>>> 
>>>> 
>>>> Fwiw, I also don't have the same read here. What I see on this thread
>> is a
>>>> bit of discussion on the specific cast issue you initially brought,
> >>>> discussion that didn't feel especially stuck to me, but I don't see
> >>>> much of a larger discussion on adhering to standards for all our
> >>>> arithmetic before your suggestion that a vote on it might be warranted.
>>>> 
>>>> --
>>>> Sylvain
>>>> 
>>>> 
>>>>>> On 22 Nov 2018, at 09:26, Sylvain Lebresne <le...@gmail.com>
>> wrote:
>>>>>> 
>>>>>> I'm not saying "let's not do this no matter what and ever fix
>> technical
>>>>>> debt", nor am I fearing decision.
>>>>>> 
>>>>>> But I *do* think decisions, technical ones at least, should be fact
>> and
>>>>>> data driven. And I'm not even sure why we're talking of having a vote
>>>>> here.
>>>>>> The Apache Way is *not* meant to be primarily vote-driven, votes are
>>>>>> supposed to be a last resort when, after having debated facts and
>> data,
>>>>> no
>>>>>> consensus can be reached. Can we have the debate on facts and data
>> first?
>>>>>> Please.
>>>>>> 
> >>>>>> At the end of the day, I object to: "There are still a number of
>> unresolved
>>>>>> issues, but to make progress I wonder if it would first be helpful to
>>>>> have
>>>>>> a vote on ensuring we are ANSI SQL 92 compliant for our arithmetic?".
>>>>> More
>>>>>> specifically, I disagree that such vote is a good starting point.
>> Let's
>>>>>> identify and discuss the unresolved issues first. Let's check
>> precisely
>>>>>> what getting our arithmetic ANSI SQL 92 compliant means and how we can
>>>>> get
>>>>>> it. I do support the idea of making such analysis btw, it would be
>> good
>>>>>> data, but no vote is needed whatsoever to make it. Again, I object to
>>>>>> voting first and doing the analysis 2nd.
>>>>>> 
>>>>>> --
>>>>>> Sylvain
>>>>>> 
>>>>>> 
>>>>>> On Thu, Nov 22, 2018 at 1:25 AM Jonathan Haddad <jo...@jonhaddad.com>
>>>>> wrote:
>>>>>> 
>>>>>>> I can’t agree more. We should be able to make changes in a manner
>> that
>>>>>>> improves the DB In the long term, rather than live with the technical
>>>>> debt
>>>>>>> of arbitrary decisions made by a handful of people.
>>>>>>> 
>>>>>>> I also agree that putting a knob in place to let people migrate over
>> is
>>>>> a
>>>>>>> reasonable decision.
>>>>>>> 
>>>>>>> Jon
>>>>>>> 
>>>>>>> On Wed, Nov 21, 2018 at 4:54 PM Benedict Elliott Smith <
>>>>>>> benedict@apache.org>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> The goal is simply to agree on a set of well-defined principles for
>> how
>>>>>>> we
>>>>>>>> should behave.  If we don’t like the implications that arise, we’ll
>>>>> have
>>>>>>>> another vote?  A democracy cannot bind itself, so I never understood
>>>>> this
>>>>>>>> fear of a decision.
>>>>>>>> 
>>>>>>>> A database also has a thousand toggles.  If we absolutely need to,
>> we
>>>>> can
>>>>>>>> introduce one more.
>>>>>>>> 
>>>>>>>> We should be doing this upfront a great deal more often.  Doing it
>>>>>>>> retrospectively sucks, but in my opinion it's a bad reason to bind
>>>>>>>> ourselves to whatever made it in.
>>>>>>>> 
>>>>>>>> Do we anywhere define the principles of our current behaviour?  I
>>>>>>> couldn’t
>>>>>>>> find it.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 21 Nov 2018, at 21:08, Sylvain Lebresne <le...@gmail.com>
>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith <
>>>>>>>> benedict@apache.org>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> FWIW, my meaning of arithmetic in this context extends to any
>>>>> features
>>>>>>>> we
>>>>>>>>>> have already released (such as aggregates, and perhaps other
>> built-in
>>>>>>>>>> functions) that operate on the same domain.  We should be
>> consistent,
>>>>>>>> after
>>>>>>>>>> all.
>>>>>>>>>> 
>>>>>>>>>> Whether or not we need to revisit any existing functionality we
>> can
>>>>>>>> figure
>>>>>>>>>> out after the fact, once we have agreed what our behaviour should
>> be.
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I'm not sure I correctly understand the process suggested, but I
>> don't
>>>>>>>>> particularly like/agree with what I understand. What I understand
>> is a
>>>>>>>>> suggestion for voting on agreeing to be ANSI SQL 92 compliant,
>> with no
>>>>>>>> real
>>>>>>>>> evaluation of what that entails (at least I haven't seen one), and
>>>>> that
>>>>>>>>> this vote, if passed, would imply we'd then make any backward
>>>>>>>> incompatible
>>>>>>>>> change necessary to achieve compliance ("my meaning of arithmetic
>> in
>>>>>>> this
>>>>>>>>> context extends to any features we have already released" and
>> "Whether
>>>>>>> or
>>>>>>>>> not we need to revisit any existing functionality we can figure out
>>>>>>> after
>>>>>>>>> the fact, once we have agreed what our behaviour should be").
>>>>>>>>> 
> >>>>>>> This might make sense for a new product, but at our stage that seems
>>>>>>>>> backward to me. I think we owe our users to first make the effort
>> of
>>>>>>>>> identifying what "inconsistencies" our existing arithmetic has[1]
>> and
>>>>>>>>> _then_ consider what options we have to fix those, with their pros
>> and
>>>>>>>> cons
>>>>>>>>> (including how bad they break backward compatibility). And if
>> _then_
>>>>>>>>> getting ANSI SQL 92 compliant proves to not be disruptive (or at
>> least
>>>>>>>>> acceptably so), then sure, that's great.
>>>>>>>>> 
>>>>>>>>> [1]: one possibly efficient way to do that could actually be to
>>>>> compare
>>>>>>>> our
>>>>>>>>> arithmetic to ANSI SQL 92. Not that all differences found would
>> imply
>>>>>>>>> inconsistencies/wrongness of our arithmetic, but still, it should
>> be
> >>>>>>>>> helpful. And I guess my whole point is that we should do that analysis
>>>>>>>> first,
>>>>>>>>> and then maybe decide that being ANSI SQL 92 is a reasonable
>> option,
>>>>>>> not
>>>>>>>>> decide first and live with the consequences no matter what they
>> are.
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Sylvain
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> I will make this more explicit for the vote, but just to clarify
>> the
>>>>>>>>>> intention so that we are all discussing the same thing.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On 20 Nov 2018, at 14:18, Ariel Weisberg <ad...@fastmail.fm>
>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> +1
>>>>>>>>>>> 
>>>>>>>>>>> This is a public API so we will be much better off if we get it
>>>>> right
>>>>>>>>>> the first time.
>>>>>>>>>>> 
>>>>>>>>>>> Ariel
>>>>>>>>>>> 
>>>>>>>>>>>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <
>> jon@jonhaddad.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Sounds good to me.
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <
>>>>>>>>>> benedict@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> So, this thread somewhat petered out.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> There are still a number of unresolved issues, but to make
>>>>>>> progress I
>>>>>>>>>>>>> wonder if it would first be helpful to have a vote on ensuring
>> we
>>>>>>> are
>>>>>>>>>> ANSI
>>>>>>>>>>>>> SQL 92 compliant for our arithmetic?  This seems like a
>> sensible
>>>>>>>>>> baseline,
>>>>>>>>>>>>> since we will hopefully minimise surprise to operators this
>> way.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> If people largely agree, I will call a vote, and we can pick
>> up a
>>>>>>>>>> couple
>>>>>>>>>>>>> of more focused discussions afterwards on how we interpret the
>>>>>>> leeway
>>>>>>>>>> it
>>>>>>>>>>>>> gives.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws>
>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> From reading the spec: precision is always implementation
>>>>> defined.
>>>>>>>> The
>>>>>>>>>>>>> spec specifies scale in several cases, but never precision for
>> any
>>>>>>>>>> type or
>>>>>>>>>>>>> operation (addition/subtraction, multiplication, division).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> So we don't implement anything remotely approaching precision
>> and
>>>>>>>>>> scale
>>>>>>>>>>>>> in CQL when it comes to numbers I think? So we aren't going to
>>>>>>> follow
>>>>>>>>>> the
>>>>>>>>>>>>> spec for scale. We are already pretty far down that road so I
>>>>> would
>>>>>>>>>> leave
>>>>>>>>>>>>> it alone.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I don't think the spec is asking for the most approximate
>> type.
>>>>>>> It's
>>>>>>>>>>>>> just saying the result is approximate, and the precision is
>>>>>>>>>> implementation
>>>>>>>>>>>>> defined. We could return either float or double. I think if
>> one of
>>>>>>>> the
>>>>>>>>>>>>> operands is a double we should return a double because clearly
>> the
>>>>>>>>>> schema
>>>>>>>>>>>>> thought a double was required to represent that number. I would
>>>>>>> also
>>>>>>>>>> be in
>>>>>>>>>>>>> favor of returning a double all the time so that people can
>> expect
>>>>>>> a
>>>>>>>>>>>>> consistent type from expressions involving approximate numbers.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I am a big fan of widening for arithmetic expressions in a
>>>>>>> database
>>>>>>>> to
>>>>>>>>>>>>> avoid having to error on overflow. You can go to the trouble of
>>>>>>> only
>>>>>>>>>>>>> widening the minimum amount, but I think it's simpler if we
>> always
>>>>>>>>>> widen to
>>>>>>>>>>>>> bigint and double. This would be something the spec allows.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Definitely if we can make overflow not occur we should and the
>>>>>>> spec
>>>>>>>>>>>>> allows that. We should also not return different types for the
>>>>> same
>>>>>>>>>> operand
>>>>>>>>>>>>> types just to work around overflow if we detect we need more
>>>>>>>> precision.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Ariel
>>>>>>>>>>>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith
>> wrote:
>>>>>>>>>>>>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for
>>>>>>> digging
>>>>>>>>>> this
>>>>>>>>>>>>>>> out (and Mike for getting some empirical examples).
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> We still have to decide on the approximate data type to
>> return;
>>>>>>>> right
>>>>>>>>>>>>>>> now, we have float+bigint=double, but float+int=float.  I
>> think
>>>>>>>> this
>>>>>>>>>> is
>>>>>>>>>>>>>>> fairly inconsistent, and either the approximate type should
>>>>>>> always
>>>>>>>>>> win,
>>>>>>>>>>>>>>> or we should always upgrade to double for mixed operands.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The quoted spec also suggests that decimal+float=float, and
>>>>>>> decimal
>>>>>>>>>>>>>>> +double=double, whereas we currently have
>> decimal+float=decimal,
>>>>>>>> and
>>>>>>>>>>>>>>> decimal+double=decimal
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> If we’re going to go with an approximate operand implying an
>>>>>>>>>>>>> approximate
>>>>>>>>>>>>>>> result, I think we should do it consistently (and consistent
>>>>> with
>>>>>>>> the
>>>>>>>>>>>>>>> SQL92 spec), and have the type of the approximate operand
>> always
>>>>>>> be
>>>>>>>>>> the
>>>>>>>>>>>>>>> return type.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This would still leave a decision for float+double, though.
>> The
>>>>>>>> most
>>>>>>>>>>>>>>> consistent behaviour with that stated above would be to
>> always
>>>>>>> take
>>>>>>>>>> the
>>>>>>>>>>>>>>> most approximate type to return (i.e. float), but this would
>>>>> seem
>>>>>>>> to
>>>>>>>>>> me
>>>>>>>>>>>>>>> to be fairly unexpected for the user.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ariel@weisberg.ws
>>> 
>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I agree with what's been said about expectations regarding
>>>>>>>>>> expressions
>>>>>>>>>>>>> involving floating point numbers. I think that if one of the
>>>>> inputs
>>>>>>>> is
>>>>>>>>>>>>> approximate then the result should be approximate.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> One thing we could look at for inspiration is the SQL spec.
>> Not
>>>>>>> to
>>>>>>>>>>>>> follow dogmatically necessarily.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> From the SQL 92 spec regarding assignment
>>>>>>>>>>>>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
>> section
>>>>>>>> 4.6:
>>>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>>> Values of the data types NUMERIC, DECIMAL, INTEGER,
>>>>>>> SMALLINT,
>>>>>>>>>>>>>>>> FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
>>>>>>>>>>>>> mutually
>>>>>>>>>>>>>>>> comparable and mutually assignable. If an assignment would
>>>>>>>>>>>>> result
>>>>>>>>>>>>>>>> in a loss of the most significant digits, an exception
>>>>>>>>>> condition
>>>>>>>>>>>>>>>> is raised. If least significant digits are lost,
>>>>>>>>>> implementation-
>>>>>>>>>>>>>>>> defined rounding or truncating occurs with no exception
>>>>>>>>>>>>> condition
>>>>>>>>>>>>>>>> being raised. The rules for arithmetic are generally
>>>>>>> governed
>>>>>>>>>> by
>>>>>>>>>>>>>>>> Subclause 6.12, "<numeric value expression>".
>>>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Section 6.12 numeric value expressions:
>>>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>>> 1) If the data type of both operands of a dyadic arithmetic
>>>>>>>>>>>>> opera-
>>>>>>>>>>>>>>>>    tor is exact numeric, then the data type of the result
>> is
>>>>>>>>>>>>> exact
>>>>>>>>>>>>>>>>    numeric, with precision and scale determined as follows:
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> 2) If the data type of either operand of a dyadic
>> arithmetic
>>>>>>>>>> op-
>>>>>>>>>>>>>>>>    erator is approximate numeric, then the data type of the
>>>>>>>> re-
>>>>>>>>>>>>>>>>    sult is approximate numeric. The precision of the result
>>>>>>> is
>>>>>>>>>>>>>>>>    implementation-defined.
>>>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> And this makes sense to me. I think we should only return an
>>>>>>> exact
>>>>>>>>>>>>> result if both of the inputs are exact.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I think we might want to look closely at the SQL spec and
>>>>>>>> especially
>>>>>>>>>>>>> when the spec requires an error to be generated. Those are
>>>>>>> sometimes
>>>>>>>>>> in the
>>>>>>>>>>>>> spec to prevent subtle paths to wrong answers. Any time we
>> deviate
>>>>>>>>>> from the
>>>>>>>>>>>>> spec we should be asking why is it in the spec and why are we
>>>>>>>>>> deviating.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Another issue besides overflow handling is how we determine
>>>>>>>>>> precision
>>>>>>>>>>>>> and scale for expressions involving two exact types.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Ariel
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I'm not sure if I would prefer the Postgres way of doing
>>>>>>> things,
>>>>>>>>>>>>> which is
>>>>>>>>>>>>>>>>> returning just about any type depending on the order of
>>>>>>>> operators.
>>>>>>>>>>>>>>>>> Considering it actually mentions in the docs that using
>>>>>>>>>>>>> numeric/decimal is
>>>>>>>>>>>>>>>>> slow and also multiple times that floating points are
>> inexact.
>>>>>>> So
>>>>>>>>>>>>> doing
>>>>>>>>>>>>>>>>> some math with Postgres (9.6.5):
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> SELECT 2147483647::bigint*1.0::double precision returns
>> double
>>>>>>>>>>>>>>>>> precision 2147483647
>>>>>>>>>>>>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
>>>>>>>>>>>>>>>>> SELECT 2147483647::bigint*1.0::real returns double
>>>>>>>>>>>>>>>>> SELECT 2147483647::double precision*1::bigint returns
>> double
>>>>>>>>>>>>> 2147483647
>>>>>>>>>>>>>>>>> SELECT 2147483647::double precision*1.0::bigint returns
>> double
>>>>>>>>>>>>> 2147483647
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> With + - we can get the same amount of mixture of returned
>>>>>>> types.
>>>>>>>>>>>>> There's
>>>>>>>>>>>>>>>>> no difference in those calculations, just some casting. To
>> me
>>>>>>>>>>>>>>>>> floating-point math indicates inexactness and has errors
>> and
>>>>>>>>>> whoever
>>>>>>>>>>>>> mixes
>>>>>>>>>>>>>>>>> up two different types should understand that. If one
>> didn't
>>>>>>> want
>>>>>>>>>>>>> exact
>>>>>>>>>>>>>>>>> numeric type, why would the server return such? The
>> floating
>>>>>>>> point
>>>>>>>>>>>>> value
>>>>>>>>>>>>>>>>> itself could be wrong already before the calculation -
>> trying
>>>>>>> to
>>>>>>>>>> say
>>>>>>>>>>>>> we do
>>>>>>>>>>>>>>>>> it lossless is just wrong.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Fun with 2.65:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
>>>>>>>>>>>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> SELECT round(2.65) returns numeric 4
>>>>>>>>>>>>>>>>> SELECT round(2.65::double precision) returns double 4
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> SELECT 2.65 * 1 returns double 2.65
>>>>>>>>>>>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
>>>>>>>>>>>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
>>>>>>>>>>>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> SELECT round(2.65) * 1 returns numeric 3
>>>>>>>>>>>>>>>>> SELECT round(2.65) * round(1) returns double 3
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> So as we're going to have silly values in any case, why
>>>>> pretend
>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>>> else? Also, exact calculations are slow if we crunch large
>>>>>>> amount
>>>>>>>>>> of
>>>>>>>>>>>>>>>>> numbers. I guess I slightly deviated towards Postgres'
>>>>>>>> implemention
>>>>>>>>>>>>> in this
>>>>>>>>>>>>>>>>> case, but I wish it wasn't used as a benchmark in this
>> case.
>>>>>>> And
>>>>>>>>>> most
>>>>>>>>>>>>>>>>> importantly, I would definitely want the exact same type
>>>>>>> returned
>>>>>>>>>>>>> each time
>>>>>>>>>>>>>>>>> I do a calculation.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> - Micke
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
>>>>>>>>>>>>> benedict@apache.org>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> As far as I can tell we reached a relatively strong
>> consensus
>>>>>>>>>> that we
>>>>>>>>>>>>>>>>>> should implement lossless casts by default?  Does anyone
>> have
>>>>>>>>>>>>> anything more
>>>>>>>>>>>>>>>>>> to add?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Looking at the emails, everyone who participated and
>>>>>>> expressed a
>>>>>>>>>>>>>>>>>> preference was in favour of the “Postgres approach” of
>>>>>>> upcasting
>>>>>>>>>> to
>>>>>>>>>>>>> decimal
>>>>>>>>>>>>>>>>>> for mixed float/int operands?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I’d like to get a clear-cut decision on this, so we know
>> what
>>>>>>>>>> we’re
>>>>>>>>>>>>> doing
>>>>>>>>>>>>>>>>>> for 4.0.  Then hopefully we can move on to a collective
>>>>>>> decision
>>>>>>>>>> on
>>>>>>>>>>>>> Ariel’s
>>>>>>>>>>>>>>>>>> concerns about overflow, which I think are also pressing -
>>>>>>>>>>>>> particularly for
>>>>>>>>>>>>>>>>>> tinyint and smallint.  This does also impact implicit
>> casts
>>>>>>> for
>>>>>>>>>> mixed
>>>>>>>>>>>>>>>>>> integer type operations, but an approach for these will
>>>>>>> probably
>>>>>>>>>>>>> fall out
>>>>>>>>>>>>>>>>>> of any decision on overflow.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
>>>>>>>>>>>>> murukesh.mohanan@gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I think you're conflating two things here. There's the
>> loss
>>>>>>>>>>>>> resulting
>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>> using some operators, and loss involved in casting.
>> Dividing
>>>>>>> an
>>>>>>>>>>>>> integer
>>>>>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>>>>> another integer to obtain an integer result can result in
>>>>>>> loss,
>>>>>>>>>> but
>>>>>>>>>>>>>>>>>> there's
>>>>>>>>>>>>>>>>>>> no implicit casting there and no loss due to casting.
>>>>>>> Casting
>>>>>>>> an
>>>>>>>>>>>>> integer
>>>>>>>>>>>>>>>>>>> to a float can also result in loss. So dividing an
>> integer
>>>>>>> by a
>>>>>>>>>>>>> float,
>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>> example, with an implicit cast has an additional avenue
>> for
>>>>>>>> loss:
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> implicit cast for the operands so that they're of the
>> same
>>>>>>>> type.
>>>>>>>>>> I
>>>>>>>>>>>>>>>>>> believe
>>>>>>>>>>>>>>>>>>> this discussion so far has been about the latter, not the
>>>>>>> loss
>>>>>>>>>> from
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> operations themselves.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
>>>>>>>>>>>>> benjamin.lerer@datastax.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I would like to try to clarify things a bit to help
>> people
>>>>>>> to
>>>>>>>>>>>>> understand
>>>>>>>>>>>>>>>>>>>> the true complexity of the problem.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> The *float *and *double *types are inexact numeric
>> types.
>>>>>>> Not
>>>>>>>>>> only
>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> operation level.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> If you insert 676543.21 in a *float* column and then
>> read
>>>>>>> it,
>>>>>>>>>> you
>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>>>> realize that the value has been truncated to 676543.2.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> If you want accuracy the only way is to avoid those
>> inexact
>>>>>>>>>> types.
> >>>>>>>>>>>>>>>>>>>> Using *decimals*
> >>>>>>>>>>>>>>>>>>>> during operations will mitigate the problem but will
>> not
>>>>>>>> remove
>>>>>>>>>>>>> it.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>>> I do not recall PostgreSQL behaving as described. If I
>> am
>>>>>>> not
>>>>>>>>>>>>> mistaken
>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>> PostgreSQL *SELECT 3/2* will return *1*. Which is
>> similar
>>>>> to
>>>>>>>>>> what
>>>>>>>>>>>>> MS SQL
> >>>>>>>>>>>>>>>>>>>> Server and Oracle do. So all those databases will lose
>>>>>>>>>> precision
>>>>>>>>>>>>> if you
> >>>>>>>>>>>>>>>>>>>> are not careful.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> If you truly need precision you can have it by using
>> exact
>>>>>>>>>> numeric
>>>>>>>>>>>>> types
>>>>>>>>>>>>>>>>>>>> for your data types. Of course it has a cost on
>>>>> performance,
>>>>>>>>>>>>> memory and
>>>>>>>>>>>>>>>>>>>> disk usage.
>>>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>>> The advantage of the current approach is that it gives
>> you
>>>>>>> the
>>>>>>>>>>>>> choice.
>>>>>>>>>>>>>>>>>> It is
>>>>>>>>>>>>>>>>>>>> up to you to decide what you need for your application.
>> It
>>>>>>> is
>>>>>>>>>> also
>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>> line
> >>>>>>>>>>>>>>>>>>>> with the way CQL behaves everywhere else.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Muru
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>> 
>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>>>>> To unsubscribe, e-mail:
>> dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>>>>>>>>> For additional commands, e-mail:
>>>>>>> dev-help@cassandra.apache.org
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>> 
>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>>> To unsubscribe, e-mail:
>> dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>>>>>>> For additional commands, e-mail:
>> dev-help@cassandra.apache.org
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>> 
>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>>>>>> For additional commands, e-mail:
>> dev-help@cassandra.apache.org
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>> 
>> ---------------------------------------------------------------------
>>>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>>>>> For additional commands, e-mail:
>> dev-help@cassandra.apache.org
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>> Jon Haddad
>>>>>>>>>>>> http://www.rustyrazorblade.com
>>>>>>>>>>>> twitter: rustyrazorblade
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>> Jon Haddad
>>>>>>> http://www.rustyrazorblade.com
>>>>>>> twitter: rustyrazorblade
>>>>>>> 
>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>> 
>>>>> 
>>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>> 
>> 


Re: Implicit Casts for Arithmetic Operators

Posted by Sylvain Lebresne <le...@gmail.com>.
> Anyway, I think we’ve been arguing very unnecessarily about this
ideological
> point, given that I’ve already suggested a toggle to permit users to
continue
> with present-day semantics should they choose.  Surely this resolves your
> concerns, unless you think this is intractable?

Not really :). My beef is really against the idea of deciding upfront about
doing this when we don't even know what implementing and maintaining such
a toggle even implies (or even whether we'll need one at all). I don't know
if it's tractable or not, but what's wrong with figuring that out first?


Anyway, I'll try to lay out my reasoning on the general issue, for the
sake of the general conversation here.

I agree following a standard for our arithmetic would be nice. I have, at
this point, no opinion on which standard would be best (I haven't looked,
and no one shared any analysis), but checking ANSI SQL 92 sure doesn't seem
crazy to me on principle. In my perfect world, "we" would do a short
"competitive" analysis of reasonable options (where _one_ of the criteria
would be "what changes does it require to our existing code?"), and I'd
*love* to see that, but I admit I don't have time to do it myself any time
soon, so I can't, in good conscience, strongly object to narrowing it down
to ANSI SQL 92 without checking too many alternatives.

So I'm all in favor of *looking at* making our arithmetic ANSI SQL 92
compliant. I have, however, no clue what that entails in practice (I haven't
looked at all) and if someone knows, they haven't shared that knowledge so
far.

I do disagree however that this adherence to a standard should be decided in
a vacuum. No decision should; this is bad project management in my book,
and that is kind of why I want to insist on that point. We should be open
to all relevant context. We can debate how to weigh each part of the
context, sure, but disregarding the context _by design_, I will have to
strongly disagree with that.

And to me, the relevant context is (likely forgetting stuff):
1) Adhering to a standard, as Benedict mentioned, brings 2 nice benefits:
   1) it gives us confidence we haven't screwed up something badly and 2)
   it ensures familiarity for people and tools (at least those familiar
   with that particular standard).
2) I haven't seen much evidence so far that we screwed up things badly or
that
   things are super unfamiliar. The cast thing which started this thread is
   certainly worth discussing, but if I understood correctly, we do the
same as
   MS SQL Server so far, so it's not exactly unheard of. To be extra clear,
I'm
   not trying to imply that this renders the previous point moot, *it does
   not* imo, but I do think it is relevant context nonetheless, to be
   weighed in.
3) This _could_ create backward incompatibility (or it may not, I don't
   consider changes to behavior introduced in 4.0 backward incompatibility
in
   particular). If so, we should be careful with this (it impacts users in
   unpleasant ways). Yes, flags might be an option here to lessen the
   burden on users (not that I love adding more flags by itself btw), but
   depending on what changes we're talking about, said flag _could_ bring
   non-negligible complexity (for the code) that should be factored in.
4) I believe everyone more or less agrees that if we do this, we should
   do this in 4.0, so this _could_ create substantial delay for 4.0 (again,
   or not.  Since we don't know what it involves, we simply don't know). As
   I've expressed some months ago when I pushed for an early freeze, I
   genuinely believe delays to 4.0 are bad for the project at this point.
   I'm *not* saying it is the end of the conversation, I absolutely agree
   the release quality is an important aspect as well for instance, but
   my point is that it should _all_ be factored in.

Currently, we have very little information on how bad 3) and 4) are. So my
current personal opinion is that 1) does justify looking into this much
more closely, and that if 3) and 4) aren't too bad, that's a good deal for
the
project. But in light of 2), I also think there is a "level of badness" for
3)
and 4) at which point it'd become a net negative for the project.

--
Sylvain


On Fri, Nov 23, 2018 at 1:07 AM Benedict Elliott Smith <be...@apache.org>
wrote:

> This was a terribly unclear email, sorry.  I was just trying to find new
> and interesting ways to say the same thing (that we should form our goal
> state from first principles only).
>
> Anyway, I think we’ve been arguing very unnecessarily about this
> ideological point, given that I’ve already suggested a toggle to permit
> users to continue with present-day semantics should they choose.  Surely
> this resolves your concerns, unless you think this is intractable?
>
>
>
>
>
> > On 22 Nov 2018, at 12:13, Benedict Elliott Smith <be...@apache.org>
> wrote:
> >
> > This is why I said the decision is ideological.  We fundamentally
> disagree with each other, on points of principle.
> >
> > This also feels like it’s becoming antagonistic, perhaps through
> misinterpreting each other, which was far from my intent.  So I will limit
> my reply to the only point of interpretation of my position.
> >
> > Given that I personally consider this to be an ideological or
> project-axiomatic decision, I therefore only consider other ideological or
> axiomatic facts to be relevant to a decision like this. So:
> >
> > 1) By “where appropriate” I mean, for instance, that this project will
> likely never support ANSI SQL in toto, by virtue of the fundamental nature
> of the project.
> > 2) I agree that which standard we choose to follow, and why we follow
> it, are both relevant questions
> >
> >
> >
> >> On 22 Nov 2018, at 11:56, Sylvain Lebresne <le...@gmail.com> wrote:
> >>
> >> On Thu, Nov 22, 2018 at 11:51 AM Benedict Elliott Smith <
> benedict@apache.org>
> >> wrote:
> >>
> >>> We’re not presently voting*; we’re only discussing whether we should
> base
> >>> our behaviour on a widely agreed upon standard.
> >>>
> >>
> >> Well, you *explicitly* asked if people thought we should do a vote, and
> I
> >> responded to that part. Let's not pretend I'm interpreting stuff, it's
> >> insulting.
> >>
> >>
> >>> I think perhaps the nub of our disagreement is that, in my view, this
> is
> >>> the only relevant fact to decide. There is no data to base this
> decision
> >>> upon.  It’s axiomatic, or ideological; procedural, not technical:  Do
> we
> >>> think we should try to hew to standards (where appropriate), or do we
> think
> >>> we should stick with what we arrived at in an adhoc manner?
> >>
> >>
> >> Yes, that is probably the nub of our disagreement. I disagree that
> hewing
> >> to standards is something we should agree on absolutely, with no other
> >> consideration in the balance. Hell, I read your "where appropriate" as
> an
> >> admission that you don't even truly think that. I think this is always a
> >> pros versus cons analysis. Adhering to standards is certainly a pro.
> >>
> >> *If* we were starting from scratch, I might maybe agree there isn't much
> >> "cons" in the balance (there is always _some_ consideration though;
> >> adhering to standard might force you into complexity that might not be
> >> justified; not saying it's our case here, just pointing again that I
> don't
> >> adhere to the absolutist view), making it an easy decision. So that I'm
> not
> >> sure we'd even need a vote to agree that "we should try to hew to
> standards
> >> (where appropriate)", even if we'd still want to discuss 1) if it is
> >> appropriate in that case and 2) which standard, so it wouldn't even be a
> >> "no data involved" decision.
> >>
> >> But we're not starting from scratch. You explicitly say yourself that it
> >> "extends to any features we have already released". So backward
> >> compatibility is a parameter we imo *must* take into account. Again,
> >> doesn't mean we don't end up breaking backward compatibility, just that
> it
> >> is a non negligible downside, so we better make sure the "pros" of
> adhering
> >> to a standard makes up for it.
> >>
> >> So yes, I do pretty strongly disagree that adhering to a standard is
> >> something that should be decided absolutely, with no other consideration
> >> taken into account.
> >>
> >>
> >>> and how meandering the discussion was with no clear consensus, it
> seemed
> >>> to need a vote in the near future.
> >>
> >>
> >> Fwiw, I also don't have the same read here. What I see on this thread
> is a
> >> bit of discussion on the specific cast issue you initially brought,
> >> discussion that didn't feel especially stuck to me, but I don't see much
> >> of a larger discussion on adhering to standards for all our arithmetic
> >> before your suggestion that a vote on it might be warranted.
> >>
> >> --
> >> Sylvain
> >>
> >>
> >>>> On 22 Nov 2018, at 09:26, Sylvain Lebresne <le...@gmail.com>
> wrote:
> >>>>
> >>>> I'm not saying "let's not do this no matter what and ever fix
> technical
> >>>> debt", nor am I fearing decision.
> >>>>
> >>>> But I *do* think decisions, technical ones at least, should be fact
> and
> >>>> data driven. And I'm not even sure why we're talking of having a vote
> >>> here.
> >>>> The Apache Way is *not* meant to be primarily vote-driven, votes are
> >>>> supposed to be a last resort when, after having debated facts and
> data,
> >>> no
> >>>> consensus can be reached. Can we have the debate on facts and data
> first?
> >>>> Please.
> >>>>
> >>>> At the end of the day, I object to: "There are still a number of
> unresolved
> >>>> issues, but to make progress I wonder if it would first be helpful to
> >>> have
> >>>> a vote on ensuring we are ANSI SQL 92 compliant for our arithmetic?".
> >>> More
> >>>> specifically, I disagree that such vote is a good starting point.
> Let's
> >>>> identify and discuss the unresolved issues first. Let's check
> precisely
> >>>> what getting our arithmetic ANSI SQL 92 compliant means and how we can
> >>> get
> >>>> it. I do support the idea of making such analysis btw, it would be
> good
> >>>> data, but no vote is needed whatsoever to make it. Again, I object to
> >>>> voting first and doing the analysis 2nd.
> >>>>
> >>>> --
> >>>> Sylvain
> >>>>
> >>>>
> >>>> On Thu, Nov 22, 2018 at 1:25 AM Jonathan Haddad <jo...@jonhaddad.com>
> >>> wrote:
> >>>>
> >>>>> I can’t agree more. We should be able to make changes in a manner
> that
> >>>>> improves the DB In the long term, rather than live with the technical
> >>> debt
> >>>>> of arbitrary decisions made by a handful of people.
> >>>>>
> >>>>> I also agree that putting a knob in place to let people migrate over
> is
> >>> a
> >>>>> reasonable decision.
> >>>>>
> >>>>> Jon
> >>>>>
> >>>>> On Wed, Nov 21, 2018 at 4:54 PM Benedict Elliott Smith <
> >>>>> benedict@apache.org>
> >>>>> wrote:
> >>>>>
> >>>>>> The goal is simply to agree on a set of well-defined principles for
> how
> >>>>> we
> >>>>>> should behave.  If we don’t like the implications that arise, we’ll
> >>> have
> >>>>>> another vote?  A democracy cannot bind itself, so I never understood
> >>> this
> >>>>>> fear of a decision.
> >>>>>>
> >>>>>> A database also has a thousand toggles.  If we absolutely need to,
> we
> >>> can
> >>>>>> introduce one more.
> >>>>>>
> >>>>>> We should be doing this upfront a great deal more often.  Doing it
> >>>>>> retrospectively sucks, but in my opinion it's a bad reason to bind
> >>>>>> ourselves to whatever made it in.
> >>>>>>
> >>>>>> Do we anywhere define the principles of our current behaviour?  I
> >>>>> couldn’t
> >>>>>> find it.
> >>>>>>
> >>>>>>
> >>>>>>> On 21 Nov 2018, at 21:08, Sylvain Lebresne <le...@gmail.com>
> >>> wrote:
> >>>>>>>
> >>>>>>> On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith <
> >>>>>> benedict@apache.org>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> FWIW, my meaning of arithmetic in this context extends to any
> >>> features
> >>>>>> we
> >>>>>>>> have already released (such as aggregates, and perhaps other
> built-in
> >>>>>>>> functions) that operate on the same domain.  We should be
> consistent,
> >>>>>> after
> >>>>>>>> all.
> >>>>>>>>
> >>>>>>>> Whether or not we need to revisit any existing functionality we
> can
> >>>>>> figure
> >>>>>>>> out after the fact, once we have agreed what our behaviour should
> be.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I'm not sure I correctly understand the process suggested, but I
> don't
> >>>>>>> particularly like/agree with what I understand. What I understand
> is a
> >>>>>>> suggestion for voting on agreeing to be ANSI SQL 92 compliant,
> with no
> >>>>>> real
> >>>>>>> evaluation of what that entails (at least I haven't seen one), and
> >>> that
> >>>>>>> this vote, if passed, would imply we'd then make any backward
> >>>>>> incompatible
> >>>>>>> change necessary to achieve compliance ("my meaning of arithmetic
> in
> >>>>> this
> >>>>>>> context extends to any features we have already released" and
> "Whether
> >>>>> or
> >>>>>>> not we need to revisit any existing functionality we can figure out
> >>>>> after
> >>>>>>> the fact, once we have agreed what our behaviour should be").
> >>>>>>>
> >>>>>>> This might make sense for a new product, but at our stage that seems
> >>>>>>> backward to me. I think we owe our users to first make the effort
> of
> >>>>>>> identifying what "inconsistencies" our existing arithmetic has[1]
> and
> >>>>>>> _then_ consider what options we have to fix those, with their pros
> and
> >>>>>> cons
> >>>>>>> (including how bad they break backward compatibility). And if
> _then_
> >>>>>>> getting ANSI SQL 92 compliant proves to not be disruptive (or at
> least
> >>>>>>> acceptably so), then sure, that's great.
> >>>>>>>
> >>>>>>> [1]: one possibly efficient way to do that could actually be to
> >>> compare
> >>>>>> our
> >>>>>>> arithmetic to ANSI SQL 92. Not that all differences found would
> imply
> >>>>>>> inconsistencies/wrongness of our arithmetic, but still, it should
> be
> >>>>>>> helpful. And I guess my whole point is that we should do that analysis
> >>>>>> first,
> >>>>>>> and then maybe decide that being ANSI SQL 92 is a reasonable
> option,
> >>>>> not
> >>>>>>> decide first and live with the consequences no matter what they
> are.
> >>>>>>>
> >>>>>>> --
> >>>>>>> Sylvain
> >>>>>>>
> >>>>>>>
> >>>>>>>> I will make this more explicit for the vote, but just to clarify
> the
> >>>>>>>> intention so that we are all discussing the same thing.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On 20 Nov 2018, at 14:18, Ariel Weisberg <ad...@fastmail.fm>
> >>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> +1
> >>>>>>>>>
> >>>>>>>>> This is a public API so we will be much better off if we get it
> >>> right
> >>>>>>>> the first time.
> >>>>>>>>>
> >>>>>>>>> Ariel
> >>>>>>>>>
> >>>>>>>>>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <
> jon@jonhaddad.com>
> >>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Sounds good to me.
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <
> >>>>>>>> benedict@apache.org>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> So, this thread somewhat petered out.
> >>>>>>>>>>>
> >>>>>>>>>>> There are still a number of unresolved issues, but to make
> >>>>> progress I
> >>>>>>>>>>> wonder if it would first be helpful to have a vote on ensuring
> we
> >>>>> are
> >>>>>>>> ANSI
> >>>>>>>>>>> SQL 92 compliant for our arithmetic?  This seems like a
> sensible
> >>>>>>>> baseline,
> >>>>>>>>>>> since we will hopefully minimise surprise to operators this
> way.
> >>>>>>>>>>>
> >>>>>>>>>>> If people largely agree, I will call a vote, and we can pick
> up a
> >>>>>>>> couple
> >>>>>>>>>>> of more focused discussions afterwards on how we interpret the
> >>>>> leeway
> >>>>>>>> it
> >>>>>>>>>>> gives.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws>
> >>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> From reading the spec, precision is always implementation
> >>> defined.
> >>>>>> The
> >>>>>>>>>>> spec specifies scale in several cases, but never precision for
> any
> >>>>>>>> type or
> >>>>>>>>>>> operation (addition/subtraction, multiplication, division).
> >>>>>>>>>>>>
> >>>>>>>>>>>> So we don't implement anything remotely approaching precision
> and
> >>>>>>>> scale
> >>>>>>>>>>> in CQL when it comes to numbers I think? So we aren't going to
> >>>>> follow
> >>>>>>>> the
> >>>>>>>>>>> spec for scale. We are already pretty far down that road so I
> >>> would
> >>>>>>>> leave
> >>>>>>>>>>> it alone.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I don't think the spec is asking for the most approximate
> type.
> >>>>> It's
> >>>>>>>>>>> just saying the result is approximate, and the precision is
> >>>>>>>> implementation
> >>>>>>>>>>> defined. We could return either float or double. I think if
> one of
> >>>>>> the
> >>>>>>>>>>> operands is a double we should return a double because clearly
> the
> >>>>>>>> schema
> >>>>>>>>>>> thought a double was required to represent that number. I would
> >>>>> also
> >>>>>>>> be in
> >>>>>>>>>>> favor of returning a double all the time so that people can
> expect
> >>>>> a
> >>>>>>>>>>> consistent type from expressions involving approximate numbers.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I am a big fan of widening for arithmetic expressions in a
> >>>>> database
> >>>>>> to
> >>>>>>>>>>> avoid having to error on overflow. You can go to the trouble of
> >>>>> only
> >>>>>>>>>>> widening the minimum amount, but I think it's simpler if we
> always
> >>>>>>>> widen to
> >>>>>>>>>>> bigint and double. This would be something the spec allows.
> >>>>>>>>>>>>
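To make the widening rule concrete, here is a minimal Java sketch of the semantics Ariel describes (an assumption for illustration, not Cassandra's committed behaviour): integer operands are evaluated in the wider type, so the operation itself cannot wrap.

    public class WideningSketch {
        // Hypothetical rule: int operands widen to long before the operation,
        // so int + int cannot overflow even at Integer.MAX_VALUE.
        static long add(int a, int b) {
            return (long) a + (long) b;
        }

        public static void main(String[] args) {
            System.out.println(Integer.MAX_VALUE + 1);     // wraps to -2147483648
            System.out.println(add(Integer.MAX_VALUE, 1)); // widened: 2147483648
        }
    }
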
> >>>>>>>>>>>> Definitely if we can make overflow not occur we should and the
> >>>>> spec
> >>>>>>>>>>> allows that. We should also not return different types for the
> >>> same
> >>>>>>>> operand
> >>>>>>>>>>> types just to work around overflow if we detect we need more
> >>>>>> precision.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Ariel
> >>>>>>>>>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith
> wrote:
> >>>>>>>>>>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for
> >>>>> digging
> >>>>>>>> this
> >>>>>>>>>>>>> out (and Mike for getting some empirical examples).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> We still have to decide on the approximate data type to
> return;
> >>>>>> right
> >>>>>>>>>>>>> now, we have float+bigint=double, but float+int=float.  I
> think
> >>>>>> this
> >>>>>>>> is
> >>>>>>>>>>>>> fairly inconsistent, and either the approximate type should
> >>>>> always
> >>>>>>>> win,
> >>>>>>>>>>>>> or we should always upgrade to double for mixed operands.
> >>>>>>>>>>>>>
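The asymmetry is easy to demonstrate in plain Java, which uses the same IEEE-754 types as CQL's float and double: int -> float can silently change the value, int -> double cannot, and bigint (long) -> double is lossy again.

    public class CastLoss {
        public static void main(String[] args) {
            int i = 16777217;               // 2^24 + 1, not representable as a float
            System.out.println((float) i);  // 1.6777216E7 -> the cast changed the value
            System.out.println((double) i); // 1.6777217E7 -> exact

            long l = 9007199254740993L;     // 2^53 + 1, not representable as a double
            System.out.println((double) l); // 9.007199254740992E15 -> changed again
        }
    }
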
> >>>>>>>>>>>>> The quoted spec also suggests that decimal+float=float, and
> >>>>> decimal
> >>>>>>>>>>>>> +double=double, whereas we currently have
> decimal+float=decimal,
> >>>>>> and
> >>>>>>>>>>>>> decimal+double=decimal
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If we’re going to go with an approximate operand implying an
> >>>>>>>>>>> approximate
> >>>>>>>>>>>>> result, I think we should do it consistently (and consistent
> >>> with
> >>>>>> the
> >>>>>>>>>>>>> SQL92 spec), and have the type of the approximate operand
> always
> >>>>> be
> >>>>>>>> the
> >>>>>>>>>>>>> return type.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> This would still leave a decision for float+double, though.
> The
> >>>>>> most
> >>>>>>>>>>>>> consistent behaviour with that stated above would be to
> always
> >>>>> take
> >>>>>>>> the
> >>>>>>>>>>>>> most approximate type to return (i.e. float), but this would
> >>> seem
> >>>>>> to
> >>>>>>>> me
> >>>>>>>>>>>>> to be fairly unexpected for the user.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ariel@weisberg.ws
> >
> >>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I agree with what's been said about expectations regarding
> >>>>>>>> expressions
> >>>>>>>>>>> involving floating point numbers. I think that if one of the
> >>> inputs
> >>>>>> is
> >>>>>>>>>>> approximate then the result should be approximate.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> One thing we could look at for inspiration is the SQL spec.
> Not
> >>>>> to
> >>>>>>>>>>> follow dogmatically necessarily.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> From the SQL 92 spec regarding assignment
> >>>>>>>>>>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
> section
> >>>>>> 4.6:
> >>>>>>>>>>>>>> "
> >>>>>>>>>>>>>>  Values of the data types NUMERIC, DECIMAL, INTEGER,
> >>>>> SMALLINT,
> >>>>>>>>>>>>>>  FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
> >>>>>>>>>>> mutually
> >>>>>>>>>>>>>>  comparable and mutually assignable. If an assignment would
> >>>>>>>>>>> result
> >>>>>>>>>>>>>>  in a loss of the most significant digits, an exception
> >>>>>>>> condition
> >>>>>>>>>>>>>>  is raised. If least significant digits are lost,
> >>>>>>>> implementation-
> >>>>>>>>>>>>>>  defined rounding or truncating occurs with no exception
> >>>>>>>>>>> condition
> >>>>>>>>>>>>>>  being raised. The rules for arithmetic are generally
> >>>>> governed
> >>>>>>>> by
> >>>>>>>>>>>>>>  Subclause 6.12, "<numeric value expression>".
> >>>>>>>>>>>>>> "
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Section 6.12 numeric value expressions:
> >>>>>>>>>>>>>> "
> >>>>>>>>>>>>>>  1) If the data type of both operands of a dyadic arithmetic
> >>>>>>>>>>> opera-
> >>>>>>>>>>>>>>     tor is exact numeric, then the data type of the result
> is
> >>>>>>>>>>> exact
> >>>>>>>>>>>>>>     numeric, with precision and scale determined as follows:
> >>>>>>>>>>>>>> ...
> >>>>>>>>>>>>>>  2) If the data type of either operand of a dyadic
> arithmetic
> >>>>>>>> op-
> >>>>>>>>>>>>>>     erator is approximate numeric, then the data type of the
> >>>>>> re-
> >>>>>>>>>>>>>>     sult is approximate numeric. The precision of the result
> >>>>> is
> >>>>>>>>>>>>>>     implementation-defined.
> >>>>>>>>>>>>>> "
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> And this makes sense to me. I think we should only return an
> >>>>> exact
> >>>>>>>>>>> result if both of the inputs are exact.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think we might want to look closely at the SQL spec and
> >>>>>> especially
> >>>>>>>>>>> when the spec requires an error to be generated. Those are
> >>>>> sometimes
> >>>>>>>> in the
> >>>>>>>>>>> spec to prevent subtle paths to wrong answers. Any time we
> deviate
> >>>>>>>> from the
> >>>>>>>>>>> spec we should be asking why is it in the spec and why are we
> >>>>>>>> deviating.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Another issue besides overflow handling is how we determine
> >>>>>>>> precision
> >>>>>>>>>>> and scale for expressions involving two exact types.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Ariel
> >>>>>>>>>>>>>>
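For reference, SQL 92 subclause 6.12 does pin down the scale of exact results even while leaving precision implementation-defined: max(S1, S2) for addition and subtraction, S1 + S2 for multiplication, implementation-defined for division. A tiny Java sketch of that rule (illustration only):

    // Result scale for exact numeric operands per SQL 92 subclause 6.12;
    // the precision of the result stays implementation-defined in every case.
    static int resultScale(char op, int s1, int s2) {
        switch (op) {
            case '+':
            case '-': return Math.max(s1, s2);
            case '*': return s1 + s2;
            default:  throw new IllegalArgumentException(
                          "scale for '" + op + "' is implementation-defined");
        }
    }
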
> >>>>>>>>>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
> >>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I'm not sure if I would prefer the Postgres way of doing
> >>>>> things,
> >>>>>>>>>>> which is
> >>>>>>>>>>>>>>> returning just about any type depending on the order of
> >>>>>> operators.
> >>>>>>>>>>>>>>> Considering it actually mentions in the docs that using
> >>>>>>>>>>> numeric/decimal is
> >>>>>>>>>>>>>>> slow and also multiple times that floating points are
> inexact.
> >>>>> So
> >>>>>>>>>>> doing
> >>>>>>>>>>>>>>> some math with Postgres (9.6.5):
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> SELECT 2147483647::bigint*1.0::double precision returns
> double
> >>>>>>>>>>>>>>> precision 2147483647
> >>>>>>>>>>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
> >>>>>>>>>>>>>>> SELECT 2147483647::bigint*1.0::real returns double
> >>>>>>>>>>>>>>> SELECT 2147483647::double precision*1::bigint returns
> double
> >>>>>>>>>>> 2147483647
> >>>>>>>>>>>>>>> SELECT 2147483647::double precision*1.0::bigint returns
> double
> >>>>>>>>>>> 2147483647
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> With + - we can get the same amount of mixture of returned
> >>>>> types.
> >>>>>>>>>>> There's
> >>>>>>>>>>>>>>> no difference in those calculations, just some casting. To
> me
> >>>>>>>>>>>>>>> floating-point math indicates inexactness and has errors
> and
> >>>>>>>> whoever
> >>>>>>>>>>> mixes
> >>>>>>>>>>>>>>> up two different types should understand that. If one
> didn't
> >>>>> want
> >>>>>>>>>>> exact
> >>>>>>>>>>>>>>> numeric type, why would the server return such? The
> floating
> >>>>>> point
> >>>>>>>>>>> value
> >>>>>>>>>>>>>>> itself could be wrong already before the calculation -
> trying
> >>>>> to
> >>>>>>>> say
> >>>>>>>>>>> we do
> >>>>>>>>>>>>>>> it lossless is just wrong.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Fun with 2.65:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
> >>>>>>>>>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> SELECT round(2.65) returns numeric 4
> >>>>>>>>>>>>>>> SELECT round(2.65::double precision) returns double 4
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> SELECT 2.65 * 1 returns double 2.65
> >>>>>>>>>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
> >>>>>>>>>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
> >>>>>>>>>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> SELECT round(2.65) * 1 returns numeric 3
> >>>>>>>>>>>>>>> SELECT round(2.65) * round(1) returns double 3
> >>>>>>>>>>>>>>>
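The 2.65000009536743 above is not a Postgres oddity; it is the float -> double widening artifact, reproducible in plain Java with the same binary32/binary64 types:

    public class FunWith265 {
        public static void main(String[] args) {
            float f = 2.65f;
            System.out.println(f);          // 2.65, the shortest string that round-trips
            System.out.println((double) f); // 2.6500000953674316, the float's real value
        }
    }
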
> >>>>>>>>>>>>>>> So as we're going to have silly values in any case, why
> >>> pretend
> >>>>>>>>>>> something
> >>>>>>>>>>>>>>> else? Also, exact calculations are slow if we crunch large
> >>>>> amount
> >>>>>>>> of
> >>>>>>>>>>>>>>> numbers. I guess I slightly deviated towards Postgres'
> >>>>>> implementation
> >>>>>>>>>>> in this
> >>>>>>>>>>>>>>> case, but I wish it wasn't used as a benchmark in this
> case.
> >>>>> And
> >>>>>>>> most
> >>>>>>>>>>>>>>> importantly, I would definitely want the exact same type
> >>>>> returned
> >>>>>>>>>>> each time
> >>>>>>>>>>>>>>> I do a calculation.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> - Micke
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
> >>>>>>>>>>> benedict@apache.org>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> As far as I can tell we reached a relatively strong
> consensus
> >>>>>>>> that we
> >>>>>>>>>>>>>>>> should implement lossless casts by default?  Does anyone
> have
> >>>>>>>>>>> anything more
> >>>>>>>>>>>>>>>> to add?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Looking at the emails, everyone who participated and
> >>>>> expressed a
> >>>>>>>>>>>>>>>> preference was in favour of the “Postgres approach” of
> >>>>> upcasting
> >>>>>>>> to
> >>>>>>>>>>> decimal
> >>>>>>>>>>>>>>>> for mixed float/int operands?
> >>>>>>>>>>>>>>>>
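Mechanically, a sketch of what that could mean, with Java's BigDecimal standing in for CQL decimal (assumed semantics for illustration, not a committed implementation): both operands are converted to decimal without loss before the operation runs.

    import java.math.BigDecimal;

    public class LosslessUpcast {
        // Hypothetical mixed-operand rule: float and int both upcast to decimal.
        static BigDecimal multiply(float f, int i) {
            // new BigDecimal(double) captures the operand's exact binary value,
            // so the cast itself loses nothing -- even if that value surprises.
            return new BigDecimal((double) f).multiply(BigDecimal.valueOf(i));
        }

        public static void main(String[] args) {
            System.out.println(multiply(2.65f, 1)); // 2.650000095367431640625
        }
    }
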
> >>>>>>>>>>>>>>>> I’d like to get a clear-cut decision on this, so we know
> what
> >>>>>>>> we’re
> >>>>>>>>>>> doing
> >>>>>>>>>>>>>>>> for 4.0.  Then hopefully we can move on to a collective
> >>>>> decision
> >>>>>>>> on
> >>>>>>>>>>> Ariel’s
> >>>>>>>>>>>>>>>> concerns about overflow, which I think are also pressing -
> >>>>>>>>>>> particularly for
> >>>>>>>>>>>>>>>> tinyint and smallint.  This does also impact implicit
> casts
> >>>>> for
> >>>>>>>> mixed
> >>>>>>>>>>>>>>>> integer type operations, but an approach for these will
> >>>>> probably
> >>>>>>>>>>> fall out
> >>>>>>>>>>>>>>>> of any decision on overflow.
> >>>>>>>>>>>>>>>>
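For anyone following along, the tinyint/smallint concern in concrete terms (Java's byte mirrors CQL tinyint): the narrow types wrap almost immediately unless results are widened.

    public class NarrowOverflow {
        public static void main(String[] args) {
            byte a = 100, b = 100;
            System.out.println((byte) (a + b)); // -56 -> tinyint-style wraparound
            System.out.println(a + b);          // 200 -> fine once widened to int
        }
    }
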
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
> >>>>>>>>>>> murukesh.mohanan@gmail.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I think you're conflating two things here. There's the
> loss
> >>>>>>>>>>> resulting
> >>>>>>>>>>>>>>>> from
> >>>>>>>>>>>>>>>>> using some operators, and loss involved in casting.
> Dividing
> >>>>> an
> >>>>>>>>>>> integer
> >>>>>>>>>>>>>>>> by
> >>>>>>>>>>>>>>>>> another integer to obtain an integer result can result in
> >>>>> loss,
> >>>>>>>> but
> >>>>>>>>>>>>>>>> there's
> >>>>>>>>>>>>>>>>> no implicit casting there and no loss due to casting.
> >>>>> Casting
> >>>>>> an
> >>>>>>>>>>> integer
> >>>>>>>>>>>>>>>>> to a float can also result in loss. So dividing an
> integer
> >>>>> by a
> >>>>>>>>>>> float,
> >>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>> example, with an implicit cast has an additional avenue
> for
> >>>>>> loss:
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> implicit cast for the operands so that they're of the
> same
> >>>>>> type.
> >>>>>>>> I
> >>>>>>>>>>>>>>>> believe
> >>>>>>>>>>>>>>>>> this discussion so far has been about the latter, not the
> >>>>> loss
> >>>>>>>> from
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> operations themselves.
> >>>>>>>>>>>>>>>>>
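The distinction in executable form (plain Java): the first loss involves no cast at all, while the second comes entirely from the implicit int -> float conversion, before the operator even runs.

    public class TwoKindsOfLoss {
        public static void main(String[] args) {
            System.out.println(3 / 2);           // 1 -> loss from integer division, no cast
            System.out.println(16777217 / 1.0f); // 1.6777216E7 -> loss from the implicit
                                                 //    int -> float cast, not the division
        }
    }
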
> >>>>>>>>>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
> >>>>>>>>>>> benjamin.lerer@datastax.com>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I would like to try to clarify things a bit to help
> people
> >>>>> to
> >>>>>>>>>>> understand
> >>>>>>>>>>>>>>>>>> the true complexity of the problem.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The *float* and *double* types are inexact numeric
> types.
> >>>>> Not
> >>>>>>>> only
> >>>>>>>>>>> at
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> operation level.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> If you insert 676543.21 in a *float* column and then
> read
> >>>>> it,
> >>>>>>>> you
> >>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>> realize that the value has been truncated to 676543.2.
> >>>>>>>>>>>>>>>>>>
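No database is needed to reproduce this; the digits are gone the moment the literal becomes a binary32 float (a minimal Java check):

    public class FloatColumn {
        public static void main(String[] args) {
            float stored = 676543.21f;
            System.out.println(stored);          // 676543.2 -> the .01 is lost at write time
            System.out.println((double) stored); // 676543.1875, the value actually held
        }
    }
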
> >>>>>>>>>>>>>>>>>> If you want accuracy the only way is to avoid those
> inexact
> >>>>>>>> types.
> >>>>>>>>>>>>>>>>>> Using *decimals*
> >>>>>>>>>>>>>>>>>> during operations will mitigate the problem but will
> not
> >>>>>> remove
> >>>>>>>>>>> it.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I do not recall PostgreSQL behaving as described. If I
> am
> >>>>> not
> >>>>>>>>>>> mistaken
> >>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>> PostgreSQL *SELECT 3/2* will return *1*. Which is
> similar
> >>> to
> >>>>>>>> what
> >>>>>>>>>>> MS SQL
> >>>>>>>>>>>>>>>>>> server and Oracle do. So all those databases will lose
> >>>>>>>> precision
> >>>>>>>>>>> if you
> >>>>>>>>>>>>>>>>>> are not careful.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> If you truly need precision you can have it by using
> exact
> >>>>>>>> numeric
> >>>>>>>>>>> types
> >>>>>>>>>>>>>>>>>> for your data types. Of course it has a cost on
> >>> performance,
> >>>>>>>>>>> memory and
> >>>>>>>>>>>>>>>>>> disk usage.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The advantage of the current approach is that it gives
> you
> >>>>> the
> >>>>>>>>>>> choice.
> >>>>>>>>>>>>>>>> It is
> >>>>>>>>>>>>>>>>>> up to you to decide what you need for your application.
> It
> >>>>> is
> >>>>>>>> also
> >>>>>>>>>>> in
> >>>>>>>>>>>>>>>> line
> >>>>>>>>>>>>>>>>>> with the way CQL behaves everywhere else.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Muru
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>
> ---------------------------------------------------------------------
> >>>>>>>>>>>>>>>> To unsubscribe, e-mail:
> dev-unsubscribe@cassandra.apache.org
> >>>>>>>>>>>>>>>> For additional commands, e-mail:
> >>>>> dev-help@cassandra.apache.org
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>
> ---------------------------------------------------------------------
> >>>>>>>>>>>>>> To unsubscribe, e-mail:
> dev-unsubscribe@cassandra.apache.org
> >>>>>>>>>>>>>> For additional commands, e-mail:
> dev-help@cassandra.apache.org
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>
> ---------------------------------------------------------------------
> >>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>>>>>>>> For additional commands, e-mail:
> dev-help@cassandra.apache.org
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>
> ---------------------------------------------------------------------
> >>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>>>>>>> For additional commands, e-mail:
> dev-help@cassandra.apache.org
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>> Jon Haddad
> >>>>>>>>>> http://www.rustyrazorblade.com
> >>>>>>>>>> twitter: rustyrazorblade
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>> ---------------------------------------------------------------------
> >>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> ---------------------------------------------------------------------
> >>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>> --
> >>>>> Jon Haddad
> >>>>> http://www.rustyrazorblade.com
> >>>>> twitter: rustyrazorblade
> >>>>>
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>
> >>>
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: Implicit Casts for Arithmetic Operators

Posted by Benedict Elliott Smith <be...@apache.org>.
This was a terribly unclear email, sorry.  I was just trying to find new and interesting ways to say the same thing (that we should form our goal state from first principles only).

Anyway, I think we’ve been arguing very unnecessarily about this ideological point, given that I’ve already suggested a toggle to permit users to continue with present-day semantics should they choose.  Surely this resolves your concerns, unless you think this is intractable?





> On 22 Nov 2018, at 12:13, Benedict Elliott Smith <be...@apache.org> wrote:
> 
> This is why I said the decision is ideological.  We fundamentally disagree with each other, on points of principle.
> 
> This also feels like it’s becoming antagonistic, perhaps through misinterpreting each other, which was far from my intent.  So I will limit my reply to the only point of interpretation of my position.
> 
> Given that I personally consider this to be an ideological or project-axiomatic decision, I therefore only consider other ideological or axiomatic facts to be relevant to a decision like this. So:
> 
> 1) By “where appropriate” I mean, for instance, that this project will likely never support ANSI SQL in toto, by virtue of the fundamental nature of the project.
> 2) I agree that which standard we choose to follow, and why we follow it, are both relevant questions
> 
> 
> 
>> On 22 Nov 2018, at 11:56, Sylvain Lebresne <le...@gmail.com> wrote:
>> 
>> On Thu, Nov 22, 2018 at 11:51 AM Benedict Elliott Smith <be...@apache.org>
>> wrote:
>> 
>>> We’re not presently voting*; we’re only discussing whether we should base
>>> our behaviour on a widely agreed upon standard.
>>> 
>> 
>> Well, you *explicitly* asked if people thought we should do a vote, and I
>> responded to that part. Let's not pretend I'm interpreting stuff; it's
>> insulting.
>> 
>> 
>>> I think perhaps the nub of our disagreement is that, in my view, this is
>>> the only relevant fact to decide. There is no data to base this decision
>>> upon.  It’s axiomatic, or ideological; procedural, not technical:  Do we
>>> think we should try to hew to standards (where appropriate), or do we think
>>> we should stick with what we arrived at in an adhoc manner?
>> 
>> 
>> Yes, that is probably the nub of our disagreement. I disagree that hewing
>> to standards is something we should agree on absolutely, with no other
>> consideration in the balance. Hell, I read your "where appropriate" as an
>> admission that you don't even truly think that. I think this is always a
>> pros versus cons analysis. Adhering to standards is certainly a pro.
>> 
>> *If* we were starting from scratch, I might maybe agree there isn't much
>> "cons" in the balance (there is always _some_ consideration though;
>> adhering to standard might force you into complexity that might not be
>> justified; not saying it's our case here, just pointing again that I don't
>> adhere to the absolutist view), making it an easy decision. So I'm not
>> sure we'd even need a vote to agree that "we should try to hew to standards
>> (where appropriate)", even if we'd still want to discuss 1) if it is
>> appropriate in that case and 2) which standard, so it wouldn't even be a
>> "no data involved" decision.
>> 
>> But we're not starting from scratch. You explicitly say yourself that it
>> "extends to any features we have already released". So backward
>> compatibility is a parameter we imo *must* take into account. Again,
>> doesn't mean we don't end up breaking backward compatibility, just that it
>> is a non-negligible downside, so we better make sure the "pros" of adhering
>> to a standard make up for it.
>> 
>> So yes, I do pretty strongly disagree that adhering to a standard is
>> something that should be decided absolutely, with no other consideration
>> taken into account.
>> 
>> 
>>> and how meandering the discussion was with no clear consensus, it seemed
>>> to need a vote in the near future.
>> 
>> 
>> Fwiw, I also don't have the same read here. What I see on this thread is a
>> bit of discussion on the specific cast issue you initially brought,
>> discussion that didn't feel especially stuck to me, but I don't see much of a
>> larger discussion on adhering to standards for all our arithmetic before
>> your suggestion that a vote on it might be warranted.
>> 
>> --
>> Sylvain
>> 
>> 
>>>> On 22 Nov 2018, at 09:26, Sylvain Lebresne <le...@gmail.com> wrote:
>>>> 
>>>> I'm not saying "let's not do this no matter what and ever fix technical
>>>> debt", nor am I fearing decision.
>>>> 
>>>> But I *do* think decisions, technical ones at least, should be fact and
>>>> data driven. And I'm not even sure why we're talking of having a vote
>>> here.
>>>> The Apache Way is *not* meant to be primarily vote-driven, votes are
>>>> supposed to be a last resort when, after having debated facts and data,
>>> no
>>>> consensus can be reached. Can we have the debate on facts and data first?
>>>> Please.
>>>> 
>>>> At the end of the day, I object to: "There are still a number of unresolved
>>>> issues, but to make progress I wonder if it would first be helpful to
>>> have
>>>> a vote on ensuring we are ANSI SQL 92 compliant for our arithmetic?".
>>> More
>>>> specifically, I disagree that such vote is a good starting point. Let's
>>>> identify and discuss the unresolved issues first. Let's check precisely
>>>> what getting our arithmetic ANSI SQL 92 compliant means and how we can
>>> get
>>>> it. I do support the idea of making such analysis btw, it would be good
>>>> data, but no vote is needed whatsoever to make it. Again, I object to
>>>> voting first and doing the analysis second.
>>>> 
>>>> --
>>>> Sylvain
>>>> 
>>>> 
>>>> On Thu, Nov 22, 2018 at 1:25 AM Jonathan Haddad <jo...@jonhaddad.com>
>>> wrote:
>>>> 
>>>>> I can’t agree more. We should be able to make changes in a manner that
>>>>> improves the DB in the long term, rather than live with the technical
>>> debt
>>>>> of arbitrary decisions made by a handful of people.
>>>>> 
>>>>> I also agree that putting a knob in place to let people migrate over is
>>> a
>>>>> reasonable decision.
>>>>> 
>>>>> Jon
>>>>> 
>>>>> On Wed, Nov 21, 2018 at 4:54 PM Benedict Elliott Smith <
>>>>> benedict@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> The goal is simply to agree on a set of well-defined principles for how
>>>>> we
>>>>>> should behave.  If we don’t like the implications that arise, we’ll
>>> have
>>>>>> another vote?  A democracy cannot bind itself, so I never understood
>>> this
>>>>>> fear of a decision.
>>>>>> 
>>>>>> A database also has a thousand toggles.  If we absolutely need to, we
>>> can
>>>>>> introduce one more.
>>>>>> 
>>>>>> We should be doing this upfront a great deal more often.  Doing it
>>>>>> retrospectively sucks, but in my opinion it's a bad reason to bind
>>>>>> ourselves to whatever made it in.
>>>>>> 
>>>>>> Do we anywhere define the principles of our current behaviour?  I
>>>>> couldn’t
>>>>>> find it.
>>>>>> 
>>>>>> 
>>>>>>> On 21 Nov 2018, at 21:08, Sylvain Lebresne <le...@gmail.com>
>>> wrote:
>>>>>>> 
>>>>>>> On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith <
>>>>>> benedict@apache.org>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> FWIW, my meaning of arithmetic in this context extends to any
>>> features
>>>>>> we
>>>>>>>> have already released (such as aggregates, and perhaps other built-in
>>>>>>>> functions) that operate on the same domain.  We should be consistent,
>>>>>> after
>>>>>>>> all.
>>>>>>>> 
>>>>>>>> Whether or not we need to revisit any existing functionality we can
>>>>>> figure
>>>>>>>> out after the fact, once we have agreed what our behaviour should be.
>>>>>>>> 
>>>>>>> 
>>>>>>> I'm not sure I correctly understand the process suggested, but I don't
>>>>>>> particularly like/agree with what I understand. What I understand is a
>>>>>>> suggestion for voting on agreeing to be ANSI SQL 92 compliant, with no
>>>>>> real
>>>>>>> evaluation of what that entails (at least I haven't seen one), and
>>> that
>>>>>>> this vote, if passed, would imply we'd then make any backward
>>>>>> incompatible
>>>>>>> change necessary to achieve compliance ("my meaning of arithmetic in
>>>>> this
>>>>>>> context extends to any features we have already released" and "Whether
>>>>> or
>>>>>>> not we need to revisit any existing functionality we can figure out
>>>>> after
>>>>>>> the fact, once we have agreed what our behaviour should be").
>>>>>>> 
>>>>>>> This might make sense for a new product, but at our stage that seems
>>>>>>> backward to me. I think we owe our users to first make the effort of
>>>>>>> identifying what "inconsistencies" our existing arithmetic has[1] and
>>>>>>> _then_ consider what options we have to fix those, with their pros and
>>>>>> cons
>>>>>>> (including how bad they break backward compatibility). And if _then_
>>>>>>> getting ANSI SQL 92 compliant proves to not be disruptive (or at least
>>>>>>> acceptably so), then sure, that's great.
>>>>>>> 
>>>>>>> [1]: one possibly efficient way to do that could actually be to
>>> compare
>>>>>> our
>>>>>>> arithmetic to ANSI SQL 92. Not that all differences found would imply
>>>>>>> inconsistencies/wrongness of our arithmetic, but still, it should be
>>>>>>> helpful. And I guess my whole point is that we should do that analysis
>>>>>> first,
>>>>>>> and then maybe decide that being ANSI SQL 92 is a reasonable option,
>>>>> not
>>>>>>> decide first and live with the consequences no matter what they are.
>>>>>>> 
>>>>>>> --
>>>>>>> Sylvain
>>>>>>> 
>>>>>>> 
>>>>>>>> I will make this more explicit for the vote, but just to clarify the
>>>>>>>> intention so that we are all discussing the same thing.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 20 Nov 2018, at 14:18, Ariel Weisberg <ad...@fastmail.fm>
>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> +1
>>>>>>>>> 
>>>>>>>>> This is a public API so we will be much better off if we get it
>>> right
>>>>>>>> the first time.
>>>>>>>>> 
>>>>>>>>> Ariel
>>>>>>>>> 
>>>>>>>>>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <jo...@jonhaddad.com>
>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Sounds good to me.
>>>>>>>>>> 
>>>>>>>>>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <
>>>>>>>> benedict@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> So, this thread somewhat petered out.
>>>>>>>>>>> 
>>>>>>>>>>> There are still a number of unresolved issues, but to make
>>>>> progress I
>>>>>>>>>>> wonder if it would first be helpful to have a vote on ensuring we
>>>>> are
>>>>>>>> ANSI
>>>>>>>>>>> SQL 92 compliant for our arithmetic?  This seems like a sensible
>>>>>>>> baseline,
>>>>>>>>>>> since we will hopefully minimise surprise to operators this way.
>>>>>>>>>>> 
>>>>>>>>>>> If people largely agree, I will call a vote, and we can pick up a
>>>>>>>> couple
>>>>>>>>>>> of more focused discussions afterwards on how we interpret the
>>>>> leeway
>>>>>>>> it
>>>>>>>>>>> gives.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws>
>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>> From reading the spec, precision is always implementation
>>> defined.
>>>>>> The
>>>>>>>>>>> spec specifies scale in several cases, but never precision for any
>>>>>>>> type or
>>>>>>>>>>> operation (addition/subtraction, multiplication, division).
>>>>>>>>>>>> 
>>>>>>>>>>>> So we don't implement anything remotely approaching precision and
>>>>>>>> scale
>>>>>>>>>>> in CQL when it comes to numbers I think? So we aren't going to
>>>>> follow
>>>>>>>> the
>>>>>>>>>>> spec for scale. We are already pretty far down that road so I
>>> would
>>>>>>>> leave
>>>>>>>>>>> it alone.
>>>>>>>>>>>> 
>>>>>>>>>>>> I don't think the spec is asking for the most approximate type.
>>>>> It's
>>>>>>>>>>> just saying the result is approximate, and the precision is
>>>>>>>> implementation
>>>>>>>>>>> defined. We could return either float or double. I think if one of
>>>>>> the
>>>>>>>>>>> operands is a double we should return a double because clearly the
>>>>>>>> schema
>>>>>>>>>>> thought a double was required to represent that number. I would
>>>>> also
>>>>>>>> be in
>>>>>>>>>>> favor of returning a double all the time so that people can expect
>>>>> a
>>>>>>>>>>> consistent type from expressions involving approximate numbers.
>>>>>>>>>>>> 
>>>>>>>>>>>> I am a big fan of widening for arithmetic expressions in a
>>>>> database
>>>>>> to
>>>>>>>>>>> avoid having to error on overflow. You can go to the trouble of
>>>>> only
>>>>>>>>>>> widening the minimum amount, but I think it's simpler if we always
>>>>>>>> widen to
>>>>>>>>>>> bigint and double. This would be something the spec allows.
>>>>>>>>>>>> 
>>>>>>>>>>>> Definitely if we can make overflow not occur we should and the
>>>>> spec
>>>>>>>>>>> allows that. We should also not return different types for the
>>> same
>>>>>>>> operand
>>>>>>>>>>> types just to work around overflow if we detect we need more
>>>>>> precision.
>>>>>>>>>>>> 
>>>>>>>>>>>> Ariel
>>>>>>>>>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
>>>>>>>>>>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for
>>>>> digging
>>>>>>>> this
>>>>>>>>>>>>> out (and Mike for getting some empirical examples).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We still have to decide on the approximate data type to return;
>>>>>> right
>>>>>>>>>>>>> now, we have float+bigint=double, but float+int=float.  I think
>>>>>> this
>>>>>>>> is
>>>>>>>>>>>>> fairly inconsistent, and either the approximate type should
>>>>> always
>>>>>>>> win,
>>>>>>>>>>>>> or we should always upgrade to double for mixed operands.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The quoted spec also suggests that decimal+float=float, and
>>>>> decimal
>>>>>>>>>>>>> +double=double, whereas we currently have decimal+float=decimal,
>>>>>> and
>>>>>>>>>>>>> decimal+double=decimal
>>>>>>>>>>>>> 
>>>>>>>>>>>>> If we’re going to go with an approximate operand implying an
>>>>>>>>>>> approximate
>>>>>>>>>>>>> result, I think we should do it consistently (and consistent
>>> with
>>>>>> the
>>>>>>>>>>>>> SQL92 spec), and have the type of the approximate operand always
>>>>> be
>>>>>>>> the
>>>>>>>>>>>>> return type.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This would still leave a decision for float+double, though.  The
>>>>>> most
>>>>>>>>>>>>> consistent behaviour with that stated above would be to always
>>>>> take
>>>>>>>> the
>>>>>>>>>>>>> most approximate type to return (i.e. float), but this would
>>> seem
>>>>>> to
>>>>>>>> me
>>>>>>>>>>>>> to be fairly unexpected for the user.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws>
>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I agree with what's been said about expectations regarding
>>>>>>>> expressions
>>>>>>>>>>> involving floating point numbers. I think that if one of the
>>> inputs
>>>>>> is
>>>>>>>>>>> approximate then the result should be approximate.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> One thing we could look at for inspiration is the SQL spec. Not
>>>>> to
>>>>>>>>>>> follow dogmatically necessarily.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> From the SQL 92 spec regarding assignment
>>>>>>>>>>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section
>>>>>> 4.6:
>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>  Values of the data types NUMERIC, DECIMAL, INTEGER,
>>>>> SMALLINT,
>>>>>>>>>>>>>>  FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
>>>>>>>>>>> mutually
>>>>>>>>>>>>>>  comparable and mutually assignable. If an assignment would
>>>>>>>>>>> result
>>>>>>>>>>>>>>  in a loss of the most significant digits, an exception
>>>>>>>> condition
>>>>>>>>>>>>>>  is raised. If least significant digits are lost,
>>>>>>>> implementation-
>>>>>>>>>>>>>>  defined rounding or truncating occurs with no exception
>>>>>>>>>>> condition
>>>>>>>>>>>>>>  being raised. The rules for arithmetic are generally
>>>>> governed
>>>>>>>> by
>>>>>>>>>>>>>>  Subclause 6.12, "<numeric value expression>".
>>>>>>>>>>>>>> "
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Section 6.12 numeric value expressions:
>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>  1) If the data type of both operands of a dyadic arithmetic
>>>>>>>>>>> opera-
>>>>>>>>>>>>>>     tor is exact numeric, then the data type of the result is
>>>>>>>>>>> exact
>>>>>>>>>>>>>>     numeric, with precision and scale determined as follows:
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>  2) If the data type of either operand of a dyadic arithmetic
>>>>>>>> op-
>>>>>>>>>>>>>>     erator is approximate numeric, then the data type of the
>>>>>> re-
>>>>>>>>>>>>>>     sult is approximate numeric. The precision of the result
>>>>> is
>>>>>>>>>>>>>>     implementation-defined.
>>>>>>>>>>>>>> "
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> And this makes sense to me. I think we should only return an
>>>>> exact
>>>>>>>>>>> result if both of the inputs are exact.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I think we might want to look closely at the SQL spec and
>>>>>> especially
>>>>>>>>>>> when the spec requires an error to be generated. Those are
>>>>> sometimes
>>>>>>>> in the
>>>>>>>>>>> spec to prevent subtle paths to wrong answers. Any time we deviate
>>>>>>>> from the
>>>>>>>>>>> spec we should be asking why is it in the spec and why are we
>>>>>>>> deviating.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Another issue besides overflow handling is how we determine
>>>>>>>> precision
>>>>>>>>>>> and scale for expressions involving two exact types.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Ariel
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I'm not sure if I would prefer the Postgres way of doing
>>>>> things,
>>>>>>>>>>> which is
>>>>>>>>>>>>>>> returning just about any type depending on the order of
>>>>>> operators.
>>>>>>>>>>>>>>> Considering it actually mentions in the docs that using
>>>>>>>>>>> numeric/decimal is
>>>>>>>>>>>>>>> slow and also multiple times that floating points are inexact.
>>>>> So
>>>>>>>>>>> doing
>>>>>>>>>>>>>>> some math with Postgres (9.6.5):
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double
>>>>>>>>>>>>>>> precision 2147483647
>>>>>>>>>>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
>>>>>>>>>>>>>>> SELECT 2147483647::bigint*1.0::real returns double
>>>>>>>>>>>>>>> SELECT 2147483647::double precision*1::bigint returns double
>>>>>>>>>>> 2147483647
>>>>>>>>>>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double
>>>>>>>>>>> 2147483647
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> With + - we can get the same amount of mixture of returned
>>>>> types.
>>>>>>>>>>> There's
>>>>>>>>>>>>>>> no difference in those calculations, just some casting. To me
>>>>>>>>>>>>>>> floating-point math indicates inexactness and has errors and
>>>>>>>> whoever
>>>>>>>>>>> mixes
>>>>>>>>>>>>>>> up two different types should understand that. If one didn't
>>>>> want
>>>>>>>>>>> exact
>>>>>>>>>>>>>>> numeric type, why would the server return such? The floating
>>>>>> point
>>>>>>>>>>> value
>>>>>>>>>>>>>>> itself could be wrong already before the calculation - trying
>>>>> to
>>>>>>>> say
>>>>>>>>>>> we do
>>>>>>>>>>>>>>> it lossless is just wrong.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Fun with 2.65:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
>>>>>>>>>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> SELECT round(2.65) returns numeric 4
>>>>>>>>>>>>>>> SELECT round(2.65::double precision) returns double 4
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> SELECT 2.65 * 1 returns double 2.65
>>>>>>>>>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
>>>>>>>>>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
>>>>>>>>>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> SELECT round(2.65) * 1 returns numeric 3
>>>>>>>>>>>>>>> SELECT round(2.65) * round(1) returns double 3
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> So as we're going to have silly values in any case, why
>>> pretend
>>>>>>>>>>> something
>>>>>>>>>>>>>>> else? Also, exact calculations are slow if we crunch large
>>>>> amount
>>>>>>>> of
>>>>>>>>>>>>>>> numbers. I guess I slightly deviated towards Postgres'
>>>>>> implementation
>>>>>>>>>>> in this
>>>>>>>>>>>>>>> case, but I wish it wasn't used as a benchmark in this case.
>>>>> And
>>>>>>>> most
>>>>>>>>>>>>>>> importantly, I would definitely want the exact same type
>>>>> returned
>>>>>>>>>>> each time
>>>>>>>>>>>>>>> I do a calculation.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - Micke
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
>>>>>>>>>>> benedict@apache.org>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> As far as I can tell we reached a relatively strong consensus
>>>>>>>> that we
>>>>>>>>>>>>>>>> should implement lossless casts by default?  Does anyone have
>>>>>>>>>>> anything more
>>>>>>>>>>>>>>>> to add?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Looking at the emails, everyone who participated and
>>>>> expressed a
>>>>>>>>>>>>>>>> preference was in favour of the “Postgres approach” of
>>>>> upcasting
>>>>>>>> to
>>>>>>>>>>> decimal
>>>>>>>>>>>>>>>> for mixed float/int operands?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I’d like to get a clear-cut decision on this, so we know what
>>>>>>>> we’re
>>>>>>>>>>> doing
>>>>>>>>>>>>>>>> for 4.0.  Then hopefully we can move on to a collective
>>>>> decision
>>>>>>>> on
>>>>>>>>>>> Ariel’s
>>>>>>>>>>>>>>>> concerns about overflow, which I think are also pressing -
>>>>>>>>>>> particularly for
>>>>>>>>>>>>>>>> tinyint and smallint.  This does also impact implicit casts
>>>>> for
>>>>>>>> mixed
>>>>>>>>>>>>>>>> integer type operations, but an approach for these will
>>>>> probably
>>>>>>>>>>> fall out
>>>>>>>>>>>>>>>> of any decision on overflow.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
>>>>>>>>>>> murukesh.mohanan@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I think you're conflating two things here. There's the loss
>>>>>>>>>>> resulting
>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>> using some operators, and loss involved in casting. Dividing
>>>>> an
>>>>>>>>>>> integer
>>>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>>> another integer to obtain an integer result can result in
>>>>> loss,
>>>>>>>> but
>>>>>>>>>>>>>>>> there's
>>>>>>>>>>>>>>>>> no implicit casting there and no loss due to casting.
>>>>> Casting
>>>>>> an
>>>>>>>>>>> integer
>>>>>>>>>>>>>>>>> to a float can also result in loss. So dividing an integer
>>>>> by a
>>>>>>>>>>> float,
>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>> example, with an implicit cast has an additional avenue for
>>>>>> loss:
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> implicit cast for the operands so that they're of the same
>>>>>> type.
>>>>>>>> I
>>>>>>>>>>>>>>>> believe
>>>>>>>>>>>>>>>>> this discussion so far has been about the latter, not the
>>>>> loss
>>>>>>>> from
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> operations themselves.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
>>>>>>>>>>> benjamin.lerer@datastax.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I would like to try to clarify things a bit to help people
>>>>> to
>>>>>>>>>>> understand
>>>>>>>>>>>>>>>>>> the true complexity of the problem.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The *float* and *double* types are inexact numeric types.
>>>>> Not
>>>>>>>> only
>>>>>>>>>>> at
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> operation level.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> If you insert 676543.21 in a *float* column and then read
>>>>> it,
>>>>>>>> you
>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>> realize that the value has been truncated to 676543.2.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> If you want accuracy the only way is to avoid those inexact
>>>>>>>> types.
>>>>>>>>>>>>>>>>>> Using *decimals*
>>>>>>>>>>>>>>>>>> during operations will mitigate the problem but will not
>>>>>> remove
>>>>>>>>>>> it.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I do not recall PostgreSQL behaving as described. If I am
>>>>> not
>>>>>>>>>>> mistaken
>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>> PostgreSQL *SELECT 3/2* will return *1*. Which is similar
>>> to
>>>>>>>> what
>>>>>>>>>>> MS SQL
>>>>>>>>>>>>>>>>>> server and Oracle do. So all those databases will lose
>>>>>>>> precision
>>>>>>>>>>> if you
>>>>>>>>>>>>>>>>>> are not careful.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> If you truly need precision you can have it by using exact
>>>>>>>> numeric
>>>>>>>>>>> types
>>>>>>>>>>>>>>>>>> for your data types. Of course it has a cost on
>>> performance,
>>>>>>>>>>> memory and
>>>>>>>>>>>>>>>>>> disk usage.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The advantage of the current approach is that it gives you
>>>>> the
>>>>>>>>>>> choice.
>>>>>>>>>>>>>>>> It is
>>>>>>>>>>>>>>>>>> up to you to decide what you need for your application. It
>>>>> is
>>>>>>>> also
>>>>>>>>>>> in
>>>>>>>>>>>>>>>> line
>>>>>>>>>>>>>>>>>> with the way CQL behaves everywhere else.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Muru
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>>>>>>> For additional commands, e-mail:
>>>>> dev-help@cassandra.apache.org
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>> Jon Haddad
>>>>>>>>>> http://www.rustyrazorblade.com
>>>>>>>>>> twitter: rustyrazorblade
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> --
>>>>> Jon Haddad
>>>>> http://www.rustyrazorblade.com
>>>>> twitter: rustyrazorblade
>>>>> 
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>> 
>>> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Implicit Casts for Arithmetic Operators

Posted by Benedict Elliott Smith <be...@apache.org>.
This is why I said the decision is ideological.  We fundamentally disagree with each other, on points of principle.

This also feels like it’s becoming antagonistic, perhaps through misinterpreting each other, which was far from my intent.  So I will limit my reply to the only point of interpretation of my position.

Given that I personally consider this to be an ideological or project-axiomatic decision, I therefore only consider other ideological or axiomatic facts to be relevant to a decision like this. So:

1) By “where appropriate” I mean, for instance, that this project will likely never support ANSI SQL in toto, by virtue of the fundamental nature of the project.
2) I agree that which standard we choose to follow, and why we follow it, are both relevant questions



> On 22 Nov 2018, at 11:56, Sylvain Lebresne <le...@gmail.com> wrote:
> 
> On Thu, Nov 22, 2018 at 11:51 AM Benedict Elliott Smith <be...@apache.org>
> wrote:
> 
>> We’re not presently voting*; we’re only discussing whether we should base
>> our behaviour on a widely agreed upon standard.
>> 
> 
> Well, you *explicitly* asked if people thought we should do a vote, and I
> responded to that part. Let's not pretend I'm interpreting stuff; it's
> insulting.
> 
> 
>> I think perhaps the nub of our disagreement is that, in my view, this is
>> the only relevant fact to decide. There is no data to base this decision
>> upon.  It’s axiomatic, or ideological; procedural, not technical:  Do we
>> think we should try to hew to standards (where appropriate), or do we think
>> we should stick with what we arrived at in an adhoc manner?
> 
> 
> Yes, that is probably the nub of our disagreement. I disagree that hewing
> to standards is something we should agree on absolutely, with no other
> consideration in the balance. Hell, I read your "where appropriate" as an
> admission that you don't even truly think that. I think this is always a
> pros versus cons analysis. Adhering to standards is certainly a pro.
> 
> *If* we were starting from scratch, I might maybe agree there isn't much
> "cons" in the balance (there is always _some_ consideration though;
> adhering to standard might force you into complexity that might not be
> justified; not saying it's our case here, just pointing again that I don't
> adhere to the absolutist view), making it an easy decision. So I'm not
> sure we'd even need a vote to agree that "we should try to hew to standards
> (where appropriate)", even if we'd still want to discuss 1) if it is
> appropriate in that case and 2) which standard, so it wouldn't even be a
> "no data involved" decision.
> 
> But we're not starting from scratch. You explicitly say yourself that it
> "extends to any features we have already released". So backward
> compatibility is a parameter we imo *must* take into account. Again,
> doesn't mean we don't end up breaking backward compatibility, just that it
> is a non-negligible downside, so we better make sure the "pros" of adhering
> to a standard make up for it.
> 
> So yes, I do pretty strongly disagree that adhering to a standard is
> something that should be decided absolutely, with no other consideration
> taken into account.
> 
> 
>> and how meandering the discussion was with no clear consensus, it seemed
>> to need a vote in the near future.
> 
> 
> Fwiw, I also don't have the same read here. What I see on this thread is a
> bit of discussion on the specific cast issue you initially brought,
> discussion that didn't feel especially stuck to me, but I don't see much of a
> larger discussion on adhering to standards for all our arithmetic before
> your suggestion that a vote on it might be warranted.
> 
> --
> Sylvain
> 
> 
>>> On 22 Nov 2018, at 09:26, Sylvain Lebresne <le...@gmail.com> wrote:
>>> 
>>> I'm not saying "let's not do this no matter what and ever fix technical
>>> debt", nor am I fearing decision.
>>> 
>>> But I *do* think decisions, technical ones at least, should be fact and
>>> data driven. And I'm not even sure why we're talking of having a vote
>> here.
>>> The Apache Way is *not* meant to be primarily vote-driven, votes are
>>> supposed to be a last resort when, after having debated facts and data,
>> no
>>> consensus can be reached. Can we have the debate on facts and data first?
>>> Please.
>>> 
>>> At the end of the day, I object to: "There are still a number of unresolved
>>> issues, but to make progress I wonder if it would first be helpful to
>> have
>>> a vote on ensuring we are ANSI SQL 92 compliant for our arithmetic?".
>> More
>>> specifically, I disagree that such vote is a good starting point. Let's
>>> identify and discuss the unresolved issues first. Let's check precisely
>>> what getting our arithmetic ANSI SQL 92 compliant means and how we can
>> get
>>> it. I do support the idea of making such analysis btw, it would be good
>>> data, but no vote is needed whatsoever to make it. Again, I object to
>>> voting first and doing the analysis second.
>>> 
>>> --
>>> Sylvain
>>> 
>>> 
>>> On Thu, Nov 22, 2018 at 1:25 AM Jonathan Haddad <jo...@jonhaddad.com>
>> wrote:
>>> 
>>>> I can’t agree more. We should be able to make changes in a manner that
>>>> improves the DB in the long term, rather than live with the technical
>> debt
>>>> of arbitrary decisions made by a handful of people.
>>>> 
>>>> I also agree that putting a knob in place to let people migrate over is
>> a
>>>> reasonable decision.
>>>> 
>>>> Jon
>>>> 
>>>> On Wed, Nov 21, 2018 at 4:54 PM Benedict Elliott Smith <
>>>> benedict@apache.org>
>>>> wrote:
>>>> 
>>>>> The goal is simply to agree on a set of well-defined principles for how
>>>> we
>>>>> should behave.  If we don’t like the implications that arise, we’ll
>> have
>>>>> another vote?  A democracy cannot bind itself, so I never understood
>> this
>>>>> fear of a decision.
>>>>> 
>>>>> A database also has a thousand toggles.  If we absolutely need to, we
>> can
>>>>> introduce one more.
>>>>> 
>>>>> We should be doing this upfront a great deal more often.  Doing it
>>>>> retrospectively sucks, but in my opinion it's a bad reason to bind
>>>>> ourselves to whatever made it in.
>>>>> 
>>>>> Do we anywhere define the principles of our current behaviour?  I
>>>> couldn’t
>>>>> find it.
>>>>> 
>>>>> 
>>>>>> On 21 Nov 2018, at 21:08, Sylvain Lebresne <le...@gmail.com>
>> wrote:
>>>>>> 
>>>>>> On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith <
>>>>> benedict@apache.org>
>>>>>> wrote:
>>>>>> 
>>>>>>> FWIW, my meaning of arithmetic in this context extends to any
>> features
>>>>> we
>>>>>>> have already released (such as aggregates, and perhaps other built-in
>>>>>>> functions) that operate on the same domain.  We should be consistent,
>>>>> after
>>>>>>> all.
>>>>>>> 
>>>>>>> Whether or not we need to revisit any existing functionality we can
>>>>> figure
>>>>>>> out after the fact, once we have agreed what our behaviour should be.
>>>>>>> 
>>>>>> 
>>>>>> I'm not sure I correctly understand the process suggested, but I don't
>>>>>> particularly like/agree with what I understand. What I understand is a
>>>>>> suggestion for voting on agreeing to be ANSI SQL 92 compliant, with no
>>>>> real
>>>>>> evaluation of what that entails (at least I haven't seen one), and
>> that
>>>>>> this vote, if passed, would imply we'd then make any backward
>>>>> incompatible
>>>>>> change necessary to achieve compliance ("my meaning of arithmetic in
>>>> this
>>>>>> context extends to any features we have already released" and "Whether
>>>> or
>>>>>> not we need to revisit any existing functionality we can figure out
>>>> after
>>>>>> the fact, once we have agreed what our behaviour should be").
>>>>>> 
>>>>>> This might make sense for a new product, but at our stage that seems
>>>>>> backward to me. I think we owe our users to first make the effort of
>>>>>> identifying what "inconsistencies" our existing arithmetic has[1] and
>>>>>> _then_ consider what options we have to fix those, with their pros and
>>>>> cons
>>>>>> (including how bad they break backward compatibility). And if _then_
>>>>>> getting ANSI SQL 92 compliant proves to not be disruptive (or at least
>>>>>> acceptably so), then sure, that's great.
>>>>>> 
>>>>>> [1]: one possibly efficient way to do that could actually be to
>> compare
>>>>> our
>>>>>> arithmetic to ANSI SQL 92. Not that all differences found would imply
>>>>>> inconsistencies/wrongness of our arithmetic, but still, it should be
>>>>>> helpful. And I guess my whole point is that we should do that analysis
>>>>> first,
>>>>>> and then maybe decide that being ANSI SQL 92 is a reasonable option,
>>>> not
>>>>>> decide first and live with the consequences no matter what they are.
>>>>>> 
>>>>>> --
>>>>>> Sylvain
>>>>>> 
>>>>>> 
>>>>>>> I will make this more explicit for the vote, but just to clarify the
>>>>>>> intention so that we are all discussing the same thing.
>>>>>>> 
>>>>>>> 
>>>>>>>> On 20 Nov 2018, at 14:18, Ariel Weisberg <ad...@fastmail.fm>
>>>> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> +1
>>>>>>>> 
>>>>>>>> This is a public API so we will be much better off if we get it
>> right
>>>>>>> the first time.
>>>>>>>> 
>>>>>>>> Ariel
>>>>>>>> 
>>>>>>>>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <jo...@jonhaddad.com>
>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Sounds good to me.
>>>>>>>>> 
>>>>>>>>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <
>>>>>>> benedict@apache.org>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> So, this thread somewhat petered out.
>>>>>>>>>> 
>>>>>>>>>> There are still a number of unresolved issues, but to make
>>>> progress I
>>>>>>>>>> wonder if it would first be helpful to have a vote on ensuring we
>>>> are
>>>>>>> ANSI
>>>>>>>>>> SQL 92 compliant for our arithmetic?  This seems like a sensible
>>>>>>> baseline,
>>>>>>>>>> since we will hopefully minimise surprise to operators this way.
>>>>>>>>>> 
>>>>>>>>>> If people largely agree, I will call a vote, and we can pick up a
>>>>>>> couple
>>>>>>>>>> of more focused discussions afterwards on how we interpret the
>>>> leeway
>>>>>>> it
>>>>>>>>>> gives.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws>
>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> From reading the spec, precision is always implementation
>> defined.
>>>>> The
>>>>>>>>>> spec specifies scale in several cases, but never precision for any
>>>>>>> type or
>>>>>>>>>> operation (addition/subtraction, multiplication, division).
>>>>>>>>>>> 
>>>>>>>>>>> So we don't implement anything remotely approaching precision and
>>>>>>> scale
>>>>>>>>>> in CQL when it comes to numbers I think? So we aren't going to
>>>> follow
>>>>>>> the
>>>>>>>>>> spec for scale. We are already pretty far down that road so I
>> would
>>>>>>> leave
>>>>>>>>>> it alone.
>>>>>>>>>>> 
>>>>>>>>>>> I don't think the spec is asking for the most approximate type.
>>>> It's
>>>>>>>>>> just saying the result is approximate, and the precision is
>>>>>>> implementation
>>>>>>>>>> defined. We could return either float or double. I think if one of
>>>>> the
>>>>>>>>>> operands is a double we should return a double because clearly the
>>>>>>> schema
>>>>>>>>>> thought a double was required to represent that number. I would
>>>> also
>>>>>>> be in
>>>>>>>>>> favor of returning a double all the time so that people can expect
>>>> a
>>>>>>>>>> consistent type from expressions involving approximate numbers.
>>>>>>>>>>> 
>>>>>>>>>>> I am a big fan of widening for arithmetic expressions in a
>>>> database
>>>>> to
>>>>>>>>>> avoid having to error on overflow. You can go to the trouble of
>>>> only
>>>>>>>>>> widening the minimum amount, but I think it's simpler if we always
>>>>>>> widen to
>>>>>>>>>> bigint and double. This would be something the spec allows.
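>>>>>>>>>>> 
>>>>>>>>>>> (To make that concrete, an illustrative CQL sketch — assuming a table
>>>>>>>>>>> t with tinyint columns a and b; this shows the proposal, not current
>>>>>>>>>>> behaviour:)
>>>>>>>>>>> 
>>>>>>>>>>> SELECT a + b FROM t;   -- always widened to bigint, so 127 + 1 = 128
>>>>>>>>>>>                        -- rather than wrapping or erroring on overflow
>>>>>>>>>>> SELECT a + 1.5 FROM t; -- approximate operand: always widened to double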
>>>>>>>>>>> 
>>>>>>>>>>> Definitely if we can make overflow not occur we should and the
>>>> spec
>>>>>>>>>> allows that. We should also not return different types for the
>> same
>>>>>>> operand
>>>>>>>>>> types just to work around overflow if we detect we need more
>>>>> precision.
>>>>>>>>>>> 
>>>>>>>>>>> Ariel
>>>>>>>>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
>>>>>>>>>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for
>>>> digging
>>>>>>> this
>>>>>>>>>>>> out (and Mike for getting some empirical examples).
>>>>>>>>>>>> 
>>>>>>>>>>>> We still have to decide on the approximate data type to return;
>>>>> right
>>>>>>>>>>>> now, we have float+bigint=double, but float+int=float.  I think
>>>>> this
>>>>>>> is
>>>>>>>>>>>> fairly inconsistent, and either the approximate type should
>>>> always
>>>>>>> win,
>>>>>>>>>>>> or we should always upgrade to double for mixed operands.
>>>>>>>>>>>> 
>>>>>>>>>>>> The quoted spec also suggests that decimal+float=float, and
>>>> decimal
>>>>>>>>>>>> +double=double, whereas we currently have decimal+float=decimal,
>>>>> and
>>>>>>>>>>>> decimal+double=decimal
>>>>>>>>>>>> 
>>>>>>>>>>>> If we’re going to go with an approximate operand implying an
>>>>>>>>>> approximate
>>>>>>>>>>>> result, I think we should do it consistently (and consistent
>> with
>>>>> the
>>>>>>>>>>>> SQL92 spec), and have the type of the approximate operand always
>>>> be
>>>>>>> the
>>>>>>>>>>>> return type.
>>>>>>>>>>>> 
>>>>>>>>>>>> This would still leave a decision for float+double, though.  The
>>>>> most
>>>>>>>>>>>> consistent behaviour with that stated above would be to always
>>>> take
>>>>>>> the
>>>>>>>>>>>> most approximate type to return (i.e. float), but this would
>> seem
>>>>> to
>>>>>>> me
>>>>>>>>>>>> to be fairly unexpected for the user.
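>>>>>>>>>>>> 
>>>>>>>>>>>> (Summarising the combinations above as a sketch — "current" per this
>>>>>>>>>>>> thread, "approx wins" meaning the approximate operand's type is
>>>>>>>>>>>> returned:)
>>>>>>>>>>>> 
>>>>>>>>>>>>   operands           current      approx wins
>>>>>>>>>>>>   float   + int      float        float
>>>>>>>>>>>>   float   + bigint   double       float
>>>>>>>>>>>>   decimal + float    decimal      float
>>>>>>>>>>>>   decimal + double   decimal      double
>>>>>>>>>>>>   float   + double   (undecided)  float, though possibly surprising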
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws>
>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I agree with what's been said about expectations regarding
>>>>>>> expressions
>>>>>>>>>> involving floating point numbers. I think that if one of the
>> inputs
>>>>> is
>>>>>>>>>> approximate then the result should be approximate.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> One thing we could look at for inspiration is the SQL spec. Not
>>>> to
>>>>>>>>>> follow dogmatically necessarily.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> From the SQL 92 spec regarding assignment
>>>>>>>>>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section
>>>>> 4.6:
>>>>>>>>>>>>> "
>>>>>>>>>>>>>   Values of the data types NUMERIC, DECIMAL, INTEGER,
>>>> SMALLINT,
>>>>>>>>>>>>>   FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
>>>>>>>>>> mutually
>>>>>>>>>>>>>   comparable and mutually assignable. If an assignment would
>>>>>>>>>> result
>>>>>>>>>>>>>   in a loss of the most significant digits, an exception
>>>>>>> condition
>>>>>>>>>>>>>   is raised. If least significant digits are lost,
>>>>>>> implementation-
>>>>>>>>>>>>>   defined rounding or truncating occurs with no exception
>>>>>>>>>> condition
>>>>>>>>>>>>>   being raised. The rules for arithmetic are generally
>>>> governed
>>>>>>> by
>>>>>>>>>>>>>   Subclause 6.12, "<numeric value expression>".
>>>>>>>>>>>>> "
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Section 6.12 numeric value expressions:
>>>>>>>>>>>>> "
>>>>>>>>>>>>>   1) If the data type of both operands of a dyadic arithmetic
>>>>>>>>>> opera-
>>>>>>>>>>>>>      tor is exact numeric, then the data type of the result is
>>>>>>>>>> exact
>>>>>>>>>>>>>      numeric, with precision and scale determined as follows:
>>>>>>>>>>>>> ...
>>>>>>>>>>>>>   2) If the data type of either operand of a dyadic arithmetic
>>>>>>> op-
>>>>>>>>>>>>>      erator is approximate numeric, then the data type of the
>>>>> re-
>>>>>>>>>>>>>      sult is approximate numeric. The precision of the result
>>>> is
>>>>>>>>>>>>>      implementation-defined.
>>>>>>>>>>>>> "
>>>>>>>>>>>>> 
>>>>>>>>>>>>> And this makes sense to me. I think we should only return an
>>>> exact
>>>>>>>>>> result if both of the inputs are exact.
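>>>>>>>>>>>>> 
>>>>>>>>>>>>> (An illustrative reading of rules 1) and 2) — a sketch assuming a
>>>>>>>>>>>>> table t with columns i int, b bigint, f float, not a spec quote:)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> SELECT i + b FROM t;  -- both operands exact -> exact result
>>>>>>>>>>>>> SELECT i + f FROM t;  -- one operand approximate -> approximate
>>>>>>>>>>>>>                       -- result, precision implementation-defined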
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I think we might want to look closely at the SQL spec and
>>>>> especially
>>>>>>>>>> when the spec requires an error to be generated. Those are
>>>> sometimes
>>>>>>> in the
>>>>>>>>>> spec to prevent subtle paths to wrong answers. Any time we deviate
>>>>>>> from the
>>>>>>>>>> spec we should be asking why is it in the spec and why are we
>>>>>>> deviating.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Another issue besides overflow handling is how we determine
>>>>>>> precision
>>>>>>>>>> and scale for expressions involving two exact types.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Ariel
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I'm not sure if I would prefer the Postgres way of doing
>>>> things,
>>>>>>>>>> which is
>>>>>>>>>>>>>> returning just about any type depending on the order of
>>>>> operators.
>>>>>>>>>>>>>> Considering it actually mentions in the docs that using
>>>>>>>>>> numeric/decimal is
>>>>>>>>>>>>>> slow and also multiple times that floating points are inexact.
>>>> So
>>>>>>>>>> doing
>>>>>>>>>>>>>> some math with Postgres (9.6.5):
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double
>>>>>>>>>>>>>> precision 2147483647
>>>>>>>>>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
>>>>>>>>>>>>>> SELECT 2147483647::bigint*1.0::real returns double
>>>>>>>>>>>>>> SELECT 2147483647::double precision*1::bigint returns double
>>>>>>>>>> 2147483647
>>>>>>>>>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double
>>>>>>>>>> 2147483647
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> With + - we can get the same amount of mixture of returned
>>>> types.
>>>>>>>>>> There's
>>>>>>>>>>>>>> no difference in those calculations, just some casting. To me
>>>>>>>>>>>>>> floating-point math indicates inexactness and has errors and
>>>>>>> whoever
>>>>>>>>>> mixes
>>>>>>>>>>>>>> up two different types should understand that. If one didn't
>>>> want
>>>>>>>>>> exact
>>>>>>>>>>>>>> numeric type, why would the server return such? The floating
>>>>> point
>>>>>>>>>> value
>>>>>>>>>>>>>> itself could be wrong already before the calculation - trying
>>>> to
>>>>>>> say
>>>>>>>>>> we do
>>>>>>>>>>>>>> it lossless is just wrong.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Fun with 2.65:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
>>>>>>>>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT round(2.65) returns numeric 4
>>>>>>>>>>>>>> SELECT round(2.65::double precision) returns double 4
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT 2.65 * 1 returns double 2.65
>>>>>>>>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
>>>>>>>>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
>>>>>>>>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT round(2.65) * 1 returns numeric 3
>>>>>>>>>>>>>> SELECT round(2.65) * round(1) returns double 3
>>>>>>>>>>>>>> 
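>>>>>>>>>>>>>> (A side note on the 2.65000009536743 above — sketch only, output
>>>>>>>>>>>>>> not verified: the nearest 32-bit real to 2.65 is not the nearest
>>>>>>>>>>>>>> double, and the implicit widening exposes the difference:)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT 2.65::real::double precision;  -- ~2.65000009536743
>>>>>>>>>>>>>> SELECT 2.65::double precision;        -- prints as 2.65
>>>>>>>>>>>>>> 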
>>>>>>>>>>>>>> So as we're going to have silly values in any case, why pretend
>>>>>>>>>>>>>> otherwise? Also, exact calculations are slow if we crunch large
>>>>>>>>>>>>>> amounts of
>>>>>>>>>>>>>> numbers. I guess I slightly deviated towards Postgres'
>>>>> implementation
>>>>>>>>>> in this
>>>>>>>>>>>>>> case, but I wish it wasn't used as a benchmark in this case.
>>>> And
>>>>>>> most
>>>>>>>>>>>>>> importantly, I would definitely want the exact same type
>>>> returned
>>>>>>>>>> each time
>>>>>>>>>>>>>> I do a calculation.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> - Micke
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
>>>>>>>>>> benedict@apache.org>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> As far as I can tell we reached a relatively strong consensus
>>>>>>> that we
>>>>>>>>>>>>>>> should implement lossless casts by default?  Does anyone have
>>>>>>>>>> anything more
>>>>>>>>>>>>>>> to add?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Looking at the emails, everyone who participated and
>>>> expressed a
>>>>>>>>>>>>>>> preference was in favour of the “Postgres approach” of
>>>> upcasting
>>>>>>> to
>>>>>>>>>> decimal
>>>>>>>>>>>>>>> for mixed float/int operands?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I’d like to get a clear-cut decision on this, so we know what
>>>>>>> we’re
>>>>>>>>>> doing
>>>>>>>>>>>>>>> for 4.0.  Then hopefully we can move on to a collective
>>>> decision
>>>>>>> on
>>>>>>>>>> Ariel’s
>>>>>>>>>>>>>>> concerns about overflow, which I think are also pressing -
>>>>>>>>>> particularly for
>>>>>>>>>>>>>>> tinyint and smallint.  This does also impact implicit casts
>>>> for
>>>>>>> mixed
>>>>>>>>>>>>>>> integer type operations, but an approach for these will
>>>> probably
>>>>>>>>>> fall out
>>>>>>>>>>>>>>> of any decision on overflow.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
>>>>>>>>>> murukesh.mohanan@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I think you're conflating two things here. There's the loss
>>>>>>>>>> resulting
>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>> using some operators, and loss involved in casting. Dividing
>>>> an
>>>>>>>>>> integer
>>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>> another integer to obtain an integer result can result in
>>>> loss,
>>>>>>> but
>>>>>>>>>>>>>>> there's
>>>>>>>>>>>>>>>> no implicit casting there and no loss due to casting.
>>>> Casting
>>>>> an
>>>>>>>>>> integer
>>>>>>>>>>>>>>>> to a float can also result in loss. So dividing an integer
>>>> by a
>>>>>>>>>> float,
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>> example, with an implicit cast has an additional avenue for
>>>>> loss:
>>>>>>>>>> the
>>>>>>>>>>>>>>>> implicit cast for the operands so that they're of the same
>>>>> type.
>>>>>>> I
>>>>>>>>>>>>>>> believe
>>>>>>>>>>>>>>>> this discussion so far has been about the latter, not the
>>>> loss
>>>>>>> from
>>>>>>>>>> the
>>>>>>>>>>>>>>>> operations themselves.
>>>>>>>>>>>>>>>> 
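>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> (Sketching the two avenues separately, with the cast written out
>>>>>>>>>>>>>>>> explicitly — illustrative values, not verified output:)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> SELECT 3 / 2;                    -- operator loss: integer
>>>>>>>>>>>>>>>>                                  -- division yields 1
>>>>>>>>>>>>>>>> SELECT CAST(16777217 AS float);  -- cast loss: 16777217 rounds to
>>>>>>>>>>>>>>>>                                  -- 16777216, the nearest float
>>>>>>>>>>>>>>>> 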
>>>>>>>>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
>>>>>>>>>> benjamin.lerer@datastax.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I would like to try to clarify things a bit to help people
>>>> to
>>>>>>>>>> understand
>>>>>>>>>>>>>>>>> the true complexity of the problem.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> The *float* and *double* types are inexact numeric types.
>>>> Not
>>>>>>> only
>>>>>>>>>> at
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> operation level.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> If you insert 676543.21 in a *float* column and then read
>>>> it,
>>>>>>> you
>>>>>>>>>> will
>>>>>>>>>>>>>>>>> realize that the value has been truncated to 676543.2.
>>>>>>>>>>>>>>>>> 
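>>>>>>>>>>>>>>>>> (Illustration — a sketch assuming a table t (v float); output not
>>>>>>>>>>>>>>>>> verified: 676543.21 carries more significant digits than a 32-bit
>>>>>>>>>>>>>>>>> float can hold:)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> INSERT INTO t (v) VALUES (676543.21);
>>>>>>>>>>>>>>>>> SELECT v FROM t;  -- reads back as 676543.2
>>>>>>>>>>>>>>>>> 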
>>>>>>>>>>>>>>>>> If you want accuracy the only way is to avoid those inexact
>>>>>>> types.
>>>>>>>>>>>>>>>>> Using *decimals* during operations will mitigate the problem but will not
>>>>> remove
>>>>>>>>>> it.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I do not recall PostgreSQL behaving as described. If I am
>>>> not
>>>>>>>>>> mistaken
>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>> PostgreSQL *SELECT 3/2* will return *1*. Which is similar
>> to
>>>>>>> what
>>>>>>>>>> MS SQL
>>>>>>>>>>>>>>>>> server and Oracle do. So all those databases will lose
>>>>>>> precision
>>>>>>>>>> if you
>>>>>>>>>>>>>>>>> are not careful.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> If you truly need precision you can have it by using exact
>>>>>>> numeric
>>>>>>>>>> types
>>>>>>>>>>>>>>>>> for your data types. Of course it has a cost on
>> performance,
>>>>>>>>>> memory and
>>>>>>>>>>>>>>>>> disk usage.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> The advantage of the current approach is that it gives you
>>>> the
>>>>>>>>>> choice.
>>>>>>>>>>>>>>> It is
>>>>>>>>>>>>>>>>> up to you to decide what you need for your application. It
>>>> is
>>>>>>> also
>>>>>>>>>> in
>>>>>>>>>>>>>>> line
>>>>>>>>>>>>>>>>> with the way CQL behaves everywhere else.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Muru
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>>>>>> For additional commands, e-mail:
>>>> dev-help@cassandra.apache.org
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>> Jon Haddad
>>>>>>>>> http://www.rustyrazorblade.com
>>>>>>>>> twitter: rustyrazorblade
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> --
>>>> Jon Haddad
>>>> http://www.rustyrazorblade.com
>>>> twitter: rustyrazorblade
>>>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>> 
>> 


Re: Implicit Casts for Arithmetic Operators

Posted by Sylvain Lebresne <le...@gmail.com>.
On Thu, Nov 22, 2018 at 11:51 AM Benedict Elliott Smith <be...@apache.org>
wrote:

> We’re not presently voting*; we’re only discussing, whether we should base
> our behaviour on a widely agreed upon standard.
>

Well, you *explicitly* asked if people thought we should do a vote, and I
responded to that part. Let's not pretend I'm interpreting stuff, it's
insulting.


> I think perhaps the nub of our disagreement is that, in my view, this is
> the only relevant fact to decide. There is no data to base this decision
> upon.  It’s axiomatic, or ideological; procedural, not technical:  Do we
> think we should try to hew to standards (where appropriate), or do we think
> we should stick with what we arrived at in an adhoc manner?


Yes, that is probably the nub of our disagreement. I disagree that hewing
to standards is something we should agree on absolutely, with no other
consideration in the balance. Hell, I read your "where appropriate" as an
admission that you don't even truly think that. I think this is always a
pros versus cons analysis. Adhering to standards is certainly a pro.

*If* we were starting from scratch, I might agree there isn't much
"cons" in the balance (there is always _some_ consideration though;
adhering to a standard might force you into complexity that might not be
justified; not saying it's our case here, just pointing again that I don't
adhere to the absolutist view), making it an easy decision. So I'm not
sure we'd even need a vote to agree that "we should try to hew to standards
(where appropriate)", even if we'd still want to discuss 1) if it is
appropriate in that case and 2) which standard, so it wouldn't even be a
"no data involved" decision.

But we're not starting from scratch. You explicitly say yourself that it
"extends to any features we have already released". So backward
compatibility is a parameter we imo *must* take into account. Again, that
doesn't mean we don't end up breaking backward compatibility, just that it
is a non-negligible downside, so we'd better make sure the "pros" of adhering
to a standard make up for it.

So yes, I do pretty strongly disagree that adhering to a standard is
something that should be decided absolutely, with no other consideration
taken into account.


> and how meandering the discussion was with no clear consensus, it seemed
> to need a vote in the near future.


Fwiw, I also don't have the same read here. What I see on this thread is a
bit of discussion on the specific cast issue you initially brought up,
discussion that didn't feel especially stuck to me, but I don't see much of
a larger discussion on adhering to standards for all our arithmetic before
your suggestion that a vote on it might be warranted.

--
Sylvain


> > On 22 Nov 2018, at 09:26, Sylvain Lebresne <le...@gmail.com> wrote:
> >
> > I'm not saying "let's not do this no matter what and never fix technical
> > debt", nor am I fearing a decision.
> >
> > But I *do* think decisions, technical ones at least, should be fact and
> > data driven. And I'm not even sure why we're talking of having a vote
> here.
> > The Apache Way is *not* meant to be primarily vote-driven, votes are
> > supposed to be a last resort when, after having debated facts and data,
> no
> > consensus can be reached. Can we have the debate on facts and data first?
> > Please.
> >
> > At the end of the day, I object to: "There are still a number of unresolved
> > issues, but to make progress I wonder if it would first be helpful to
> have
> > a vote on ensuring we are ANSI SQL 92 compliant for our arithmetic?".
> More
> > specifically, I disagree that such vote is a good starting point. Let's
> > identify and discuss the unresolved issues first. Let's check precisely
> > what getting our arithmetic ANSI SQL 92 compliant means and how we can
> get
> > it. I do support the idea of making such an analysis btw, it would be good
> > data, but no vote is needed whatsoever to make it. Again, I object to
> > voting first and doing the analysis 2nd.
> >
> > --
> > Sylvain
> >
> >
> > On Thu, Nov 22, 2018 at 1:25 AM Jonathan Haddad <jo...@jonhaddad.com>
> wrote:
> >
> >> I can’t agree more. We should be able to make changes in a manner that
> >> improves the DB in the long term, rather than live with the technical
> debt
> >> of arbitrary decisions made by a handful of people.
> >>
> >> I also agree that putting a knob in place to let people migrate over is
> a
> >> reasonable decision.
> >>
> >> Jon
> >>
> >> On Wed, Nov 21, 2018 at 4:54 PM Benedict Elliott Smith <
> >> benedict@apache.org>
> >> wrote:
> >>
> >>> The goal is simply to agree on a set of well-defined principles for how
> >> we
> >>> should behave.  If we don’t like the implications that arise, we’ll
> have
> >>> another vote?  A democracy cannot bind itself, so I never understood
> this
> >>> fear of a decision.
> >>>
> >>> A database also has a thousand toggles.  If we absolutely need to, we
> can
> >>> introduce one more.
> >>>
> >>> We should be doing this upfront a great deal more often.  Doing it
> >>> retrospectively sucks, but in my opinion it's a bad reason to bind
> >>> ourselves to whatever made it in.
> >>>
> >>> Do we anywhere define the principles of our current behaviour?  I
> >> couldn’t
> >>> find it.
> >>>
> >>>
> >>>> On 21 Nov 2018, at 21:08, Sylvain Lebresne <le...@gmail.com>
> wrote:
> >>>>
> >>>> On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith <
> >>> benedict@apache.org>
> >>>> wrote:
> >>>>
> >>>>> FWIW, my meaning of arithmetic in this context extends to any
> features
> >>> we
> >>>>> have already released (such as aggregates, and perhaps other built-in
> >>>>> functions) that operate on the same domain.  We should be consistent,
> >>> after
> >>>>> all.
> >>>>>
> >>>>> Whether or not we need to revisit any existing functionality we can
> >>> figure
> >>>>> out after the fact, once we have agreed what our behaviour should be.
> >>>>>
> >>>>
> >>>> I'm not sure I correctly understand the process suggested, but I don't
> >>>> particularly like/agree with what I understand. What I understand is a
> >>>> suggestion for voting on agreeing to be ANSI SQL 92 compliant, with no
> >>> real
> >>>> evaluation of what that entails (at least I haven't seen one), and
> that
> >>>> this vote, if passed, would imply we'd then make any backward
> >>> incompatible
> >>>> change necessary to achieve compliance ("my meaning of arithmetic in
> >> this
> >>>> context extends to any features we have already released" and "Whether
> >> or
> >>>> not we need to revisit any existing functionality we can figure out
> >> after
> >>>> the fact, once we have agreed what our behaviour should be").
> >>>>
> >>>> This might make sense for a new product, but at our stage that seems
> >>>> backward to me. I think we owe it to our users to first make the effort of
> >>>> identifying what "inconsistencies" our existing arithmetic has[1] and
> >>>> _then_ consider what options we have to fix those, with their pros and
> >>> cons
> >>>> (including how bad they break backward compatibility). And if _then_
> >>>> getting ANSI SQL 92 compliant proves to not be disruptive (or at least
> >>>> acceptably so), then sure, that's great.
> >>>>
> >>>> [1]: one possibly efficient way to do that could actually be to
> compare
> >>> our
> >>>> arithmetic to ANSI SQL 92. Not that all differences found would imply
> >>>> inconsistencies/wrongness of our arithmetic, but still, it should be
> >>>> helpful. And I guess my whole point is that we should do that analysis
> >>> first,
> >>>> and then maybe decide that being ANSI SQL 92 is a reasonable option,
> >> not
> >>>> decide first and live with the consequences no matter what they are.
> >>>>
> >>>> --
> >>>> Sylvain
> >>>>
> >>>>
> >>>>> I will make this more explicit for the vote, but just to clarify the
> >>>>> intention so that we are all discussing the same thing.
> >>>>>
> >>>>>
> >>>>>> On 20 Nov 2018, at 14:18, Ariel Weisberg <ad...@fastmail.fm>
> >> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> +1
> >>>>>>
> >>>>>> This is a public API so we will be much better off if we get it
> right
> >>>>> the first time.
> >>>>>>
> >>>>>> Ariel
> >>>>>>
> >>>>>>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <jo...@jonhaddad.com>
> >>>>> wrote:
> >>>>>>>
> >>>>>>> Sounds good to me.
> >>>>>>>
> >>>>>>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <
> >>>>> benedict@apache.org>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> So, this thread somewhat petered out.
> >>>>>>>>
> >>>>>>>> There are still a number of unresolved issues, but to make
> >> progress I
> >>>>>>>> wonder if it would first be helpful to have a vote on ensuring we
> >> are
> >>>>> ANSI
> >>>>>>>> SQL 92 compliant for our arithmetic?  This seems like a sensible
> >>>>> baseline,
> >>>>>>>> since we will hopefully minimise surprise to operators this way.
> >>>>>>>>
> >>>>>>>> If people largely agree, I will call a vote, and we can pick up a
> >>>>> couple
> >>>>>>>> of more focused discussions afterwards on how we interpret the
> >> leeway
> >>>>> it
> >>>>>>>> gives.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws>
> >> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> From reading the spec, precision is always implementation
> defined.
> >>> The
> >>>>>>>> spec specifies scale in several cases, but never precision for any
> >>>>> type or
> >>>>>>>> operation (addition/subtraction, multiplication, division).
> >>>>>>>>>
> >>>>>>>>> So we don't implement anything remotely approaching precision and
> >>>>> scale
> >>>>>>>> in CQL when it comes to numbers I think? So we aren't going to
> >> follow
> >>>>> the
> >>>>>>>> spec for scale. We are already pretty far down that road so I
> would
> >>>>> leave
> >>>>>>>> it alone.
> >>>>>>>>>
> >>>>>>>>> I don't think the spec is asking for the most approximate type.
> >> It's
> >>>>>>>> just saying the result is approximate, and the precision is
> >>>>> implementation
> >>>>>>>> defined. We could return either float or double. I think if one of
> >>> the
> >>>>>>>> operands is a double we should return a double because clearly the
> >>>>> schema
> >>>>>>>> thought a double was required to represent that number. I would
> >> also
> >>>>> be in
> >>>>>>>> favor of returning a double all the time so that people can expect
> >> a
> >>>>>>>> consistent type from expressions involving approximate numbers.
> >>>>>>>>>
> >>>>>>>>> I am a big fan of widening for arithmetic expressions in a
> >> database
> >>> to
> >>>>>>>> avoid having to error on overflow. You can go to the trouble of
> >> only
> >>>>>>>> widening the minimum amount, but I think it's simpler if we always
> >>>>> widen to
> >>>>>>>> bigint and double. This would be something the spec allows.
> >>>>>>>>>
> >>>>>>>>> Definitely if we can make overflow not occur we should and the
> >> spec
> >>>>>>>> allows that. We should also not return different types for the
> same
> >>>>> operand
> >>>>>>>> types just to work around overflow if we detect we need more
> >>> precision.
> >>>>>>>>>
> >>>>>>>>> Ariel
> >>>>>>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
> >>>>>>>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for
> >> digging
> >>>>> this
> >>>>>>>>>> out (and Mike for getting some empirical examples).
> >>>>>>>>>>
> >>>>>>>>>> We still have to decide on the approximate data type to return;
> >>> right
> >>>>>>>>>> now, we have float+bigint=double, but float+int=float.  I think
> >>> this
> >>>>> is
> >>>>>>>>>> fairly inconsistent, and either the approximate type should
> >> always
> >>>>> win,
> >>>>>>>>>> or we should always upgrade to double for mixed operands.
> >>>>>>>>>>
> >>>>>>>>>> The quoted spec also suggests that decimal+float=float, and
> >> decimal
> >>>>>>>>>> +double=double, whereas we currently have decimal+float=decimal,
> >>> and
> >>>>>>>>>> decimal+double=decimal
> >>>>>>>>>>
> >>>>>>>>>> If we’re going to go with an approximate operand implying an
> >>>>>>>> approximate
> >>>>>>>>>> result, I think we should do it consistently (and consistent
> with
> >>> the
> >>>>>>>>>> SQL92 spec), and have the type of the approximate operand always
> >> be
> >>>>> the
> >>>>>>>>>> return type.
> >>>>>>>>>>
> >>>>>>>>>> This would still leave a decision for float+double, though.  The
> >>> most
> >>>>>>>>>> consistent behaviour with that stated above would be to always
> >> take
> >>>>> the
> >>>>>>>>>> most approximate type to return (i.e. float), but this would
> seem
> >>> to
> >>>>> me
> >>>>>>>>>> to be fairly unexpected for the user.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws>
> >>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> I agree with what's been said about expectations regarding
> >>>>> expressions
> >>>>>>>> involving floating point numbers. I think that if one of the
> inputs
> >>> is
> >>>>>>>> approximate then the result should be approximate.
> >>>>>>>>>>>
> >>>>>>>>>>> One thing we could look at for inspiration is the SQL spec. Not
> >> to
> >>>>>>>> follow dogmatically necessarily.
> >>>>>>>>>>>
> >>>>>>>>>>> From the SQL 92 spec regarding assignment
> >>>>>>>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section
> >>> 4.6:
> >>>>>>>>>>> "
> >>>>>>>>>>>    Values of the data types NUMERIC, DECIMAL, INTEGER,
> >> SMALLINT,
> >>>>>>>>>>>    FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
> >>>>>>>> mutually
> >>>>>>>>>>>    comparable and mutually assignable. If an assignment would
> >>>>>>>> result
> >>>>>>>>>>>    in a loss of the most significant digits, an exception
> >>>>> condition
> >>>>>>>>>>>    is raised. If least significant digits are lost,
> >>>>> implementation-
> >>>>>>>>>>>    defined rounding or truncating occurs with no exception
> >>>>>>>> condition
> >>>>>>>>>>>    being raised. The rules for arithmetic are generally
> >> governed
> >>>>> by
> >>>>>>>>>>>    Subclause 6.12, "<numeric value expression>".
> >>>>>>>>>>> "
> >>>>>>>>>>>
> >>>>>>>>>>> Section 6.12 numeric value expressions:
> >>>>>>>>>>> "
> >>>>>>>>>>>    1) If the data type of both operands of a dyadic arithmetic
> >>>>>>>> opera-
> >>>>>>>>>>>       tor is exact numeric, then the data type of the result is
> >>>>>>>> exact
> >>>>>>>>>>>       numeric, with precision and scale determined as follows:
> >>>>>>>>>>> ...
> >>>>>>>>>>>    2) If the data type of either operand of a dyadic arithmetic
> >>>>> op-
> >>>>>>>>>>>       erator is approximate numeric, then the data type of the
> >>> re-
> >>>>>>>>>>>       sult is approximate numeric. The precision of the result
> >> is
> >>>>>>>>>>>       implementation-defined.
> >>>>>>>>>>> "
> >>>>>>>>>>>
> >>>>>>>>>>> And this makes sense to me. I think we should only return an
> >> exact
> >>>>>>>> result if both of the inputs are exact.
> >>>>>>>>>>>
> >>>>>>>>>>> I think we might want to look closely at the SQL spec and
> >>> especially
> >>>>>>>> when the spec requires an error to be generated. Those are
> >> sometimes
> >>>>> in the
> >>>>>>>> spec to prevent subtle paths to wrong answers. Any time we deviate
> >>>>> from the
> >>>>>>>> spec we should be asking why is it in the spec and why are we
> >>>>> deviating.
> >>>>>>>>>>>
> >>>>>>>>>>> Another issue besides overflow handling is how we determine
> >>>>> precision
> >>>>>>>> and scale for expressions involving two exact types.
> >>>>>>>>>>>
> >>>>>>>>>>> Ariel
> >>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm not sure if I would prefer the Postgres way of doing
> >> things,
> >>>>>>>> which is
> >>>>>>>>>>>> returning just about any type depending on the order of
> >>> operators.
> >>>>>>>>>>>> Considering it actually mentions in the docs that using
> >>>>>>>> numeric/decimal is
> >>>>>>>>>>>> slow and also multiple times that floating points are inexact.
> >> So
> >>>>>>>> doing
> >>>>>>>>>>>> some math with Postgres (9.6.5):
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double
> >>>>>>>>>>>> precision 2147483647
> >>>>>>>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
> >>>>>>>>>>>> SELECT 2147483647::bigint*1.0::real returns double
> >>>>>>>>>>>> SELECT 2147483647::double precision*1::bigint returns double
> >>>>>>>> 2147483647
> >>>>>>>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double
> >>>>>>>> 2147483647
> >>>>>>>>>>>>
> >>>>>>>>>>>> With + - we can get the same amount of mixture of returned
> >> types.
> >>>>>>>> There's
> >>>>>>>>>>>> no difference in those calculations, just some casting. To me
> >>>>>>>>>>>> floating-point math indicates inexactness and has errors and
> >>>>> whoever
> >>>>>>>> mixes
> >>>>>>>>>>>> up two different types should understand that. If one didn't
> >> want
> >>>>>>>> exact
> >>>>>>>>>>>> numeric type, why would the server return such? The floating
> >>> point
> >>>>>>>> value
> >>>>>>>>>>>> itself could be wrong already before the calculation - trying
> >> to
> >>>>> say
> >>>>>>>> we do
> >>>>>>>>>>>> it lossless is just wrong.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Fun with 2.65:
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
> >>>>>>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT round(2.65) returns numeric 4
> >>>>>>>>>>>> SELECT round(2.65::double precision) returns double 4
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT 2.65 * 1 returns double 2.65
> >>>>>>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
> >>>>>>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
> >>>>>>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT round(2.65) * 1 returns numeric 3
> >>>>>>>>>>>> SELECT round(2.65) * round(1) returns double 3
> >>>>>>>>>>>>
> >>>>>>>>>>>> So as we're going to have silly values in any case, why pretend
> >>>>>>>>>>>> otherwise? Also, exact calculations are slow if we crunch large
> >>>>>>>>>>>> amounts of
> >>>>>>>>>>>> numbers. I guess I slightly deviated towards Postgres'
> >>> implementation
> >>>>>>>> in this
> >>>>>>>>>>>> case, but I wish it wasn't used as a benchmark in this case.
> >> And
> >>>>> most
> >>>>>>>>>>>> importantly, I would definitely want the exact same type
> >> returned
> >>>>>>>> each time
> >>>>>>>>>>>> I do a calculation.
> >>>>>>>>>>>>
> >>>>>>>>>>>> - Micke
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
> >>>>>>>> benedict@apache.org>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> As far as I can tell we reached a relatively strong consensus
> >>>>> that we
> >>>>>>>>>>>>> should implement lossless casts by default?  Does anyone have
> >>>>>>>> anything more
> >>>>>>>>>>>>> to add?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Looking at the emails, everyone who participated and
> >> expressed a
> >>>>>>>>>>>>> preference was in favour of the “Postgres approach” of
> >> upcasting
> >>>>> to
> >>>>>>>> decimal
> >>>>>>>>>>>>> for mixed float/int operands?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I’d like to get a clear-cut decision on this, so we know what
> >>>>> we’re
> >>>>>>>> doing
> >>>>>>>>>>>>> for 4.0.  Then hopefully we can move on to a collective
> >> decision
> >>>>> on
> >>>>>>>> Ariel’s
> >>>>>>>>>>>>> concerns about overflow, which I think are also pressing -
> >>>>>>>> particularly for
> >>>>>>>>>>>>> tinyint and smallint.  This does also impact implicit casts
> >> for
> >>>>> mixed
> >>>>>>>>>>>>> integer type operations, but an approach for these will
> >> probably
> >>>>>>>> fall out
> >>>>>>>>>>>>> of any decision on overflow.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
> >>>>>>>> murukesh.mohanan@gmail.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think you're conflating two things here. There's the loss
> >>>>>>>> resulting
> >>>>>>>>>>>>> from
> >>>>>>>>>>>>>> using some operators, and loss involved in casting. Dividing
> >> an
> >>>>>>>> integer
> >>>>>>>>>>>>> by
> >>>>>>>>>>>>>> another integer to obtain an integer result can result in
> >> loss,
> >>>>> but
> >>>>>>>>>>>>> there's
> >>>>>>>>>>>>>> no implicit casting there and no loss due to casting.
> >> Casting
> >>> an
> >>>>>>>> integer
> >>>>>>>>>>>>>> to a float can also result in loss. So dividing an integer
> >> by a
> >>>>>>>> float,
> >>>>>>>>>>>>> for
> >>>>>>>>>>>>>> example, with an implicit cast has an additional avenue for
> >>> loss:
> >>>>>>>> the
> >>>>>>>>>>>>>> implicit cast for the operands so that they're of the same
> >>> type.
> >>>>> I
> >>>>>>>>>>>>> believe
> >>>>>>>>>>>>>> this discussion so far has been about the latter, not the
> >> loss
> >>>>> from
> >>>>>>>> the
> >>>>>>>>>>>>>> operations themselves.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
> >>>>>>>> benjamin.lerer@datastax.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I would like to try to clarify things a bit to help people
> >> to
> >>>>>>>> understand
> >>>>>>>>>>>>>>> the true complexity of the problem.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The *float* and *double* types are inexact numeric types.
> >> Not
> >>>>> only
> >>>>>>>> at
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>> operation level.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If you insert 676543.21 in a *float* column and then read
> >> it,
> >>>>> you
> >>>>>>>> will
> >>>>>>>>>>>>>>> realize that the value has been truncated to 676543.2.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If you want accuracy the only way is to avoid those inexact
> >>>>> types.
> >>>>>>>>>>>>>>> Using *decimals* during operations will mitigate the problem but will not
> >>> remove
> >>>>>>>> it.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I do not recall PostgreSQL behaving as described. If I am
> >> not
> >>>>>>>> mistaken
> >>>>>>>>>>>>> in
> >>>>>>>>>>>>>>> PostgreSQL *SELECT 3/2* will return *1*. Which is similar
> to
> >>>>> what
> >>>>>>>> MS SQL
> >>>>>>>>>>>>>>> server and Oracle do. So all those databases will lose
> >>>>> precision
> >>>>>>>> if you
> >>>>>>>>>>>>>>> are not careful.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If you truly need precision you can have it by using exact
> >>>>> numeric
> >>>>>>>> types
> >>>>>>>>>>>>>>> for your data types. Of course it has a cost on
> performance,
> >>>>>>>> memory and
> >>>>>>>>>>>>>>> disk usage.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The advantage of the current approach is that it gives you
> >> the
> >>>>>>>> choice.
> >>>>>>>>>>>>> It is
> >>>>>>>>>>>>>>> up to you to decide what you need for your application. It
> >> is
> >>>>> also
> >>>>>>>> in
> >>>>>>>>>>>>> line
> >>>>>>>>>>>>>>> with the way CQL behaves everywhere else.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Muru
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>>>>>>>> For additional commands, e-mail:
> >> dev-help@cassandra.apache.org
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>> ---------------------------------------------------------------------
> >>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>> ---------------------------------------------------------------------
> >>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >> ---------------------------------------------------------------------
> >>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>>>
> >>>>>>>> --
> >>>>>>> Jon Haddad
> >>>>>>> http://www.rustyrazorblade.com
> >>>>>>> twitter: rustyrazorblade
> >>>>>>
> >>>>>>
> >>>>>>
> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>
> >>>>>
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>
> >>>>>
> >>>
> >>> --
> >> Jon Haddad
> >> http://www.rustyrazorblade.com
> >> twitter: rustyrazorblade
> >>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: Implicit Casts for Arithmetic Operators

Posted by Benedict Elliott Smith <be...@apache.org>.
Well, to expand my glib statement, standards exist for at least two reasons that I endorse in this case:

1) They are well thought out, with a great deal more consideration than we have time to give to a problem
2) They are widely implemented, understood and used.  So our users and developers have a common point of reference.



> On 22 Nov 2018, at 11:42, Benjamin Lerer <be...@datastax.com> wrote:
> 
> Sorry, following a standard for the sake of following a standard does not
> make sense to me.
> 
> On Thu, Nov 22, 2018 at 12:33 PM Benedict Elliott Smith <be...@apache.org>
> wrote:
> 
>> Yes.
>> 
>>> On 22 Nov 2018, at 11:32, Benjamin Lerer <be...@datastax.com>
>> wrote:
>>> 
>>> Then I would be interested in knowing `where we should be`. If the answer
>>> is `ANSI SQL92` then my question is: Why? Simply for the sake of
>> following
>>> a standard?
>>> 
>>> 
>>> On Thu, Nov 22, 2018 at 12:19 PM Benedict Elliott Smith <
>> benedict@apache.org>
>>> wrote:
>>> 
>>>> As I say, for me this is explicitly unhelpful, so I have no intention of
>>>> producing it (though, of course, I cannot prevent you from producing it)
>>>> 
>>>> For me, the correct approach is to decide where we should be, and then
>>>> figure out how to get there.  Where we are has no bearing on where we
>>>> should be, in my view.
>>>> 
>>>> 
>>>> 
>>>>> On 22 Nov 2018, at 11:12, Benjamin Lerer <be...@datastax.com>
>>>> wrote:
>>>>> 
>>>>> I would also like to see an analysis of what being ANSI SQL 92
>> compliant
>>>>> means in term of change of behavior (for arithmetics and *any features
>> we
>>>>> have already released*).
>>>>> Simply because without it, I find the decision pretty hard to make.
>>>>> 
>>>>> On Thu, Nov 22, 2018 at 11:51 AM Benedict Elliott Smith <
>>>> benedict@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> We’re not presently voting*; we’re only discussing, whether we should
>>>> base
>>>>>> our behaviour on a widely agreed upon standard.
>>>>>> 
>>>>>> I think perhaps the nub of our disagreement is that, in my view, this
>> is
>>>>>> the only relevant fact to decide.  There is no data to base this
>>>> decision
>>>>>> upon.  It’s axiomatic, or ideological; procedural, not technical:  Do
>> we
>>>>>> think we should try to hew to standards (where appropriate), or do we
>>>> think
>>>>>> we should stick with what we arrived at in an adhoc manner?
>>>>>> 
>>>>>> If we believe the former, as I now do, then the current state is only
>>>>>> relevant when we come to implement the decision.
>>>>>> 
>>>>>> 
>>>>>> * But given how peripheral and inherently ideological this decision
>> is,
>>>>>> and how meandering the discussion was with no clear consensus, it
>>>> seemed to
>>>>>> need a vote in the near future.  The prospect of a vote seems to have
>>>>>> brought some healthy debate forward too, which is great, but I
>>>> apologise if
>>>>>> this somehow came across as presumptuous.
>>>>>> 
>>>>>> 
>>>>>>> On 22 Nov 2018, at 09:26, Sylvain Lebresne <lebresne@gmail.com> wrote:
>>>>>>> 
>>>>>>> I'm not saying "let's not do this no matter what and never fix
>> technical
>>>>>>> debt", nor am I fearing a decision.
>>>>>>> 
>>>>>>> But I *do* think decisions, technical ones at least, should be fact
>> and
>>>>>>> data driven. And I'm not even sure why we're talking of having a vote
>>>>>> here.
>>>>>>> The Apache Way is *not* meant to be primarily vote-driven, votes are
>>>>>>> supposed to be a last resort when, after having debated facts and
>> data,
>>>>>> no
>>>>>>> consensus can be reached. Can we have the debate on facts and data
>>>> first?
>>>>>>> Please.
>>>>>>> 
>>>>>>> At the end of the day, I object to: "There are still a number of
>> unresolved
>>>>>>> issues, but to make progress I wonder if it would first be helpful to
>>>>>> have
>>>>>>> a vote on ensuring we are ANSI SQL 92 compliant for our arithmetic?".
>>>>>> More
>>>>>>> specifically, I disagree that such vote is a good starting point.
>> Let's
>>>>>>> identify and discuss the unresolved issues first. Let's check
>> precisely
>>>>>>> what getting our arithmetic ANSI SQL 92 compliant means and how we
>> can
>>>>>> get
>>>>>>> it. I do support the idea of making such an analysis btw, it would be
>> good
>>>>>>> data, but no vote is needed whatsoever to make it. Again, I object to
>>>>>>> voting first and doing the analysis 2nd.
>>>>>>> 
>>>>>>> --
>>>>>>> Sylvain
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Nov 22, 2018 at 1:25 AM Jonathan Haddad <jon@jonhaddad.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> I can’t agree more. We should be able to make changes in a manner
>> that
>>>>>>>> improves the DB in the long term, rather than live with the
>> technical
>>>>>> debt
>>>>>>>> of arbitrary decisions made by a handful of people.
>>>>>>>> 
>>>>>>>> I also agree that putting a knob in place to let people migrate over
>>>> is
>>>>>> a
>>>>>>>> reasonable decision.
>>>>>>>> 
>>>>>>>> Jon
>>>>>>>> 
>>>>>>>> On Wed, Nov 21, 2018 at 4:54 PM Benedict Elliott Smith <
>>>>>>>> benedict@apache.org>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> The goal is simply to agree on a set of well-defined principles for
>>>> how
>>>>>>>> we
>>>>>>>>> should behave.  If we don’t like the implications that arise, we’ll
>>>>>> have
>>>>>>>>> another vote?  A democracy cannot bind itself, so I never
>> understood
>>>>>> this
>>>>>>>>> fear of a decision.
>>>>>>>>> 
>>>>>>>>> A database also has a thousand toggles.  If we absolutely need to,
>> we
>>>>>> can
>>>>>>>>> introduce one more.
>>>>>>>>> 
>>>>>>>>> We should be doing this upfront a great deal more often.  Doing it
>>>>>>>>> retrospectively sucks, but in my opinion it's a bad reason to bind
>>>>>>>>> ourselves to whatever made it in.
>>>>>>>>> 
>>>>>>>>> Do we anywhere define the principles of our current behaviour?  I
>>>>>>>> couldn’t
>>>>>>>>> find it.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On 21 Nov 2018, at 21:08, Sylvain Lebresne <lebresne@gmail.com>
>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith <
>>>>>>>>> benedict@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> FWIW, my meaning of arithmetic in this context extends to any
>>>>>> features
>>>>>>>>> we
>>>>>>>>>>> have already released (such as aggregates, and perhaps other
>>>> built-in
>>>>>>>>>>> functions) that operate on the same domain.  We should be
>>>> consistent,
>>>>>>>>> after
>>>>>>>>>>> all.
>>>>>>>>>>> 
>>>>>>>>>>> Whether or not we need to revisit any existing functionality we
>> can
>>>>>>>>> figure
>>>>>>>>>>> out after the fact, once we have agreed what our behaviour should
>>>> be.
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> I'm not sure I correctly understand the process suggested, but I
>>>> don't
>>>>>>>>>> particularly like/agree with what I understand. What I understand
>>>> is a
>>>>>>>>>> suggestion for voting on agreeing to be ANSI SQL 92 compliant,
>> with
>>>> no
>>>>>>>>> real
>>>>>>>>>> evaluation of what that entails (at least I haven't seen one), and
>>>>>> that
>>>>>>>>>> this vote, if passed, would imply we'd then make any backward
>>>>>>>>> incompatible
>>>>>>>>>> change necessary to achieve compliance ("my meaning of arithmetic
>> in
>>>>>>>> this
>>>>>>>>>> context extends to any features we have already released" and
>>>> "Whether
>>>>>>>> or
>>>>>>>>>> not we need to revisit any existing functionality we can figure
>> out
>>>>>>>> after
>>>>>>>>>> the fact, once we have agreed what our behaviour should be").
>>>>>>>>>> 
>>>>>>>>>> This might make sense for a new product, but at our stage that seems
>>>>>>>>>> backward to me. I think we owe it to our users to first make the effort of
>>>>>>>>>> identifying what "inconsistencies" our existing arithmetic has[1]
>>>> and
>>>>>>>>>> _then_ consider what options we have to fix those, with their pros
>>>> and
>>>>>>>>> cons
>>>>>>>>>> (including how bad they break backward compatibility). And if
>> _then_
>>>>>>>>>> getting ANSI SQL 92 compliant proves to not be disruptive (or at
>>>> least
>>>>>>>>>> acceptably so), then sure, that's great.
>>>>>>>>>> 
>>>>>>>>>> [1]: one possibly efficient way to do that could actually be to
>>>>>> compare
>>>>>>>>> our
>>>>>>>>>> arithmetic to ANSI SQL 92. Not that all differences found would
>>>> imply
>>>>>>>>>> inconsistencies/wrongness of our arithmetic, but still, it should
>> be
>>>>>>>>>> helpful. And I guess my whole point is that we should do that
>> analysis
>>>>>>>>> first,
>>>>>>>>>> and then maybe decide that being ANSI SQL 92 is a reasonable
>> option,
>>>>>>>> not
>>>>>>>>>> decide first and live with the consequences no matter what they
>> are.
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Sylvain
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> I will make this more explicit for the vote, but just to clarify
>>>> the
>>>>>>>>>>> intention so that we are all discussing the same thing.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On 20 Nov 2018, at 14:18, Ariel Weisberg <adweisbe@fastmail.fm>
>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>> +1
>>>>>>>>>>>> 
>>>>>>>>>>>> This is a public API so we will be much better off if we get it
>>>>>> right
>>>>>>>>>>> the first time.
>>>>>>>>>>>> 
>>>>>>>>>>>> Ariel
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <
>> jon@jonhaddad.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Sounds good to me.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <
>>>>>>>>>>> benedict@apache.org>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> So, this thread somewhat petered out.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> There are still a number of unresolved issues, but to make
>>>>>>>> progress I
>>>>>>>>>>>>>> wonder if it would first be helpful to have a vote on ensuring
>>>> we
>>>>>>>> are
>>>>>>>>>>> ANSI
>>>>>>>>>>>>>> SQL 92 compliant for our arithmetic?  This seems like a
>> sensible
>>>>>>>>>>> baseline,
>>>>>>>>>>>>>> since we will hopefully minimise surprise to operators this
>> way.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> If people largely agree, I will call a vote, and we can pick
>> up
>>>> a
>>>>>>>>>>> couple
>>>>>>>>>>>>>> of more focused discussions afterwards on how we interpret the
>>>>>>>> leeway
>>>>>>>>>>> it
>>>>>>>>>>>>>> gives.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ariel@weisberg.ws>
>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> From reading the spec, precision is always implementation
>>>>>> defined.
>>>>>>>>> The
>>>>>>>>>>>>>> spec specifies scale in several cases, but never precision for
>>>> any
>>>>>>>>>>> type or
>>>>>>>>>>>>>> operation (addition/subtraction, multiplication, division).
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> So we don't implement anything remotely approaching precision
>>>> and
>>>>>>>>>>> scale
>>>>>>>>>>>>>> in CQL when it comes to numbers I think? So we aren't going to
>>>>>>>> follow
>>>>>>>>>>> the
>>>>>>>>>>>>>> spec for scale. We are already pretty far down that road so I
>>>>>> would
>>>>>>>>>>> leave
>>>>>>>>>>>>>> it alone.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I don't think the spec is asking for the most approximate
>> type.
>>>>>>>> It's
>>>>>>>>>>>>>> just saying the result is approximate, and the precision is
>>>>>>>>>>> implementation
>>>>>>>>>>>>>> defined. We could return either float or double. I think if
>> one
>>>> of
>>>>>>>>> the
>>>>>>>>>>>>>> operands is a double we should return a double because clearly
>>>> the
>>>>>>>>>>> schema
>>>>>>>>>>>>>> thought a double was required to represent that number. I
>> would
>>>>>>>> also
>>>>>>>>>>> be in
>>>>>>>>>>>>>> favor of returning a double all the time so that people can
>>>> expect
>>>>>>>> a
>>>>>>>>>>>>>> consistent type from expressions involving approximate
>> numbers.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I am a big fan of widening for arithmetic expressions in a
>>>>>>>> database
>>>>>>>>> to
>>>>>>>>>>>>>> avoid having to error on overflow. You can go to the trouble
>> of
>>>>>>>> only
>>>>>>>>>>>>>> widening the minimum amount, but I think it's simpler if we
>>>> always
>>>>>>>>>>> widen to
>>>>>>>>>>>>>> bigint and double. This would be something the spec allows.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Definitely if we can make overflow not occur we should and
>> the
>>>>>>>> spec
>>>>>>>>>>>>>> allows that. We should also not return different types for the
>>>>>> same
>>>>>>>>>>> operand
>>>>>>>>>>>>>> types just to work around overflow if we detect we need more
>>>>>>>>> precision.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Ariel
>>>>>>>>>>>>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
>>>>>>>>>>>>>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging this
>>>>>>>>>>>>>>>> out (and Mike for getting some empirical examples).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> We still have to decide on the approximate data type to return; right
>>>>>>>>>>>>>>>> now, we have float+bigint=double, but float+int=float.  I think this is
>>>>>>>>>>>>>>>> fairly inconsistent, and either the approximate type should always win,
>>>>>>>>>>>>>>>> or we should always upgrade to double for mixed operands.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The quoted spec also suggests that decimal+float=float, and
>>>>>>>>>>>>>>>> decimal+double=double, whereas we currently have decimal+float=decimal,
>>>>>>>>>>>>>>>> and decimal+double=decimal.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> If we’re going to go with an approximate operand implying an approximate
>>>>>>>>>>>>>>>> result, I think we should do it consistently (and consistent with the
>>>>>>>>>>>>>>>> SQL92 spec), and have the type of the approximate operand always be the
>>>>>>>>>>>>>>>> return type.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> This would still leave a decision for float+double, though.  The most
>>>>>>>>>>>>>>>> consistent behaviour with that stated above would be to always take the
>>>>>>>>>>>>>>>> most approximate type to return (i.e. float), but this would seem to me
>>>>>>>>>>>>>>>> to be fairly unexpected for the user.
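>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> A small Java sketch of the two candidate rules, using a hypothetical
>>>>>>>>>>>>>>>> NumericType enum rather than Cassandra's actual type system:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> enum NumericType {
>>>>>>>>>>>>>>>>     TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE;
>>>>>>>>>>>>>>>>     boolean approximate() { return this == FLOAT || this == DOUBLE; }
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> // Rule 1: the approximate operand's type is the return type; note that
>>>>>>>>>>>>>>>> // float+double then yields float, the surprising case discussed above.
>>>>>>>>>>>>>>>> static NumericType approximateWins(NumericType a, NumericType b) {
>>>>>>>>>>>>>>>>     if (a == NumericType.FLOAT || b == NumericType.FLOAT) return NumericType.FLOAT;
>>>>>>>>>>>>>>>>     if (a == NumericType.DOUBLE || b == NumericType.DOUBLE) return NumericType.DOUBLE;
>>>>>>>>>>>>>>>>     return NumericType.DECIMAL; // both exact; exact rules are a separate question
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> // Rule 2: any approximate operand upgrades the result to double.
>>>>>>>>>>>>>>>> static NumericType upgradeToDouble(NumericType a, NumericType b) {
>>>>>>>>>>>>>>>>     return (a.approximate() || b.approximate())
>>>>>>>>>>>>>>>>             ? NumericType.DOUBLE : NumericType.DECIMAL;
>>>>>>>>>>>>>>>> }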
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ariel@weisberg.ws> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I agree with what's been said about expectations regarding expressions
>>>>>>>>>>>>>>>>> involving floating point numbers. I think that if one of the inputs is
>>>>>>>>>>>>>>>>> approximate then the result should be approximate.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> One thing we could look at for inspiration is the SQL spec. Not to
>>>>>>>>>>>>>>>>> follow dogmatically necessarily.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> From the SQL 92 spec regarding assignment
>>>>>>>>>>>>>>>>> (http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt) section 4.6:
>>>>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>>>>     Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
>>>>>>>>>>>>>>>>>     FLOAT, REAL, and DOUBLE PRECISION are numbers and are all mutually
>>>>>>>>>>>>>>>>>     comparable and mutually assignable. If an assignment would result
>>>>>>>>>>>>>>>>>     in a loss of the most significant digits, an exception condition
>>>>>>>>>>>>>>>>>     is raised. If least significant digits are lost, implementation-
>>>>>>>>>>>>>>>>>     defined rounding or truncating occurs with no exception condition
>>>>>>>>>>>>>>>>>     being raised. The rules for arithmetic are generally governed by
>>>>>>>>>>>>>>>>>     Subclause 6.12, "<numeric value expression>".
>>>>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Section 6.12 numeric value expressions:
>>>>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>>>>  1) If the data type of both operands of a dyadic arithmetic opera-
>>>>>>>>>>>>>>>>>     tor is exact numeric, then the data type of the result is exact
>>>>>>>>>>>>>>>>>     numeric, with precision and scale determined as follows:
>>>>>>>>>>>>>>>>>  ...
>>>>>>>>>>>>>>>>>  2) If the data type of either operand of a dyadic arithmetic op-
>>>>>>>>>>>>>>>>>     erator is approximate numeric, then the data type of the re-
>>>>>>>>>>>>>>>>>     sult is approximate numeric. The precision of the result is
>>>>>>>>>>>>>>>>>     implementation-defined.
>>>>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> And this makes sense to me. I think we should only return an exact
>>>>>>>>>>>>>>>>> result if both of the inputs are exact.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I think we might want to look closely at the SQL spec and especially
>>>>>>>>>>>>>>>>> when the spec requires an error to be generated. Those are sometimes in
>>>>>>>>>>>>>>>>> the spec to prevent subtle paths to wrong answers. Any time we deviate
>>>>>>>>>>>>>>>>> from the spec we should be asking why it is in the spec and why we are
>>>>>>>>>>>>>>>>> deviating.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Another issue besides overflow handling is how we determine precision
>>>>>>>>>>>>>>>>> and scale for expressions involving two exact types.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Ariel
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I'm not sure if I would prefer the Postgres way of doing things, which
>>>>>>>>>>>>>>>>>> is returning just about any type depending on the order of operators.
>>>>>>>>>>>>>>>>>> Considering it actually mentions in the docs that using numeric/decimal
>>>>>>>>>>>>>>>>>> is slow, and also multiple times that floating points are inexact. So,
>>>>>>>>>>>>>>>>>> doing some math with Postgres (9.6.5):
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double
>>>>>>>>>>>>>>>>>> precision 2147483647
>>>>>>>>>>>>>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
>>>>>>>>>>>>>>>>>> SELECT 2147483647::bigint*1.0::real returns double
>>>>>>>>>>>>>>>>>> SELECT 2147483647::double precision*1::bigint returns double 2147483647
>>>>>>>>>>>>>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double 2147483647
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> With + and - we can get the same mixture of returned types. There's no
>>>>>>>>>>>>>>>>>> difference in those calculations, just some casting. To me,
>>>>>>>>>>>>>>>>>> floating-point math indicates inexactness and has errors, and whoever
>>>>>>>>>>>>>>>>>> mixes up two different types should understand that. If one didn't want
>>>>>>>>>>>>>>>>>> an exact numeric type, why would the server return such? The floating
>>>>>>>>>>>>>>>>>> point value itself could be wrong already before the calculation -
>>>>>>>>>>>>>>>>>> trying to say we do it losslessly is just wrong.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Fun with 2.65:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
>>>>>>>>>>>>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> SELECT round(2.65) returns numeric 3
>>>>>>>>>>>>>>>>>> SELECT round(2.65::double precision) returns double 3
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> SELECT 2.65 * 1 returns double 2.65
>>>>>>>>>>>>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
>>>>>>>>>>>>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
>>>>>>>>>>>>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> SELECT round(2.65) * 1 returns numeric 3
>>>>>>>>>>>>>>>>>> SELECT round(2.65) * round(1) returns double 3
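>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The 2.65::real case is plain float32 representation error, easy to
>>>>>>>>>>>>>>>>>> reproduce in Java (a sketch, not tied to Postgres internals):
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> float f = 2.65f;               // nearest float32 to 2.65
>>>>>>>>>>>>>>>>>> double widened = f;            // implicit float -> double cast
>>>>>>>>>>>>>>>>>> System.out.println(widened);   // 2.6500000953674316
>>>>>>>>>>>>>>>>>> System.out.println(2.65d * 1); // 2.65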
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> So as we're going to have silly values in any case, why pretend
>>>>>>>>>>>>>>>>>> otherwise? Also, exact calculations are slow if we crunch a large
>>>>>>>>>>>>>>>>>> amount of numbers. I guess I slightly deviated towards Postgres'
>>>>>>>>>>>>>>>>>> implementation in this case, but I wish it wasn't used as a benchmark
>>>>>>>>>>>>>>>>>> here. And most importantly, I would definitely want the exact same type
>>>>>>>>>>>>>>>>>> returned each time I do a calculation.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> - Micke
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <benedict@apache.org> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> As far as I can tell we reached a relatively strong consensus that we
>>>>>>>>>>>>>>>>>>> should implement lossless casts by default?  Does anyone have anything
>>>>>>>>>>>>>>>>>>> more to add?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Looking at the emails, everyone who participated and expressed a
>>>>>>>>>>>>>>>>>>> preference was in favour of the “Postgres approach” of upcasting to
>>>>>>>>>>>>>>>>>>> decimal for mixed float/int operands?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I’d like to get a clear-cut decision on this, so we know what we’re
>>>>>>>>>>>>>>>>>>> doing for 4.0.  Then hopefully we can move on to a collective decision
>>>>>>>>>>>>>>>>>>> on Ariel’s concerns about overflow, which I think are also pressing -
>>>>>>>>>>>>>>>>>>> particularly for tinyint and smallint.  This does also impact implicit
>>>>>>>>>>>>>>>>>>> casts for mixed integer type operations, but an approach for these will
>>>>>>>>>>>>>>>>>>> probably fall out of any decision on overflow.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <murukesh.mohanan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I think you're conflating two things here. There's the loss resulting
>>>>>>>>>>>>>>>>>>>> from using some operators, and the loss involved in casting. Dividing
>>>>>>>>>>>>>>>>>>>> an integer by another integer to obtain an integer result can result
>>>>>>>>>>>>>>>>>>>> in loss, but there's no implicit casting there and no loss due to
>>>>>>>>>>>>>>>>>>>> casting. Casting an integer to a float can also result in loss. So
>>>>>>>>>>>>>>>>>>>> dividing an integer by a float, for example, with an implicit cast has
>>>>>>>>>>>>>>>>>>>> an additional avenue for loss: the implicit cast for the operands so
>>>>>>>>>>>>>>>>>>>> that they're of the same type. I believe this discussion so far has
>>>>>>>>>>>>>>>>>>>> been about the latter, not the loss from the operations themselves.
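>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Both mechanisms in a few lines of Java (illustrative values only):
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> System.out.println(3 / 2);    // 1: operator loss, no cast involved
>>>>>>>>>>>>>>>>>>>> long big = 9007199254740993L; // 2^53 + 1, exactly representable as long
>>>>>>>>>>>>>>>>>>>> double d = big;               // implicit widening cast drops the +1
>>>>>>>>>>>>>>>>>>>> System.out.println((long) d); // 9007199254740992: cast loss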
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <benjamin.lerer@datastax.com> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I would like to try to clarify things a bit to help people understand
>>>>>>>>>>>>>>>>>>>>> the true complexity of the problem.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> The *float* and *double* types are inexact numeric types - not only at
>>>>>>>>>>>>>>>>>>>>> the operation level.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> If you insert 676543.21 in a *float* column and then read it, you will
>>>>>>>>>>>>>>>>>>>>> realize that the value has been truncated to 676543.2.
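>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> The same effect in two lines of Java (a sketch of the storage-level
>>>>>>>>>>>>>>>>>>>>> rounding, not of Cassandra's read path):
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> float stored = 676543.21f;  // float32 cannot hold all eight digits
>>>>>>>>>>>>>>>>>>>>> System.out.println(stored); // prints 676543.2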
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> If you want accuracy the only way is to avoid those inexact types.
>>>>>>>>>>>>>>>>>>>>> Using *decimals* during operations will mitigate the problem but will
>>>>>>>>>>>>>>>>>>>>> not remove it.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I do not recall PostgreSQL behaving as described. If I am not mistaken,
>>>>>>>>>>>>>>>>>>>>> in PostgreSQL *SELECT 3/2* will return *1*, which is similar to what MS
>>>>>>>>>>>>>>>>>>>>> SQL Server and Oracle do. So all those databases will lose precision if
>>>>>>>>>>>>>>>>>>>>> you are not careful.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> If you truly need precision you can have it by using exact numeric
>>>>>>>>>>>>>>>>>>>>> types for your data types. Of course it has a cost on performance,
>>>>>>>>>>>>>>>>>>>>> memory and disk usage.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> The advantage of the current approach is that it gives you the choice.
>>>>>>>>>>>>>>>>>>>>> It is up to you to decide what you need for your application. It is
>>>>>>>>>>>>>>>>>>>>> also in line with the way CQL behaves everywhere else.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Muru
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Jon Haddad
>>>>>>>>>>>>> http://www.rustyrazorblade.com
>>>>>>>>>>>>> twitter: rustyrazorblade
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Implicit Casts for Arithmetic Operators

Posted by Benjamin Lerer <be...@datastax.com>.
Sorry, following a standard for the sake of following a standard does not
make sense to me.

On Thu, Nov 22, 2018 at 12:33 PM Benedict Elliott Smith <be...@apache.org>
wrote:

> Yes.
>
> > On 22 Nov 2018, at 11:32, Benjamin Lerer <be...@datastax.com>
> wrote:
> >
> > Then I would be interested in knowing `where we should be`. If the answer
> > is `ANSI SQL92` then my question is: Why? Simply for the sake of
> following
> > a standard?
> >
> >
> > On Thu, Nov 22, 2018 at 12:19 PM Benedict Elliott Smith <benedict@apache.org>
> > wrote:
> >
> >> As I say, for me this is explicitly unhelpful, so I have no intention of
> >> producing it (though, of course, I cannot prevent you from producing it)
> >>
> >> For me, the correct approach is to decide where we should be, and then
> >> figure out how to get there.  Where we are has no bearing on where we
> >> should be, in my view.
> >>
> >>
> >>
> >>> On 22 Nov 2018, at 11:12, Benjamin Lerer <be...@datastax.com>
> >> wrote:
> >>>
> >>> I would also like to see an analysis of what being ANSI SQL 92 compliant
> >>> means in terms of changes of behavior (for arithmetic and *any features
> >>> we have already released*).
> >>> Simply because without it, I find the decision pretty hard to make.
> >>>
> >>> On Thu, Nov 22, 2018 at 11:51 AM Benedict Elliott Smith <benedict@apache.org>
> >>> wrote:
> >>>
> >>>> We’re not presently voting*; we’re only discussing whether we should
> >>>> base our behaviour on a widely agreed upon standard.
> >>>>
> >>>> I think perhaps the nub of our disagreement is that, in my view, this is
> >>>> the only relevant fact to decide.  There is no data to base this decision
> >>>> upon.  It’s axiomatic, or ideological; procedural, not technical:  Do we
> >>>> think we should try to hew to standards (where appropriate), or do we
> >>>> think we should stick with what we arrived at in an adhoc manner?
> >>>>
> >>>> If we believe the former, as I now do, then the current state is only
> >>>> relevant when we come to implement the decision.
> >>>>
> >>>>
> >>>> * But given how peripheral and inherently ideological this decision is,
> >>>> and how meandering the discussion was with no clear consensus, it seemed
> >>>> to need a vote in the near future.  The prospect of a vote seems to have
> >>>> brought some healthy debate forward too, which is great, but I apologise
> >>>> if this somehow came across as presumptuous.
> >>>>
> >>>>
> >>>>> On 22 Nov 2018, at 09:26, Sylvain Lebresne <lebresne@gmail.com> wrote:
> >>>>>
> >>>>> I'm not saying "let's not do this no matter what and never fix technical
> >>>>> debt", nor am I fearing decisions.
> >>>>>
> >>>>> But I *do* think decisions, technical ones at least, should be fact and
> >>>>> data driven. And I'm not even sure why we're talking of having a vote
> >>>>> here. The Apache Way is *not* meant to be primarily vote-driven, votes
> >>>>> are supposed to be a last resort when, after having debated facts and
> >>>>> data, no consensus can be reached. Can we have the debate on facts and
> >>>>> data first? Please.
> >>>>>
> >>>>> At the end of the day, I object to: "There are still a number of
> >>>>> unresolved issues, but to make progress I wonder if it would first be
> >>>>> helpful to have a vote on ensuring we are ANSI SQL 92 compliant for our
> >>>>> arithmetic?". More specifically, I disagree that such a vote is a good
> >>>>> starting point. Let's identify and discuss the unresolved issues first.
> >>>>> Let's check precisely what getting our arithmetic ANSI SQL 92 compliant
> >>>>> means and how we can get it. I do support the idea of making such an
> >>>>> analysis btw, it would be good data, but no vote is needed whatsoever to
> >>>>> make it. Again, I object to voting first and doing the analysis 2nd.
> >>>>>
> >>>>> --
> >>>>> Sylvain
> >>>>>
> >>>>>
> >>>>> On Thu, Nov 22, 2018 at 1:25 AM Jonathan Haddad <jon@jonhaddad.com>
> >>>> wrote:
> >>>>>
> >>>>>> I can’t agree more. We should be able to make changes in a manner that
> >>>>>> improves the DB in the long term, rather than live with the technical
> >>>>>> debt of arbitrary decisions made by a handful of people.
> >>>>>>
> >>>>>> I also agree that putting a knob in place to let people migrate over is
> >>>>>> a reasonable decision.
> >>>>>>
> >>>>>> Jon
> >>>>>>
> >>>>>> On Wed, Nov 21, 2018 at 4:54 PM Benedict Elliott Smith <benedict@apache.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> The goal is simply to agree on a set of well-defined principles for how
> >>>>>>> we should behave.  If we don’t like the implications that arise, we’ll
> >>>>>>> have another vote?  A democracy cannot bind itself, so I never
> >>>>>>> understood this fear of a decision.
> >>>>>>>
> >>>>>>> A database also has a thousand toggles.  If we absolutely need to, we
> >>>>>>> can introduce one more.
> >>>>>>>
> >>>>>>> We should be doing this upfront a great deal more often.  Doing it
> >>>>>>> retrospectively sucks, but in my opinion it's a bad reason to bind
> >>>>>>> ourselves to whatever made it in.
> >>>>>>>
> >>>>>>> Do we anywhere define the principles of our current behaviour?  I
> >>>>>>> couldn’t find it.
> >>>>>>>
> >>>>>>>
> >>>>>>>> On 21 Nov 2018, at 21:08, Sylvain Lebresne <lebresne@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith <benedict@apache.org> wrote:
> >>>>>>>>
> >>>>>>>>> FWIW, my meaning of arithmetic in this context extends to any features
> >>>>>>>>> we have already released (such as aggregates, and perhaps other
> >>>>>>>>> built-in functions) that operate on the same domain.  We should be
> >>>>>>>>> consistent, after all.
> >>>>>>>>>
> >>>>>>>>> Whether or not we need to revisit any existing functionality we can
> >>>>>>>>> figure out after the fact, once we have agreed what our behaviour
> >>>>>>>>> should be.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> I'm not sure I correctly understand the process suggested, but I don't
> >>>>>>>> particularly like/agree with what I understand. What I understand is a
> >>>>>>>> suggestion for voting on agreeing to be ANSI SQL 92 compliant, with no
> >>>>>>>> real evaluation of what that entails (at least I haven't seen one), and
> >>>>>>>> that this vote, if passed, would imply we'd then make any backward
> >>>>>>>> incompatible change necessary to achieve compliance ("my meaning of
> >>>>>>>> arithmetic in this context extends to any features we have already
> >>>>>>>> released" and "Whether or not we need to revisit any existing
> >>>>>>>> functionality we can figure out after the fact, once we have agreed
> >>>>>>>> what our behaviour should be").
> >>>>>>>>
> >>>>>>>> This might make sense for a new product, but at our stage that seems
> >>>>>>>> backward to me. I think we owe it to our users to first make the effort
> >>>>>>>> of identifying what "inconsistencies" our existing arithmetic has[1] and
> >>>>>>>> _then_ consider what options we have to fix those, with their pros and
> >>>>>>>> cons (including how badly they break backward compatibility). And if
> >>>>>>>> _then_ getting ANSI SQL 92 compliant proves to not be disruptive (or at
> >>>>>>>> least acceptably so), then sure, that's great.
> >>>>>>>> acceptably so), then sure, that's great.
> >>>>>>>>
> >>>>>>>> [1]: one possibly efficient way to do that could actually be to compare
> >>>>>>>> our arithmetic to ANSI SQL 92. Not that all differences found would
> >>>>>>>> imply inconsistencies/wrongness of our arithmetic, but still, it should
> >>>>>>>> be helpful. And I guess my whole point is that we should do that
> >>>>>>>> analysis first, and then maybe decide that being ANSI SQL 92 is a
> >>>>>>>> reasonable option, not decide first and live with the consequences no
> >>>>>>>> matter what they are.
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Sylvain
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> I will make this more explicit for the vote, but just to clarify the
> >>>>>>>>> intention so that we are all discussing the same thing.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On 20 Nov 2018, at 14:18, Ariel Weisberg <adweisbe@fastmail.fm> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> +1
> >>>>>>>>>>
> >>>>>>>>>> This is a public API, so we will be much better off if we get it
> >>>>>>>>>> right the first time.
> >>>>>>>>>>
> >>>>>>>>>> Ariel
> >>>>>>>>>>
>
>

Re: Implicit Casts for Arithmetic Operators

Posted by Benedict Elliott Smith <be...@apache.org>.
Yes.

> On 22 Nov 2018, at 11:32, Benjamin Lerer <be...@datastax.com> wrote:
> 
> Then I would be interested in knowing `where we should be`. If the answer
> is `ANSI SQL92` then my question is: Why? Simply for the sake of following
> a standard?


Re: Implicit Casts for Arithmetic Operators

Posted by Benjamin Lerer <be...@datastax.com>.
Then I would be interested in knowing `where we should be`. If the answer
is `ANSI SQL92` then my question is: Why? Simply for the sake of following
a standard?


On Thu, Nov 22, 2018 at 12:19 PM Benedict Elliott Smith <be...@apache.org>
wrote:

> As I say, for me this is explicitly unhelpful, so I have no intention of
> producing it (though, of course, I cannot prevent you from producing it).
>
> For me, the correct approach is to decide where we should be, and then
> figure out how to get there.  Where we are has no bearing on where we
> should be, in my view.
>
>
>
> > On 22 Nov 2018, at 11:12, Benjamin Lerer <be...@datastax.com>
> wrote:
> >
> > I would also like to see an analysis of what being ANSI SQL 92 compliant
> > means in terms of change of behavior (for arithmetic and *any features we
> > have already released*).
> > Simply because without it, I find the decision pretty hard to make.
> >
> > On Thu, Nov 22, 2018 at 11:51 AM Benedict Elliott Smith <
> benedict@apache.org>
> > wrote:
> >
> >> We’re not presently voting*; we’re only discussing whether we should
> base
> >> our behaviour on a widely agreed upon standard.
> >>
> >> I think perhaps the nub of our disagreement is that, in my view, this is
> >> the only relevant fact to decide.  There is no data to base this
> decision
> >> upon.  It’s axiomatic, or ideological; procedural, not technical:  Do we
> >> think we should try to hew to standards (where appropriate), or do we
> think
> >> we should stick with what we arrived at in an ad hoc manner?
> >>
> >> If we believe the former, as I now do, then the current state is only
> >> relevant when we come to implement the decision.
> >>
> >>
> >> * But given how peripheral and inherently ideological this decision is,
> >> and how meandering the discussion was with no clear consensus, it
> seemed to
> >> need a vote in the near future.  The prospect of a vote seems to have
> >> brought some healthy debate forward too, which is great, but I
> apologise if
> >> this somehow came across as presumptuous.
> >>
> >>
> >>> On 22 Nov 2018, at 09:26, Sylvain Lebresne <le...@gmail.com> wrote:
> >>>
> >>> I'm not saying "let's not do this no matter what and never fix technical
> >>> debt", nor am I fearing decision.
> >>>
> >>> But I *do* think decisions, technical ones at least, should be fact and
> >>> data driven. And I'm not even sure why we're talking of having a vote
> >> here.
> >>> The Apache Way is *not* meant to be primarily vote-driven; votes are
> >>> supposed to be a last resort when, after having debated facts and data,
> >> no
> >>> consensus can be reached. Can we have the debate on facts and data
> first?
> >>> Please.
> >>>
> >>> At the end of the day, I object to: "There are still a number of unresolved
> >>> issues, but to make progress I wonder if it would first be helpful to
> >> have
> >>> a vote on ensuring we are ANSI SQL 92 compliant for our arithmetic?".
> >> More
> >>> specifically, I disagree that such a vote is a good starting point. Let's
> >>> identify and discuss the unresolved issues first. Let's check precisely
> >>> what getting our arithmetic ANSI SQL 92 compliant means and how we can
> >> get
> >>> it. I do support the idea of making such analysis btw, it would be good
> >>> data, but no vote is needed whatsoever to make it. Again, I object to
> >>> voting first and doing the analysis 2nd.
> >>>
> >>> --
> >>> Sylvain
> >>>
> >>>
> >>> On Thu, Nov 22, 2018 at 1:25 AM Jonathan Haddad <jo...@jonhaddad.com>
> >> wrote:
> >>>
> >>>> I can’t agree more. We should be able to make changes in a manner that
> >>>> improves the DB in the long term, rather than live with the technical
> >> debt
> >>>> of arbitrary decisions made by a handful of people.
> >>>>
> >>>> I also agree that putting a knob in place to let people migrate over
> is
> >> a
> >>>> reasonable decision.
> >>>>
> >>>> Jon
> >>>>
> >>>> On Wed, Nov 21, 2018 at 4:54 PM Benedict Elliott Smith <
> >>>> benedict@apache.org>
> >>>> wrote:
> >>>>
> >>>>> The goal is simply to agree on a set of well-defined principles for
> how
> >>>> we
> >>>>> should behave.  If we don’t like the implications that arise, we’ll
> >> have
> >>>>> another vote?  A democracy cannot bind itself, so I never understood
> >> this
> >>>>> fear of a decision.
> >>>>>
> >>>>> A database also has a thousand toggles.  If we absolutely need to, we
> >> can
> >>>>> introduce one more.
> >>>>>
> >>>>> We should be doing this upfront a great deal more often.  Doing it
> >>>>> retrospectively sucks, but in my opinion it's a bad reason to bind
> >>>>> ourselves to whatever made it in.
> >>>>>
> >>>>> Do we anywhere define the principles of our current behaviour?  I
> >>>> couldn’t
> >>>>> find it.
> >>>>>
> >>>>>
> >>>>>> On 21 Nov 2018, at 21:08, Sylvain Lebresne <le...@gmail.com>
> >> wrote:
> >>>>>>
> >>>>>> On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith <
> >>>>> benedict@apache.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> FWIW, my meaning of arithmetic in this context extends to any
> >> features
> >>>>> we
> >>>>>>> have already released (such as aggregates, and perhaps other
> built-in
> >>>>>>> functions) that operate on the same domain.  We should be
> consistent,
> >>>>> after
> >>>>>>> all.
> >>>>>>>
> >>>>>>> Whether or not we need to revisit any existing functionality we can
> >>>>> figure
> >>>>>>> out after the fact, once we have agreed what our behaviour should
> be.
> >>>>>>>
> >>>>>>
> >>>>>> I'm not sure I correctly understand the process suggested, but I
> don't
> >>>>>> particularly like/agree with what I understand. What I understand
> is a
> >>>>>> suggestion for voting on agreeing to be ANSI SQL 92 compliant, with
> no
> >>>>> real
> >>>>>> evaluation of what that entails (at least I haven't seen one), and
> >> that
> >>>>>> this vote, if passed, would imply we'd then make any backward
> >>>>> incompatible
> >>>>>> change necessary to achieve compliance ("my meaning of arithmetic in
> >>>> this
> >>>>>> context extends to any features we have already released" and
> "Whether
> >>>> or
> >>>>>> not we need to revisit any existing functionality we can figure out
> >>>> after
> >>>>>> the fact, once we have agreed what our behaviour should be").
> >>>>>>
> >>>>>> This might make sense for a new product, but at our stage that seems
> >>>>>> backward to me. I think we owe it to our users to first make the effort of
> >>>>>> identifying what "inconsistencies" our existing arithmetic has[1]
> and
> >>>>>> _then_ consider what options we have to fix those, with their pros
> and
> >>>>> cons
> >>>>>> (including how badly they break backward compatibility). And if _then_
> >>>>>> getting ANSI SQL 92 compliant proves to not be disruptive (or at
> least
> >>>>>> acceptably so), then sure, that's great.
> >>>>>>
> >>>>>> [1]: one possibly efficient way to do that could actually be to
> >> compare
> >>>>> our
> >>>>>> arithmetic to ANSI SQL 92. Not that all differences found would
> imply
> >>>>>> inconsistencies/wrongness of our arithmetic, but still, it should be
> >>>>>> helpful. And I guess my whole point is that we should do that analysis
> >>>>> first,
> >>>>>> and then maybe decide that being ANSI SQL 92 is a reasonable option,
> >>>> not
> >>>>>> decide first and live with the consequences no matter what they are.
> >>>>>>
> >>>>>> --
> >>>>>> Sylvain
> >>>>>>
> >>>>>>
> >>>>>>> I will make this more explicit for the vote, but just to clarify
> the
> >>>>>>> intention so that we are all discussing the same thing.
> >>>>>>>
> >>>>>>>
> >>>>>>>> On 20 Nov 2018, at 14:18, Ariel Weisberg <ad...@fastmail.fm>
> >>>> wrote:
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> +1
> >>>>>>>>
> >>>>>>>> This is a public API so we will be much better off if we get it
> >> right
> >>>>>>> the first time.
> >>>>>>>>
> >>>>>>>> Ariel
> >>>>>>>>
> >>>>>>>>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <jon@jonhaddad.com>
> >>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Sounds good to me.
> >>>>>>>>>
> >>>>>>>>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <
> >>>>>>> benedict@apache.org>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> So, this thread somewhat petered out.
> >>>>>>>>>>
> >>>>>>>>>> There are still a number of unresolved issues, but to make
> >>>> progress I
> >>>>>>>>>> wonder if it would first be helpful to have a vote on ensuring
> we
> >>>> are
> >>>>>>> ANSI
> >>>>>>>>>> SQL 92 compliant for our arithmetic?  This seems like a sensible
> >>>>>>> baseline,
> >>>>>>>>>> since we will hopefully minimise surprise to operators this way.
> >>>>>>>>>>
> >>>>>>>>>> If people largely agree, I will call a vote, and we can pick up
> a
> >>>>>>> couple
> >>>>>>>>>> of more focused discussions afterwards on how we interpret the
> >>>> leeway
> >>>>>>> it
> >>>>>>>>>> gives.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws>
> >>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> From reading the spec: precision is always implementation
> >> defined.
> >>>>> The
> >>>>>>>>>> spec specifies scale in several cases, but never precision for
> any
> >>>>>>> type or
> >>>>>>>>>> operation (addition/subtraction, multiplication, division).
> >>>>>>>>>>>
> >>>>>>>>>>> So we don't implement anything remotely approaching precision
> and
> >>>>>>> scale
> >>>>>>>>>> in CQL when it comes to numbers I think? So we aren't going to
> >>>> follow
> >>>>>>> the
> >>>>>>>>>> spec for scale. We are already pretty far down that road so I
> >> would
> >>>>>>> leave
> >>>>>>>>>> it alone.
> >>>>>>>>>>>
> >>>>>>>>>>> I don't think the spec is asking for the most approximate type.
> >>>> It's
> >>>>>>>>>> just saying the result is approximate, and the precision is
> >>>>>>> implementation
> >>>>>>>>>> defined. We could return either float or double. I think if one
> of
> >>>>> the
> >>>>>>>>>> operands is a double we should return a double because clearly
> the
> >>>>>>> schema
> >>>>>>>>>> thought a double was required to represent that number. I would
> >>>> also
> >>>>>>> be in
> >>>>>>>>>> favor of returning a double all the time so that people can
> expect
> >>>> a
> >>>>>>>>>> consistent type from expressions involving approximate numbers.
> >>>>>>>>>>>
> >>>>>>>>>>> I am a big fan of widening for arithmetic expressions in a
> >>>> database
> >>>>> to
> >>>>>>>>>> avoid having to error on overflow. You can go to the trouble of
> >>>> only
> >>>>>>>>>> widening the minimum amount, but I think it's simpler if we
> always
> >>>>>>> widen to
> >>>>>>>>>> bigint and double. This would be something the spec allows.
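> >>>>>>>>>>>
> >>>>>>>>>>> A minimal Java sketch of that widening rule (hypothetical helpers,
> >>>>>>>>>>> not actual C* code): int operands get evaluated in long (bigint)
> >>>>>>>>>>> arithmetic and float operands in double arithmetic, so e.g.
> >>>>>>>>>>> Integer.MAX_VALUE + 1 yields 2147483648L instead of wrapping:
> >>>>>>>>>>>
> >>>>>>>>>>>   // widen both 32-bit operands to 64 bits before adding
> >>>>>>>>>>>   static long add(int a, int b) { return (long) a + (long) b; }
> >>>>>>>>>>>   // widen both floats to double, so every approximate result is a double
> >>>>>>>>>>>   static double add(float a, float b) { return (double) a + (double) b; }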
> >>>>>>>>>>>
> >>>>>>>>>>> Definitely if we can make overflow not occur we should and the
> >>>> spec
> >>>>>>>>>> allows that. We should also not return different types for the
> >> same
> >>>>>>> operand
> >>>>>>>>>> types just to work around overflow if we detect we need more
> >>>>> precision.
> >>>>>>>>>>>
> >>>>>>>>>>> Ariel
> >>>>>>>>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith
> wrote:
> >>>>>>>>>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for
> >>>> digging
> >>>>>>> this
> >>>>>>>>>>>> out (and Mike for getting some empirical examples).
> >>>>>>>>>>>>
> >>>>>>>>>>>> We still have to decide on the approximate data type to
> return;
> >>>>> right
> >>>>>>>>>>>> now, we have float+bigint=double, but float+int=float.  I
> think
> >>>>> this
> >>>>>>> is
> >>>>>>>>>>>> fairly inconsistent, and either the approximate type should
> >>>> always
> >>>>>>> win,
> >>>>>>>>>>>> or we should always upgrade to double for mixed operands.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The quoted spec also suggests that decimal+float=float, and
> >>>> decimal
> >>>>>>>>>>>> +double=double, whereas we currently have
> decimal+float=decimal,
> >>>>> and
> >>>>>>>>>>>> decimal+double=decimal
> >>>>>>>>>>>>
> >>>>>>>>>>>> If we’re going to go with an approximate operand implying an
> >>>>>>>>>> approximate
> >>>>>>>>>>>> result, I think we should do it consistently (and consistent
> >> with
> >>>>> the
> >>>>>>>>>>>> SQL92 spec), and have the type of the approximate operand
> always
> >>>> be
> >>>>>>> the
> >>>>>>>>>>>> return type.
> >>>>>>>>>>>>
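> >>>>>>>>>>>> Sketched in Java (with a hypothetical CqlType, not actual C* code),
> >>>>>>>>>>>> that resolution rule for mixed operands would look roughly like:
> >>>>>>>>>>>>
> >>>>>>>>>>>>   // float+int=float, double+bigint=double, decimal+double=double
> >>>>>>>>>>>>   static CqlType resultType(CqlType a, CqlType b) {
> >>>>>>>>>>>>       if (a.isApproximate() && !b.isApproximate()) return a;
> >>>>>>>>>>>>       if (b.isApproximate() && !a.isApproximate()) return b;
> >>>>>>>>>>>>       return widest(a, b); // both exact or both approximate:
> >>>>>>>>>>>>                            // a separate rule (widest() is a placeholder)
> >>>>>>>>>>>>   }
> >>>>>>>>>>>>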
> >>>>>>>>>>>> This would still leave a decision for float+double, though.
> The
> >>>>> most
> >>>>>>>>>>>> consistent behaviour with that stated above would be to always
> >>>> take
> >>>>>>> the
> >>>>>>>>>>>> most approximate type to return (i.e. float), but this would
> >> seem
> >>>>> to
> >>>>>>> me
> >>>>>>>>>>>> to be fairly unexpected for the user.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws>
> >>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I agree with what's been said about expectations regarding
> >>>>>>> expressions
> >>>>>>>>>> involving floating point numbers. I think that if one of the
> >> inputs
> >>>>> is
> >>>>>>>>>> approximate then the result should be approximate.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> One thing we could look at for inspiration is the SQL spec.
> Not
> >>>> to
> >>>>>>>>>> follow dogmatically necessarily.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> From the SQL 92 spec regarding assignment
> >>>>>>>>>>>>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
> >>>>>>>>>>>>> "
> >>>>>>>>>>>>>   Values of the data types NUMERIC, DECIMAL, INTEGER,
> >>>> SMALLINT,
> >>>>>>>>>>>>>   FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
> >>>>>>>>>> mutually
> >>>>>>>>>>>>>   comparable and mutually assignable. If an assignment would
> >>>>>>>>>> result
> >>>>>>>>>>>>>   in a loss of the most significant digits, an exception
> >>>>>>> condition
> >>>>>>>>>>>>>   is raised. If least significant digits are lost,
> >>>>>>> implementation-
> >>>>>>>>>>>>>   defined rounding or truncating occurs with no exception
> >>>>>>>>>> condition
> >>>>>>>>>>>>>   being raised. The rules for arithmetic are generally
> >>>> governed
> >>>>>>> by
> >>>>>>>>>>>>>   Subclause 6.12, "<numeric value expression>".
> >>>>>>>>>>>>> "
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Section 6.12 numeric value expressions:
> >>>>>>>>>>>>> "
> >>>>>>>>>>>>>   1) If the data type of both operands of a dyadic arithmetic
> >>>>>>>>>> opera-
> >>>>>>>>>>>>>      tor is exact numeric, then the data type of the result
> is
> >>>>>>>>>> exact
> >>>>>>>>>>>>>      numeric, with precision and scale determined as follows:
> >>>>>>>>>>>>> ...
> >>>>>>>>>>>>>   2) If the data type of either operand of a dyadic
> arithmetic
> >>>>>>> op-
> >>>>>>>>>>>>>      erator is approximate numeric, then the data type of the
> >>>>> re-
> >>>>>>>>>>>>>      sult is approximate numeric. The precision of the result
> >>>> is
> >>>>>>>>>>>>>      implementation-defined.
> >>>>>>>>>>>>> "
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> And this makes sense to me. I think we should only return an
> >>>> exact
> >>>>>>>>>> result if both of the inputs are exact.
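> >>>>>>>>>>>>>
> >>>>>>>>>>>>> A sketch (Java, hypothetical names) of the section 6.12 rule quoted
> >>>>>>>>>>>>> above: the result is exact only when both operands are exact:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>   // tinyint/smallint/int/bigint/varint/decimal are exact;
> >>>>>>>>>>>>>   // float/double are approximate
> >>>>>>>>>>>>>   enum Kind { EXACT, APPROXIMATE }
> >>>>>>>>>>>>>   static Kind resultKind(Kind a, Kind b) {
> >>>>>>>>>>>>>       return a == Kind.EXACT && b == Kind.EXACT ? Kind.EXACT : Kind.APPROXIMATE;
> >>>>>>>>>>>>>   }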
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I think we might want to look closely at the SQL spec and
> >>>>> especially
> >>>>>>>>>> when the spec requires an error to be generated. Those are
> >>>> sometimes
> >>>>>>> in the
> >>>>>>>>>> spec to prevent subtle paths to wrong answers. Any time we
> deviate
> >>>>>>> from the
> >>>>>>>>>> spec we should be asking why is it in the spec and why are we
> >>>>>>> deviating.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Another issue besides overflow handling is how we determine
> >>>>>>> precision
> >>>>>>>>>> and scale for expressions involving two exact types.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Ariel
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'm not sure if I would prefer the Postgres way of doing
> >>>> things,
> >>>>>>>>>> which is
> >>>>>>>>>>>>>> returning just about any type depending on the order of
> >>>>> operators.
> >>>>>>>>>>>>>> Considering it actually mentions in the docs that using
> >>>>>>>>>> numeric/decimal is
> >>>>>>>>>>>>>> slow and also multiple times that floating points are
> inexact.
> >>>> So
> >>>>>>>>>> doing
> >>>>>>>>>>>>>> some math with Postgres (9.6.5):
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double precision 2147483647
> >>>>>>>>>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
> >>>>>>>>>>>>>> SELECT 2147483647::bigint*1.0::real returns double
> >>>>>>>>>>>>>> SELECT 2147483647::double precision*1::bigint returns double 2147483647
> >>>>>>>>>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double 2147483647
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> With + and - we can get the same mixture of returned
> >>>> types.
> >>>>>>>>>> There's
> >>>>>>>>>>>>>> no difference in those calculations, just some casting. To
> me
> >>>>>>>>>>>>>> floating-point math indicates inexactness and has errors and
> >>>>>>> whoever
> >>>>>>>>>> mixes
> >>>>>>>>>>>>>> up two different types should understand that. If one didn't
> >>>> want an
> >>>>>>>>>> exact
> >>>>>>>>>>>>>> numeric type, why would the server return such? The floating
> >>>>> point
> >>>>>>>>>> value
> >>>>>>>>>>>>>> itself could be wrong already before the calculation -
> trying
> >>>> to
> >>>>>>> say
> >>>>>>>>>> we do
> >>>>>>>>>>>>>> it lossless is just wrong.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Fun with 2.65:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
> >>>>>>>>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> SELECT round(2.65) returns numeric 4
> >>>>>>>>>>>>>> SELECT round(2.65::double precision) returns double 4
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> SELECT 2.65 * 1 returns double 2.65
> >>>>>>>>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
> >>>>>>>>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
> >>>>>>>>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> SELECT round(2.65) * 1 returns numeric 3
> >>>>>>>>>>>>>> SELECT round(2.65) * round(1) returns double 3
> >>>>>>>>>>>>>>
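> >>>>>>>>>>>>>> The 2.65::real case is plain IEEE 754 rounding, reproducible in
> >>>>>>>>>>>>>> Java too (a sketch, not C* code):
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>   float f = 2.65f;                 // nearest 32-bit value to 2.65
> >>>>>>>>>>>>>>   System.out.println((double) f);  // prints 2.6500000953674316
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The error is already in the stored float; the widening cast to
> >>>>>>>>>>>>>> double merely makes it visible.
> >>>>>>>>>>>>>>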
> >>>>>>>>>>>>>> So as we're going to have silly values in any case, why
> >> pretend
> >>>>>>>>>> something
> >>>>>>>>>>>>>> else? Also, exact calculations are slow if we crunch large
> >>>> amounts
> >>>>>>> of
> >>>>>>>>>>>>>> numbers. I guess I slightly deviated towards Postgres'
> >>>>> implementation
> >>>>>>>>>> in this
> >>>>>>>>>>>>>> case, but I wish it wasn't used as a benchmark in this case.
> >>>> And
> >>>>>>> most
> >>>>>>>>>>>>>> importantly, I would definitely want the exact same type
> >>>> returned
> >>>>>>>>>> each time
> >>>>>>>>>>>>>> I do a calculation.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> - Micke
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
> >>>>>>>>>> benedict@apache.org>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> As far as I can tell we reached a relatively strong
> consensus
> >>>>>>> that we
> >>>>>>>>>>>>>>> should implement lossless casts by default?  Does anyone
> have
> >>>>>>>>>> anything more
> >>>>>>>>>>>>>>> to add?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Looking at the emails, everyone who participated and
> >>>> expressed a
> >>>>>>>>>>>>>>> preference was in favour of the “Postgres approach” of
> >>>> upcasting
> >>>>>>> to
> >>>>>>>>>> decimal
> >>>>>>>>>>>>>>> for mixed float/int operands?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I’d like to get a clear-cut decision on this, so we know
> what
> >>>>>>> we’re
> >>>>>>>>>> doing
> >>>>>>>>>>>>>>> for 4.0.  Then hopefully we can move on to a collective
> >>>> decision
> >>>>>>> on
> >>>>>>>>>> Ariel’s
> >>>>>>>>>>>>>>> concerns about overflow, which I think are also pressing -
> >>>>>>>>>> particularly for
> >>>>>>>>>>>>>>> tinyint and smallint.  This does also impact implicit casts
> >>>> for
> >>>>>>> mixed
> >>>>>>>>>>>>>>> integer type operations, but an approach for these will
> >>>> probably
> >>>>>>>>>> fall out
> >>>>>>>>>>>>>>> of any decision on overflow.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
> >>>>>>>>>> murukesh.mohanan@gmail.com>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I think you're conflating two things here. There's the
> loss
> >>>>>>>>>> resulting
> >>>>>>>>>>>>>>> from
> >>>>>>>>>>>>>>>> using some operators, and loss involved in casting.
> Dividing
> >>>> an
> >>>>>>>>>> integer
> >>>>>>>>>>>>>>> by
> >>>>>>>>>>>>>>>> another integer to obtain an integer result can result in
> >>>> loss,
> >>>>>>> but
> >>>>>>>>>>>>>>> there's
> >>>>>>>>>>>>>>>> no implicit casting there and no loss due to casting.
> >>>> Casting
> >>>>> an
> >>>>>>>>>> integer
> >>>>>>>>>>>>>>>> to a float can also result in loss. So dividing an integer
> >>>> by a
> >>>>>>>>>> float,
> >>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>> example, with an implicit cast has an additional avenue
> for
> >>>>> loss:
> >>>>>>>>>> the
> >>>>>>>>>>>>>>>> implicit cast for the operands so that they're of the same
> >>>>> type.
> >>>>>>> I
> >>>>>>>>>>>>>>> believe
> >>>>>>>>>>>>>>>> this discussion so far has been about the latter, not the
> >>>> loss
> >>>>>>> from
> >>>>>>>>>> the
> >>>>>>>>>>>>>>>> operations themselves.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
> >>>>>>>>>> benjamin.lerer@datastax.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I would like to try to clarify things a bit to help
> people
> >>>> to
> >>>>>>>>>> understand
> >>>>>>>>>>>>>>>>> the true complexity of the problem.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The *float* and *double* types are inexact numeric types.
> >>>> Not
> >>>>>>> only
> >>>>>>>>>> at
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> operation level.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> If you insert 676543.21 in a *float* column and then read
> >>>> it,
> >>>>>>> you
> >>>>>>>>>> will
> >>>>>>>>>>>>>>>>> realize that the value has been truncated to 676543.2.
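> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The same thing is easy to reproduce in Java (a sketch; the C*
> >>>>>>>>>>>>>>>>> float type is the same 32-bit IEEE 754 format):
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>   float f = 676543.21f;  // nearest representable value: 676543.1875
> >>>>>>>>>>>>>>>>>   System.out.println(f); // prints 676543.2, lost before any arithmetic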
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> If you want accuracy the only way is to avoid those
> inexact
> >>>>>>> types.
> >>>>>>>>>>>>>>>>> Using *decimals*
> >>>>>>>>>>>>>>>>> during operations will mitigate the problem but will not
> >>>>> remove
> >>>>>>>>>> it.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I do not recall PostgreSQL behaving as described. If I am
> am
> >>>> not
> >>>>>>>>>> mistaken
> >>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>> PostgreSQL *SELECT 3/2* will return *1*. Which is similar
> >> to
> >>>>>>> what
> >>>>>>>>>> MS SQL
> >>>>>>>>>>>>>>>>> server and Oracle do. So all those databases will lose
> >>>>>>> precision
> >>>>>>>>>> if you
> >>>>>>>>>>>>>>>>> are not careful.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> If you truly need precision you can have it by using
> exact
> >>>>>>> numeric
> >>>>>>>>>> types
> >>>>>>>>>>>>>>>>> for your data types. Of course it has a cost in
> >> performance,
> >>>>>>>>>> memory and
> >>>>>>>>>>>>>>>>> disk usage.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The advantage of the current approach is that it gives you
> >>>> the
> >>>>>>>>>> choice.
> >>>>>>>>>>>>>>> It is
> >>>>>>>>>>>>>>>>> up to you to decide what you need for your application.
> It
> >>>> is
> >>>>>>> also
> >>>>>>>>>> in
> >>>>>>>>>>>>>>> line
> >>>>>>>>>>>>>>>>> with the way CQL behaves everywhere else.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Muru

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: Implicit Casts for Arithmetic Operators

Posted by Benedict Elliott Smith <be...@apache.org>.
As I say, for me this is explicitly unhelpful, so I have no intention of producing it (though, of course, I cannot prevent you from producing it).

For me, the correct approach is to decide where we should be, and then figure out how to get there.  Where we are has no bearing on where we should be, in my view.



> On 22 Nov 2018, at 11:12, Benjamin Lerer <be...@datastax.com> wrote:
> 
> I would also like to see an analysis of what being ANSI SQL 92 compliant
> means in terms of change of behavior (for arithmetic and *any features we
> have already released*).
> Simply because without it, I find the decision pretty hard to make.
> 
> On Thu, Nov 22, 2018 at 11:51 AM Benedict Elliott Smith <benedict@apache.org>
> wrote:
> 
>> We’re not presently voting*; we’re only discussing, whether we should base
>> our behaviour on a widely agreed upon standard.
>> 
>> I think perhaps the nub of our disagreement is that, in my view, this is
>> the only relevant fact to decide.  There is no data to base this decision
>> upon.  It’s axiomatic, or ideological; procedural, not technical:  Do we
>> think we should try to hew to standards (where appropriate), or do we think
>> we should stick with what we arrived at in an adhoc manner?
>> 
>> If we believe the former, as I now do, then the current state is only
>> relevant when we come to implement the decision.
>> 
>> 
>> * But given how peripheral and inherently ideological this decision is,
>> and how meandering the discussion was with no clear consensus, it seemed to
>> need a vote in the near future.  The prospect of a vote seems to have
>> brought some healthy debate forward too, which is great, but I apologise if
>> this somehow came across as presumptuous.
>> 
>> 
>>> On 22 Nov 2018, at 09:26, Sylvain Lebresne <le...@gmail.com> wrote:
>>> 
>>> I'm not saying "let's not do this no matter what and ever fix technical
>>> debt", nor am I fearing decision.
>>> 
>>> But I *do* think decisions, technical ones at least, should be fact and
>>> data driven. And I'm not even sure why we're talking of having a vote
>> here.
>>> The Apache Way is *not* meant to be primarily vote-driven, votes are
>>> supposed to be a last resort when, after having debated facts and data,
>> no
>>> consensus can be reached. Can we have the debate on facts and data first?
>>> Please.
>>> 
>>> At the of the day, I object to: "There are still a number of unresolved
>>> issues, but to make progress I wonder if it would first be helpful to
>> have
>>> a vote on ensuring we are ANSI SQL 92 compliant for our arithmetic?".
>> More
>>> specifically, I disagree that such vote is a good starting point. Let's
>>> identify and discuss the unresolved issues first. Let's check precisely
>>> what getting our arithmetic ANSI SQL 92 compliant means and how we can
>> get
>>> it. I do support the idea of making such analysis btw, it would be good
>>> data, but no vote is needed whatsoever to make it. Again, I object to
>>> voting first and doing the analysis 2nd.
>>> 
>>> --
>>> Sylvain
>>> 
>>> 
>>> On Thu, Nov 22, 2018 at 1:25 AM Jonathan Haddad <jo...@jonhaddad.com>
>> wrote:
>>> 
>>>> I can’t agree more. We should be able to make changes in a manner that
>>>> improves the DB In the long term, rather than live with the technical
>> debt
>>>> of arbitrary decisions made by a handful of people.
>>>> 
>>>> I also agree that putting a knob in place to let people migrate over is
>> a
>>>> reasonable decision.
>>>> 
>>>> Jon
>>>> 
>>>> On Wed, Nov 21, 2018 at 4:54 PM Benedict Elliott Smith <
>>>> benedict@apache.org>
>>>> wrote:
>>>> 
>>>>> The goal is simply to agree on a set of well-defined principles for how
>>>> we
>>>>> should behave.  If we don’t like the implications that arise, we’ll
>> have
>>>>> another vote?  A democracy cannot bind itself, so I never understood
>> this
>>>>> fear of a decision.
>>>>> 
>>>>> A database also has a thousand toggles.  If we absolutely need to, we
>> can
>>>>> introduce one more.
>>>>> 
>>>>> We should be doing this upfront a great deal more often.  Doing it
>>>>> retrospectively sucks, but in my opinion it's a bad reason to bind
>>>>> ourselves to whatever made it in.
>>>>> 
>>>>> Do we anywhere define the principles of our current behaviour?  I
>>>> couldn’t
>>>>> find it.
>>>>> 
>>>>> 
>>>>>> On 21 Nov 2018, at 21:08, Sylvain Lebresne <le...@gmail.com>
>> wrote:
>>>>>> 
>>>>>> On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith <
>>>>> benedict@apache.org>
>>>>>> wrote:
>>>>>> 
>>>>>>> FWIW, my meaning of arithmetic in this context extends to any
>> features
>>>>> we
>>>>>>> have already released (such as aggregates, and perhaps other built-in
>>>>>>> functions) that operate on the same domain.  We should be consistent,
>>>>> after
>>>>>>> all.
>>>>>>> 
>>>>>>> Whether or not we need to revisit any existing functionality we can
>>>>> figure
>>>>>>> out after the fact, once we have agreed what our behaviour should be.
>>>>>>> 
>>>>>> 
>>>>>> I'm not sure I correctly understand the process suggested, but I don't
>>>>>> particularly like/agree with what I understand. What I understand is a
>>>>>> suggestion for voting on agreeing to be ANSI SQL 92 compliant, with no
>>>>> real
>>>>>> evaluation of what that entails (at least I haven't seen one), and
>> that
>>>>>> this vote, if passed, would imply we'd then make any backward
>>>>> incompatible
>>>>>> change necessary to achieve compliance ("my meaning of arithmetic in
>>>> this
>>>>>> context extends to any features we have already released" and "Whether
>>>> or
>>>>>> not we need to revisit any existing functionality we can figure out
>>>> after
>>>>>> the fact, once we have agreed what our behaviour should be").
>>>>>> 
>>>>>> This might make sense of a new product, but at our stage that seems
>>>>>> backward to me. I think we owe our users to first make the effort of
>>>>>> identifying what "inconsistencies" our existing arithmetic has[1] and
>>>>>> _then_ consider what options we have to fix those, with their pros and
>>>>> cons
>>>>>> (including how bad they break backward compatibility). And if _then_
>>>>>> getting ANSI SQL 92 compliant proves to not be disruptive (or at least
>>>>>> acceptably so), then sure, that's great.
>>>>>> 
>>>>>> [1]: one possibly efficient way to do that could actually be to
>> compare
>>>>> our
>>>>>> arithmetic to ANSI SQL 92. Not that all differences found would imply
>>>>>> inconsistencies/wrongness of our arithmetic, but still, it should be
>>>>>> helpful. And I guess my whole point is that we should that analysis
>>>>> first,
>>>>>> and then maybe decide that being ANSI SQL 92 is a reasonable option,
>>>> not
>>>>>> decide first and live with the consequences no matter what they are.
>>>>>> 
>>>>>> --
>>>>>> Sylvain
>>>>>> 
>>>>>> 
>>>>>>> I will make this more explicit for the vote, but just to clarify the
>>>>>>> intention so that we are all discussing the same thing.
>>>>>>> 
>>>>>>> 
>>>>>>>> On 20 Nov 2018, at 14:18, Ariel Weisberg <ad...@fastmail.fm>
>>>> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> +1
>>>>>>>> 
>>>>>>>> This is a public API so we will be much better off if we get it
>> right
>>>>>>> the first time.
>>>>>>>> 
>>>>>>>> Ariel
>>>>>>>> 
>>>>>>>>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <jo...@jonhaddad.com>
>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Sounds good to me.
>>>>>>>>> 
>>>>>>>>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <
>>>>>>> benedict@apache.org>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> So, this thread somewhat petered out.
>>>>>>>>>> 
>>>>>>>>>> There are still a number of unresolved issues, but to make
>>>> progress I
>>>>>>>>>> wonder if it would first be helpful to have a vote on ensuring we
>>>> are
>>>>>>> ANSI
>>>>>>>>>> SQL 92 compliant for our arithmetic?  This seems like a sensible
>>>>>>> baseline,
>>>>>>>>>> since we will hopefully minimise surprise to operators this way.
>>>>>>>>>> 
>>>>>>>>>> If people largely agree, I will call a vote, and we can pick up a
>>>>>>> couple
>>>>>>>>>> of more focused discussions afterwards on how we interpret the
>>>> leeway
>>>>>>> it
>>>>>>>>>> gives.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws>
>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> From reading the spec. Precision is always implementation
>> defined.
>>>>> The
>>>>>>>>>> spec specifies scale in several cases, but never precision for any
>>>>>>> type or
>>>>>>>>>> operation (addition/subtraction, multiplication, division).
>>>>>>>>>>> 
>>>>>>>>>>> So we don't implement anything remotely approaching precision and
>>>>>>> scale
>>>>>>>>>> in CQL when it comes to numbers I think? So we aren't going to
>>>> follow
>>>>>>> the
>>>>>>>>>> spec for scale. We are already pretty far down that road so I
>> would
>>>>>>> leave
>>>>>>>>>> it alone.
>>>>>>>>>>> 
>>>>>>>>>>> I don't think the spec is asking for the most approximate type.
>>>> It's
>>>>>>>>>> just saying the result is approximate, and the precision is
>>>>>>> implementation
>>>>>>>>>> defined. We could return either float or double. I think if one of
>>>>> the
>>>>>>>>>> operands is a double we should return a double because clearly the
>>>>>>> schema
>>>>>>>>>> thought a double was required to represent that number. I would
>>>> also
>>>>>>> be in
>>>>>>>>>> favor of returning a double all the time so that people can expect
>>>> a
>>>>>>>>>> consistent type from expressions involving approximate numbers.
>>>>>>>>>>> 
>>>>>>>>>>> I am a big fan of widening for arithmetic expressions in a
>>>> database
>>>>> to
>>>>>>>>>> avoid having to error on overflow. You can go to the trouble of
>>>> only
>>>>>>>>>> widening the minimum amount, but I think it's simpler if we always
>>>>>>> widen to
>>>>>>>>>> bigint and double. This would be something the spec allows.
>>>>>>>>>>> 
>>>>>>>>>>> Definitely if we can make overflow not occur we should and the
>>>> spec
>>>>>>>>>> allows that. We should also not return different types for the
>> same
>>>>>>> operand
>>>>>>>>>> types just to work around overflow if we detect we need more
>>>>> precision.
>>>>>>>>>>> 
>>>>>>>>>>> Ariel
>>>>>>>>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
>>>>>>>>>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for
>>>> digging
>>>>>>> this
>>>>>>>>>>>> out (and Mike for getting some empirical examples).
>>>>>>>>>>>> 
>>>>>>>>>>>> We still have to decide on the approximate data type to return;
>>>>> right
>>>>>>>>>>>> now, we have float+bigint=double, but float+int=float.  I think
>>>>> this
>>>>>>> is
>>>>>>>>>>>> fairly inconsistent, and either the approximate type should
>>>> always
>>>>>>> win,
>>>>>>>>>>>> or we should always upgrade to double for mixed operands.
>>>>>>>>>>>> 
>>>>>>>>>>>> The quoted spec also suggests that decimal+float=float, and
>>>> decimal
>>>>>>>>>>>> +double=double, whereas we currently have decimal+float=decimal,
>>>>> and
>>>>>>>>>>>> decimal+double=decimal
>>>>>>>>>>>> 
>>>>>>>>>>>> If we’re going to go with an approximate operand implying an
>>>>>>>>>> approximate
>>>>>>>>>>>> result, I think we should do it consistently (and consistent
>> with
>>>>> the
>>>>>>>>>>>> SQL92 spec), and have the type of the approximate operand always
>>>> be
>>>>>>> the
>>>>>>>>>>>> return type.
>>>>>>>>>>>> 
>>>>>>>>>>>> This would still leave a decision for float+double, though.  The
>>>>> most
>>>>>>>>>>>> consistent behaviour with that stated above would be to always
>>>> take
>>>>>>> the
>>>>>>>>>>>> most approximate type to return (i.e. float), but this would
>> seem
>>>>> to
>>>>>>> me
>>>>>>>>>>>> to be fairly unexpected for the user.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws>
>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I agree with what's been said about expectations regarding
>>>>>>> expressions
>>>>>>>>>> involving floating point numbers. I think that if one of the
>> inputs
>>>>> is
>>>>>>>>>> approximate then the result should be approximate.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> One thing we could look at for inspiration is the SQL spec. Not
>>>> to
>>>>>>>>>> follow dogmatically necessarily.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> From the SQL 92 spec regarding assignment
>>>>>>>>>> 
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.contrib.andrew.cmu.edu_-7Eshadow_sql_sql1992.txt&d=DwIFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=Jad7nE1Oab1mebx31r7AOfSsa0by8th6tCxpykmmOBA&m=vuYFCiEg1Hk9RcozkHxMcCqfg4quy5zdS6jn4LoxIog&s=2dMzYnFvO5Wf7J74IbDE27vxjfOX2xYT4-u7MEXUqHg&e= <https://urldefense.proofpoint.com/v2/url?u=http-3A__www.contrib.andrew.cmu.edu_-7Eshadow_sql_sql1992.txt&d=DwIFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=Jad7nE1Oab1mebx31r7AOfSsa0by8th6tCxpykmmOBA&m=vuYFCiEg1Hk9RcozkHxMcCqfg4quy5zdS6jn4LoxIog&s=2dMzYnFvO5Wf7J74IbDE27vxjfOX2xYT4-u7MEXUqHg&e=>
>> section
>>>>> 4.6:
>>>>>>>>>>>>> "
>>>>>>>>>>>>>   Values of the data types NUMERIC, DECIMAL, INTEGER,
>>>> SMALLINT,
>>>>>>>>>>>>>   FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
>>>>>>>>>> mutually
>>>>>>>>>>>>>   comparable and mutually assignable. If an assignment would
>>>>>>>>>> result
>>>>>>>>>>>>>   in a loss of the most significant digits, an exception
>>>>>>> condition
>>>>>>>>>>>>>   is raised. If least significant digits are lost,
>>>>>>> implementation-
>>>>>>>>>>>>>   defined rounding or truncating occurs with no exception
>>>>>>>>>> condition
>>>>>>>>>>>>>   being raised. The rules for arithmetic are generally
>>>> governed
>>>>>>> by
>>>>>>>>>>>>>   Subclause 6.12, "<numeric value expression>".
>>>>>>>>>>>>> "
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Section 6.12 numeric value expressions:
>>>>>>>>>>>>> "
>>>>>>>>>>>>>   1) If the data type of both operands of a dyadic arithmetic
>>>>>>>>>> opera-
>>>>>>>>>>>>>      tor is exact numeric, then the data type of the result is
>>>>>>>>>> exact
>>>>>>>>>>>>>      numeric, with precision and scale determined as follows:
>>>>>>>>>>>>> ...
>>>>>>>>>>>>>   2) If the data type of either operand of a dyadic arithmetic
>>>>>>> op-
>>>>>>>>>>>>>      erator is approximate numeric, then the data type of the
>>>>> re-
>>>>>>>>>>>>>      sult is approximate numeric. The precision of the result
>>>> is
>>>>>>>>>>>>>      implementation-defined.
>>>>>>>>>>>>> "
>>>>>>>>>>>>> 
>>>>>>>>>>>>> And this makes sense to me. I think we should only return an
>>>> exact
>>>>>>>>>> result if both of the inputs are exact.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I think we might want to look closely at the SQL spec and
>>>>> especially
>>>>>>>>>> when the spec requires an error to be generated. Those are
>>>> sometimes
>>>>>>> in the
>>>>>>>>>> spec to prevent subtle paths to wrong answers. Any time we deviate
>>>>>>> from the
>>>>>>>>>> spec we should be asking why is it in the spec and why are we
>>>>>>> deviating.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Another issue besides overflow handling is how we determine
>>>>>>> precision
>>>>>>>>>> and scale for expressions involving two exact types.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Ariel
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I'm not sure if I would prefer the Postgres way of doing
>>>> things,
>>>>>>>>>> which is
>>>>>>>>>>>>>> returning just about any type depending on the order of
>>>>> operators.
>>>>>>>>>>>>>> Considering it actually mentions in the docs that using
>>>>>>>>>> numeric/decimal is
>>>>>>>>>>>>>> slow and also multiple times that floating points are inexact.
>>>> So
>>>>>>>>>> doing
>>>>>>>>>>>>>> some math with Postgres (9.6.5):
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT 2147483647 <tel:2147483647>::bigint*1.0::double precision returns double
>>>>>>>>>>>>>> precision 2147483647 <tel:2147483647>
>>>>>>>>>>>>>> SELECT 2147483647 <tel:2147483647>::bigint*1.0 returns numeric 2147483647.0 <tel:2147483647.0>
>>>>>>>>>>>>>> SELECT 2147483647 <tel:2147483647>::bigint*1.0::real returns double
>>>>>>>>>>>>>> SELECT 2147483647 <tel:2147483647>::double precision*1::bigint returns double
>>>>>>>>>> 2147483647 <tel:2147483647>
>>>>>>>>>>>>>> SELECT 2147483647 <tel:2147483647>::double precision*1.0::bigint returns double
>>>>>>>>>> 2147483647 <tel:2147483647>
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> With + - we can get the same amount of mixture of returned
>>>> types.
>>>>>>>>>> There's
>>>>>>>>>>>>>> no difference in those calculations, just some casting. To me
>>>>>>>>>>>>>> floating-point math indicates inexactness and has errors and
>>>>>>> whoever
>>>>>>>>>> mixes
>>>>>>>>>>>>>> up two different types should understand that. If one didn't
>>>> want
>>>>>>>>>> exact
>>>>>>>>>>>>>> numeric type, why would the server return such? The floating
>>>>> point
>>>>>>>>>> value
>>>>>>>>>>>>>> itself could be wrong already before the calculation - trying
>>>> to
>>>>>>> say
>>>>>>>>>> we do
>>>>>>>>>>>>>> it lossless is just wrong.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Fun with 2.65:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
>>>>>>>>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT round(2.65) returns numeric 4
>>>>>>>>>>>>>> SELECT round(2.65::double precision) returns double 4
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT 2.65 * 1 returns double 2.65
>>>>>>>>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
>>>>>>>>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
>>>>>>>>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT round(2.65) * 1 returns numeric 3
>>>>>>>>>>>>>> SELECT round(2.65) * round(1) returns double 3
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> So as we're going to have silly values in any case, why
>> pretend
>>>>>>>>>> something
>>>>>>>>>>>>>> else? Also, exact calculations are slow if we crunch large
>>>> amount
>>>>>>> of
>>>>>>>>>>>>>> numbers. I guess I slightly deviated towards Postgres'
>>>>> implemention
>>>>>>>>>> in this
>>>>>>>>>>>>>> case, but I wish it wasn't used as a benchmark in this case.
>>>> And
>>>>>>> most
>>>>>>>>>>>>>> importantly, I would definitely want the exact same type
>>>> returned
>>>>>>>>>> each time
>>>>>>>>>>>>>> I do a calculation.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> - Micke
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
>>>>>>>>>> benedict@apache.org <ma...@apache.org>>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> As far as I can tell we reached a relatively strong consensus
>>>>>>> that we
>>>>>>>>>>>>>>> should implement lossless casts by default?  Does anyone have
>>>>>>>>>> anything more
>>>>>>>>>>>>>>> to add?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Looking at the emails, everyone who participated and
>>>> expressed a
>>>>>>>>>>>>>>> preference was in favour of the “Postgres approach” of
>>>> upcasting
>>>>>>> to
>>>>>>>>>> decimal
>>>>>>>>>>>>>>> for mixed float/int operands?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I’d like to get a clear-cut decision on this, so we know what
>>>>>>> we’re
>>>>>>>>>> doing
>>>>>>>>>>>>>>> for 4.0.  Then hopefully we can move on to a collective
>>>> decision
>>>>>>> on
>>>>>>>>>> Ariel’s
>>>>>>>>>>>>>>> concerns about overflow, which I think are also pressing -
>>>>>>>>>> particularly for
>>>>>>>>>>>>>>> tinyint and smallint.  This does also impact implicit casts
>>>> for
>>>>>>> mixed
>>>>>>>>>>>>>>> integer type operations, but an approach for these will
>>>> probably
>>>>>>>>>> fall out
>>>>>>>>>>>>>>> of any decision on overflow.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
>>>>>>>>>> murukesh.mohanan@gmail.com <ma...@gmail.com>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I think you're conflating two things here. There's the loss
>>>>>>>>>> resulting
>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>> using some operators, and loss involved in casting. Dividing
>>>> an
>>>>>>>>>> integer
>>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>> another integer to obtain an integer result can result in
>>>> loss,
>>>>>>> but
>>>>>>>>>>>>>>> there's
>>>>>>>>>>>>>>>> no implicit casting there and no loss due to casting.
>>>> Casting
>>>>> an
>>>>>>>>>> integer
>>>>>>>>>>>>>>>> to a float can also result in loss. So dividing an integer
>>>> by a
>>>>>>>>>> float,
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>> example, with an implicit cast has an additional avenue for
>>>>> loss:
>>>>>>>>>> the
>>>>>>>>>>>>>>>> implicit cast for the operands so that they're of the same
>>>>> type.
>>>>>>> I
>>>>>>>>>>>>>>> believe
>>>>>>>>>>>>>>>> this discussion so far has been about the latter, not the
>>>> loss
>>>>>>> from
>>>>>>>>>> the
>>>>>>>>>>>>>>>> operations themselves.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
>>>>>>>>>> benjamin.lerer@datastax.com <ma...@datastax.com>>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I would like to try to clarify things a bit to help people
>>>> to
>>>>>>>>>> understand
>>>>>>>>>>>>>>>>> the true complexity of the problem.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> The *float *and *double *types are inexact numeric types.
>>>> Not
>>>>>>> only
>>>>>>>>>> at
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> operation level.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> If you insert 676543.21 in a *float* column and then read
>>>> it,
>>>>>>> you
>>>>>>>>>> will
>>>>>>>>>>>>>>>>> realize that the value has been truncated to 676543.2.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> If you want accuracy the only way is to avoid those inexact
>>>>>>> types.
>>>>>>>>>>>>>>>>> Using *decimals
>>>>>>>>>>>>>>>>> *during operations will mitigate the problem but will not
>>>>> remove
>>>>>>>>>> it.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I do not recall PostgreSQL behaving has described. If I am
>>>> not
>>>>>>>>>> mistaken
>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>> PostgreSQL *SELECT 3/2* will return *1*. Which is similar
>> to
>>>>>>> what
>>>>>>>>>> MS SQL
>>>>>>>>>>>>>>>>> server and Oracle do. So all thoses databases will lose
>>>>>>> precision
>>>>>>>>>> if you
>>>>>>>>>>>>>>>>> are not carefull.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> If you truly need precision you can have it by using exact
>>>>>>> numeric
>>>>>>>>>> types
>>>>>>>>>>>>>>>>> for your data types. Of course it has a cost on
>> performance,
>>>>>>>>>> memory and
>>>>>>>>>>>>>>>>> disk usage.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> The advantage of the current approach is that it give you
>>>> the
>>>>>>>>>> choice.
>>>>>>>>>>>>>>> It is
>>>>>>>>>>>>>>>>> up to you to decide what you need for your application. It
>>>> is
>>>>>>> also
>>>>>>>>>> in
>>>>>>>>>>>>>>> line
>>>>>>>>>>>>>>>>> with the way CQL behave everywhere else.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Muru
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org <ma...@cassandra.apache.org>
>>>>>>>>>>>>>>> For additional commands, e-mail:
>>>> dev-help@cassandra.apache.org <ma...@cassandra.apache.org>
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org <ma...@cassandra.apache.org>
>>>>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org <ma...@cassandra.apache.org>
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org <ma...@cassandra.apache.org>
>>>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org <ma...@cassandra.apache.org>
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org <ma...@cassandra.apache.org>
>>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org <ma...@cassandra.apache.org>
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org <ma...@cassandra.apache.org>
>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org <ma...@cassandra.apache.org>
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>> Jon Haddad
>>>>>>>>> 
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.rustyrazorblade.com&d=DwIFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=Jad7nE1Oab1mebx31r7AOfSsa0by8th6tCxpykmmOBA&m=vuYFCiEg1Hk9RcozkHxMcCqfg4quy5zdS6jn4LoxIog&s=nIwl4l-6xszzYOOWiSHkxLYvgGVVdlf_izS5h1pfOck&e= <https://urldefense.proofpoint.com/v2/url?u=http-3A__www.rustyrazorblade.com&d=DwIFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=Jad7nE1Oab1mebx31r7AOfSsa0by8th6tCxpykmmOBA&m=vuYFCiEg1Hk9RcozkHxMcCqfg4quy5zdS6jn4LoxIog&s=nIwl4l-6xszzYOOWiSHkxLYvgGVVdlf_izS5h1pfOck&e=>
>>>>>>>>> twitter: rustyrazorblade
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org <ma...@cassandra.apache.org>
>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org <ma...@cassandra.apache.org>
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org <ma...@cassandra.apache.org>
>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org <ma...@cassandra.apache.org>
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> --
>>>> Jon Haddad
>>>> 
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.rustyrazorblade.com&d=DwIFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=Jad7nE1Oab1mebx31r7AOfSsa0by8th6tCxpykmmOBA&m=vuYFCiEg1Hk9RcozkHxMcCqfg4quy5zdS6jn4LoxIog&s=nIwl4l-6xszzYOOWiSHkxLYvgGVVdlf_izS5h1pfOck&e= <https://urldefense.proofpoint.com/v2/url?u=http-3A__www.rustyrazorblade.com&d=DwIFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=Jad7nE1Oab1mebx31r7AOfSsa0by8th6tCxpykmmOBA&m=vuYFCiEg1Hk9RcozkHxMcCqfg4quy5zdS6jn4LoxIog&s=nIwl4l-6xszzYOOWiSHkxLYvgGVVdlf_izS5h1pfOck&e=>
>>>> twitter: rustyrazorblade
>>>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org <ma...@cassandra.apache.org>
>> For additional commands, e-mail: dev-help@cassandra.apache.org <ma...@cassandra.apache.org>

Re: Implicit Casts for Arithmetic Operators

Posted by Benjamin Lerer <be...@datastax.com>.
I would also like to see an analysis of what being ANSI SQL 92 compliant
means in terms of change of behavior (for arithmetic and *any features we
have already released*).
Simply because without it, I find the decision pretty hard to make.

On Thu, Nov 22, 2018 at 11:51 AM Benedict Elliott Smith <be...@apache.org>
wrote:

> We’re not presently voting*; we’re only discussing whether we should base
> our behaviour on a widely agreed upon standard.
>
> I think perhaps the nub of our disagreement is that, in my view, this is
> the only relevant fact to decide.  There is no data to base this decision
> upon.  It’s axiomatic, or ideological; procedural, not technical:  Do we
> think we should try to hew to standards (where appropriate), or do we think
> we should stick with what we arrived at in an ad hoc manner?
>
> If we believe the former, as I now do, then the current state is only
> relevant when we come to implement the decision.
>
>
> * But given how peripheral and inherently ideological this decision is,
> and how meandering the discussion was with no clear consensus, it seemed to
> need a vote in the near future.  The prospect of a vote seems to have
> brought some healthy debate forward too, which is great, but I apologise if
> this somehow came across as presumptuous.
>
>
> > On 22 Nov 2018, at 09:26, Sylvain Lebresne <le...@gmail.com> wrote:
> >
> > I'm not saying "let's not do this no matter what and ever fix technical
> > debt", nor am I fearing decision.
> >
> > But I *do* think decisions, technical ones at least, should be fact and
> > data driven. And I'm not even sure why we're talking of having a vote
> here.
> > The Apache Way is *not* meant to be primarily vote-driven, votes are
> > supposed to be a last resort when, after having debated facts and data,
> no
> > consensus can be reached. Can we have the debate on facts and data first?
> > Please.
> >
> > At the end of the day, I object to: "There are still a number of unresolved
> > issues, but to make progress I wonder if it would first be helpful to
> have
> > a vote on ensuring we are ANSI SQL 92 compliant for our arithmetic?".
> More
> > specifically, I disagree that such a vote is a good starting point. Let's
> > identify and discuss the unresolved issues first. Let's check precisely
> > what getting our arithmetic ANSI SQL 92 compliant means and how we can
> get
> > it. I do support the idea of making such an analysis btw, it would be good
> > data, but no vote is needed whatsoever to make it. Again, I object to
> > voting first and doing the analysis 2nd.
> >
> > --
> > Sylvain
> >
> >
> > On Thu, Nov 22, 2018 at 1:25 AM Jonathan Haddad <jo...@jonhaddad.com>
> wrote:
> >
> >> I can’t agree more. We should be able to make changes in a manner that
> >> improves the DB In the long term, rather than live with the technical
> debt
> >> of arbitrary decisions made by a handful of people.
> >>
> >> I also agree that putting a knob in place to let people migrate over is
> a
> >> reasonable decision.
> >>
> >> Jon
> >>
> >> On Wed, Nov 21, 2018 at 4:54 PM Benedict Elliott Smith <
> >> benedict@apache.org>
> >> wrote:
> >>
> >>> The goal is simply to agree on a set of well-defined principles for how
> >> we
> >>> should behave.  If we don’t like the implications that arise, we’ll
> have
> >>> another vote?  A democracy cannot bind itself, so I never understood
> this
> >>> fear of a decision.
> >>>
> >>> A database also has a thousand toggles.  If we absolutely need to, we
> can
> >>> introduce one more.
> >>>
> >>> We should be doing this upfront a great deal more often.  Doing it
> >>> retrospectively sucks, but in my opinion it's a bad reason to bind
> >>> ourselves to whatever made it in.
> >>>
> >>> Do we anywhere define the principles of our current behaviour?  I
> >> couldn’t
> >>> find it.
> >>>
> >>>
> >>>> On 21 Nov 2018, at 21:08, Sylvain Lebresne <le...@gmail.com>
> wrote:
> >>>>
> >>>> On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith <
> >>> benedict@apache.org>
> >>>> wrote:
> >>>>
> >>>>> FWIW, my meaning of arithmetic in this context extends to any
> features
> >>> we
> >>>>> have already released (such as aggregates, and perhaps other built-in
> >>>>> functions) that operate on the same domain.  We should be consistent,
> >>> after
> >>>>> all.
> >>>>>
> >>>>> Whether or not we need to revisit any existing functionality we can
> >>> figure
> >>>>> out after the fact, once we have agreed what our behaviour should be.
> >>>>>
> >>>>
> >>>> I'm not sure I correctly understand the process suggested, but I don't
> >>>> particularly like/agree with what I understand. What I understand is a
> >>>> suggestion for voting on agreeing to be ANSI SQL 92 compliant, with no
> >>> real
> >>>> evaluation of what that entails (at least I haven't seen one), and
> that
> >>>> this vote, if passed, would imply we'd then make any backward
> >>> incompatible
> >>>> change necessary to achieve compliance ("my meaning of arithmetic in
> >> this
> >>>> context extends to any features we have already released" and "Whether
> >> or
> >>>> not we need to revisit any existing functionality we can figure out
> >> after
> >>>> the fact, once we have agreed what our behaviour should be").
> >>>>
> >>>> This might make sense of a new product, but at our stage that seems
> >>>> backward to me. I think we owe our users to first make the effort of
> >>>> identifying what "inconsistencies" our existing arithmetic has[1] and
> >>>> _then_ consider what options we have to fix those, with their pros and
> >>> cons
> >>>> (including how bad they break backward compatibility). And if _then_
> >>>> getting ANSI SQL 92 compliant proves to not be disruptive (or at least
> >>>> acceptably so), then sure, that's great.
> >>>>
> >>>> [1]: one possibly efficient way to do that could actually be to
> compare
> >>> our
> >>>> arithmetic to ANSI SQL 92. Not that all differences found would imply
> >>>> inconsistencies/wrongness of our arithmetic, but still, it should be
> >>>> helpful. And I guess my whole point is that we should that analysis
> >>> first,
> >>>> and then maybe decide that being ANSI SQL 92 is a reasonable option,
> >> not
> >>>> decide first and live with the consequences no matter what they are.
> >>>>
> >>>> --
> >>>> Sylvain
> >>>>
> >>>>
> >>>>> I will make this more explicit for the vote, but just to clarify the
> >>>>> intention so that we are all discussing the same thing.
> >>>>>
> >>>>>
> >>>>>> On 20 Nov 2018, at 14:18, Ariel Weisberg <ad...@fastmail.fm>
> >> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> +1
> >>>>>>
> >>>>>> This is a public API so we will be much better off if we get it
> right
> >>>>> the first time.
> >>>>>>
> >>>>>> Ariel
> >>>>>>
> >>>>>>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <jo...@jonhaddad.com>
> >>>>> wrote:
> >>>>>>>
> >>>>>>> Sounds good to me.
> >>>>>>>
> >>>>>>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <
> >>>>> benedict@apache.org>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> So, this thread somewhat petered out.
> >>>>>>>>
> >>>>>>>> There are still a number of unresolved issues, but to make
> >> progress I
> >>>>>>>> wonder if it would first be helpful to have a vote on ensuring we
> >> are
> >>>>> ANSI
> >>>>>>>> SQL 92 compliant for our arithmetic?  This seems like a sensible
> >>>>> baseline,
> >>>>>>>> since we will hopefully minimise surprise to operators this way.
> >>>>>>>>
> >>>>>>>> If people largely agree, I will call a vote, and we can pick up a
> >>>>> couple
> >>>>>>>> of more focused discussions afterwards on how we interpret the
> >> leeway
> >>>>> it
> >>>>>>>> gives.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws>
> >> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> From reading the spec. Precision is always implementation
> defined.
> >>> The
> >>>>>>>> spec specifies scale in several cases, but never precision for any
> >>>>> type or
> >>>>>>>> operation (addition/subtraction, multiplication, division).
> >>>>>>>>>
> >>>>>>>>> So we don't implement anything remotely approaching precision and
> >>>>> scale
> >>>>>>>> in CQL when it comes to numbers I think? So we aren't going to
> >> follow
> >>>>> the
> >>>>>>>> spec for scale. We are already pretty far down that road so I
> would
> >>>>> leave
> >>>>>>>> it alone.
> >>>>>>>>>
> >>>>>>>>> I don't think the spec is asking for the most approximate type.
> >> It's
> >>>>>>>> just saying the result is approximate, and the precision is
> >>>>> implementation
> >>>>>>>> defined. We could return either float or double. I think if one of
> >>> the
> >>>>>>>> operands is a double we should return a double because clearly the
> >>>>> schema
> >>>>>>>> thought a double was required to represent that number. I would
> >> also
> >>>>> be in
> >>>>>>>> favor of returning a double all the time so that people can expect
> >> a
> >>>>>>>> consistent type from expressions involving approximate numbers.
> >>>>>>>>>
> >>>>>>>>> I am a big fan of widening for arithmetic expressions in a
> >> database
> >>> to
> >>>>>>>> avoid having to error on overflow. You can go to the trouble of
> >> only
> >>>>>>>> widening the minimum amount, but I think it's simpler if we always
> >>>>> widen to
> >>>>>>>> bigint and double. This would be something the spec allows.
> >>>>>>>>>
> >>>>>>>>> Definitely if we can make overflow not occur we should and the
> >> spec
> >>>>>>>> allows that. We should also not return different types for the
> same
> >>>>> operand
> >>>>>>>> types just to work around overflow if we detect we need more
> >>> precision.
> >>>>>>>>>
> >>>>>>>>> Ariel
> >>>>>>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
> >>>>>>>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for
> >> digging
> >>>>> this
> >>>>>>>>>> out (and Mike for getting some empirical examples).
> >>>>>>>>>>
> >>>>>>>>>> We still have to decide on the approximate data type to return;
> >>> right
> >>>>>>>>>> now, we have float+bigint=double, but float+int=float.  I think
> >>> this
> >>>>> is
> >>>>>>>>>> fairly inconsistent, and either the approximate type should
> >> always
> >>>>> win,
> >>>>>>>>>> or we should always upgrade to double for mixed operands.
> >>>>>>>>>>
> >>>>>>>>>> The quoted spec also suggests that decimal+float=float, and
> >> decimal
> >>>>>>>>>> +double=double, whereas we currently have decimal+float=decimal,
> >>> and
> >>>>>>>>>> decimal+double=decimal
> >>>>>>>>>>
> >>>>>>>>>> If we’re going to go with an approximate operand implying an
> >>>>>>>> approximate
> >>>>>>>>>> result, I think we should do it consistently (and consistent
> with
> >>> the
> >>>>>>>>>> SQL92 spec), and have the type of the approximate operand always
> >> be
> >>>>> the
> >>>>>>>>>> return type.
> >>>>>>>>>>
> >>>>>>>>>> This would still leave a decision for float+double, though.  The
> >>> most
> >>>>>>>>>> consistent behaviour with that stated above would be to always
> >> take
> >>>>> the
> >>>>>>>>>> most approximate type to return (i.e. float), but this would
> seem
> >>> to
> >>>>> me
> >>>>>>>>>> to be fairly unexpected for the user.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws>
> >>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> I agree with what's been said about expectations regarding
> >>>>> expressions
> >>>>>>>> involving floating point numbers. I think that if one of the
> inputs
> >>> is
> >>>>>>>> approximate then the result should be approximate.
> >>>>>>>>>>>
> >>>>>>>>>>> One thing we could look at for inspiration is the SQL spec. Not
> >> to
> >>>>>>>> follow dogmatically necessarily.
> >>>>>>>>>>>
> >>>>>>>>>>> From the SQL 92 spec regarding assignment
> >>>>>>>>
> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
> section
> >>> 4.6:
> >>>>>>>>>>> "
> >>>>>>>>>>>    Values of the data types NUMERIC, DECIMAL, INTEGER,
> >> SMALLINT,
> >>>>>>>>>>>    FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
> >>>>>>>> mutually
> >>>>>>>>>>>    comparable and mutually assignable. If an assignment would
> >>>>>>>> result
> >>>>>>>>>>>    in a loss of the most significant digits, an exception
> >>>>> condition
> >>>>>>>>>>>    is raised. If least significant digits are lost,
> >>>>> implementation-
> >>>>>>>>>>>    defined rounding or truncating occurs with no exception
> >>>>>>>> condition
> >>>>>>>>>>>    being raised. The rules for arithmetic are generally
> >> governed
> >>>>> by
> >>>>>>>>>>>    Subclause 6.12, "<numeric value expression>".
> >>>>>>>>>>> "
> >>>>>>>>>>>
> >>>>>>>>>>> Section 6.12 numeric value expressions:
> >>>>>>>>>>> "
> >>>>>>>>>>>    1) If the data type of both operands of a dyadic arithmetic
> >>>>>>>> opera-
> >>>>>>>>>>>       tor is exact numeric, then the data type of the result is
> >>>>>>>> exact
> >>>>>>>>>>>       numeric, with precision and scale determined as follows:
> >>>>>>>>>>> ...
> >>>>>>>>>>>    2) If the data type of either operand of a dyadic arithmetic
> >>>>> op-
> >>>>>>>>>>>       erator is approximate numeric, then the data type of the
> >>> re-
> >>>>>>>>>>>       sult is approximate numeric. The precision of the result
> >> is
> >>>>>>>>>>>       implementation-defined.
> >>>>>>>>>>> "
> >>>>>>>>>>>
> >>>>>>>>>>> And this makes sense to me. I think we should only return an
> >> exact
> >>>>>>>> result if both of the inputs are exact.
> >>>>>>>>>>>
> >>>>>>>>>>> I think we might want to look closely at the SQL spec and
> >>> especially
> >>>>>>>> when the spec requires an error to be generated. Those are
> >> sometimes
> >>>>> in the
> >>>>>>>> spec to prevent subtle paths to wrong answers. Any time we deviate
> >>>>> from the
> >>>>>>>> spec we should be asking why is it in the spec and why are we
> >>>>> deviating.
> >>>>>>>>>>>
> >>>>>>>>>>> Another issue besides overflow handling is how we determine
> >>>>> precision
> >>>>>>>> and scale for expressions involving two exact types.
> >>>>>>>>>>>
> >>>>>>>>>>> Ariel
> >>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm not sure if I would prefer the Postgres way of doing
> >> things,
> >>>>>>>> which is
> >>>>>>>>>>>> returning just about any type depending on the order of
> >>> operators.
> >>>>>>>>>>>> Considering it actually mentions in the docs that using
> >>>>>>>> numeric/decimal is
> >>>>>>>>>>>> slow and also multiple times that floating points are inexact.
> >> So
> >>>>>>>> doing
> >>>>>>>>>>>> some math with Postgres (9.6.5):
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double
> >>>>>>>>>>>> precision 2147483647
> >>>>>>>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
> >>>>>>>>>>>> SELECT 2147483647::bigint*1.0::real returns double
> >>>>>>>>>>>> SELECT 2147483647::double precision*1::bigint returns double
> >>>>>>>> 2147483647
> >>>>>>>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double
> >>>>>>>> 2147483647
> >>>>>>>>>>>>
> >>>>>>>>>>>> With + - we can get the same amount of mixture of returned
> >> types.
> >>>>>>>> There's
> >>>>>>>>>>>> no difference in those calculations, just some casting. To me
> >>>>>>>>>>>> floating-point math indicates inexactness and has errors and
> >>>>> whoever
> >>>>>>>> mixes
> >>>>>>>>>>>> up two different types should understand that. If one didn't
> >> want
> >>>>>>>> exact
> >>>>>>>>>>>> numeric type, why would the server return such? The floating
> >>> point
> >>>>>>>> value
> >>>>>>>>>>>> itself could be wrong already before the calculation - trying
> >> to
> >>>>> say
> >>>>>>>> we do
> >>>>>>>>>>>> it lossless is just wrong.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Fun with 2.65:
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
> >>>>>>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT round(2.65) returns numeric 4
> >>>>>>>>>>>> SELECT round(2.65::double precision) returns double 4
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT 2.65 * 1 returns double 2.65
> >>>>>>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
> >>>>>>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
> >>>>>>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT round(2.65) * 1 returns numeric 3
> >>>>>>>>>>>> SELECT round(2.65) * round(1) returns double 3
> >>>>>>>>>>>>
> >>>>>>>>>>>> So as we're going to have silly values in any case, why
> pretend
> >>>>>>>> something
> >>>>>>>>>>>> else? Also, exact calculations are slow if we crunch large
> >> amount
> >>>>> of
> >>>>>>>>>>>> numbers. I guess I slightly deviated towards Postgres'
> >>> implemention
> >>>>>>>> in this
> >>>>>>>>>>>> case, but I wish it wasn't used as a benchmark in this case.
> >> And
> >>>>> most
> >>>>>>>>>>>> importantly, I would definitely want the exact same type
> >> returned
> >>>>>>>> each time
> >>>>>>>>>>>> I do a calculation.
> >>>>>>>>>>>>
> >>>>>>>>>>>> - Micke
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
> >>>>>>>> benedict@apache.org>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> As far as I can tell we reached a relatively strong consensus
> >>>>> that we
> >>>>>>>>>>>>> should implement lossless casts by default?  Does anyone have
> >>>>>>>> anything more
> >>>>>>>>>>>>> to add?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Looking at the emails, everyone who participated and
> >> expressed a
> >>>>>>>>>>>>> preference was in favour of the “Postgres approach” of
> >> upcasting
> >>>>> to
> >>>>>>>> decimal
> >>>>>>>>>>>>> for mixed float/int operands?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I’d like to get a clear-cut decision on this, so we know what
> >>>>> we’re
> >>>>>>>> doing
> >>>>>>>>>>>>> for 4.0.  Then hopefully we can move on to a collective
> >> decision
> >>>>> on
> >>>>>>>> Ariel’s
> >>>>>>>>>>>>> concerns about overflow, which I think are also pressing -
> >>>>>>>> particularly for
> >>>>>>>>>>>>> tinyint and smallint.  This does also impact implicit casts
> >> for
> >>>>> mixed
> >>>>>>>>>>>>> integer type operations, but an approach for these will
> >> probably
> >>>>>>>> fall out
> >>>>>>>>>>>>> of any decision on overflow.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
> >>>>>>>> murukesh.mohanan@gmail.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think you're conflating two things here. There's the loss
> >>>>>>>> resulting
> >>>>>>>>>>>>> from
> >>>>>>>>>>>>>> using some operators, and loss involved in casting. Dividing
> >> an
> >>>>>>>> integer
> >>>>>>>>>>>>> by
> >>>>>>>>>>>>>> another integer to obtain an integer result can result in
> >> loss,
> >>>>> but
> >>>>>>>>>>>>> there's
> >>>>>>>>>>>>>> no implicit casting there and no loss due to casting.
> >> Casting
> >>> an
> >>>>>>>> integer
> >>>>>>>>>>>>>> to a float can also result in loss. So dividing an integer
> >> by a
> >>>>>>>> float,
> >>>>>>>>>>>>> for
> >>>>>>>>>>>>>> example, with an implicit cast has an additional avenue for
> >>> loss:
> >>>>>>>> the
> >>>>>>>>>>>>>> implicit cast for the operands so that they're of the same
> >>> type.
> >>>>> I
> >>>>>>>>>>>>> believe
> >>>>>>>>>>>>>> this discussion so far has been about the latter, not the
> >> loss
> >>>>> from
> >>>>>>>> the
> >>>>>>>>>>>>>> operations themselves.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
> >>>>>>>> benjamin.lerer@datastax.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I would like to try to clarify things a bit to help people
> >> to
> >>>>>>>> understand
> >>>>>>>>>>>>>>> the true complexity of the problem.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The *float *and *double *types are inexact numeric types.
> >> Not
> >>>>> only
> >>>>>>>> at
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>> operation level.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If you insert 676543.21 in a *float* column and then read
> >> it,
> >>>>> you
> >>>>>>>> will
> >>>>>>>>>>>>>>> realize that the value has been truncated to 676543.2.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If you want accuracy the only way is to avoid those inexact
> >>>>> types.
> >>>>>>>>>>>>>>> Using *decimals
> >>>>>>>>>>>>>>> *during operations will mitigate the problem but will not
> >>> remove
> >>>>>>>> it.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I do not recall PostgreSQL behaving has described. If I am
> >> not
> >>>>>>>> mistaken
> >>>>>>>>>>>>> in
> >>>>>>>>>>>>>>> PostgreSQL *SELECT 3/2* will return *1*. Which is similar
> to
> >>>>> what
> >>>>>>>> MS SQL
> >>>>>>>>>>>>>>> server and Oracle do. So all thoses databases will lose
> >>>>> precision
> >>>>>>>> if you
> >>>>>>>>>>>>>>> are not carefull.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If you truly need precision you can have it by using exact
> >>>>> numeric
> >>>>>>>> types
> >>>>>>>>>>>>>>> for your data types. Of course it has a cost on
> performance,
> >>>>>>>> memory and
> >>>>>>>>>>>>>>> disk usage.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The advantage of the current approach is that it give you
> >> the
> >>>>>>>> choice.
> >>>>>>>>>>>>> It is
> >>>>>>>>>>>>>>> up to you to decide what you need for your application. It
> >> is
> >>>>> also
> >>>>>>>> in
> >>>>>>>>>>>>> line
> >>>>>>>>>>>>>>> with the way CQL behave everywhere else.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Muru
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>>>>>>>> For additional commands, e-mail:
> >> dev-help@cassandra.apache.org
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>> ---------------------------------------------------------------------
> >>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>> ---------------------------------------------------------------------
> >>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >> ---------------------------------------------------------------------
> >>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>>>
> >>>>>>>> --
> >>>>>>> Jon Haddad
> >>>>>>>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.rustyrazorblade.com&d=DwIFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=Jad7nE1Oab1mebx31r7AOfSsa0by8th6tCxpykmmOBA&m=vuYFCiEg1Hk9RcozkHxMcCqfg4quy5zdS6jn4LoxIog&s=nIwl4l-6xszzYOOWiSHkxLYvgGVVdlf_izS5h1pfOck&e=
> >>>>>>> twitter: rustyrazorblade
> >>>>>>
> >>>>>>
> >>>>>>
> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>
> >>>>>
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>
> >>>>>
> >>>
> >>> --
> >> Jon Haddad
> >>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.rustyrazorblade.com&d=DwIFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=Jad7nE1Oab1mebx31r7AOfSsa0by8th6tCxpykmmOBA&m=vuYFCiEg1Hk9RcozkHxMcCqfg4quy5zdS6jn4LoxIog&s=nIwl4l-6xszzYOOWiSHkxLYvgGVVdlf_izS5h1pfOck&e=
> >> twitter: rustyrazorblade
> >>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: Implicit Casts for Arithmetic Operators

Posted by Benedict Elliott Smith <be...@apache.org>.
We’re not presently voting*; we’re only discussing whether we should base our behaviour on a widely agreed upon standard.

I think perhaps the nub of our disagreement is that, in my view, this is the only relevant fact to decide.  There is no data to base this decision upon.  It’s axiomatic, or ideological; procedural, not technical:  Do we think we should try to hew to standards (where appropriate), or do we think we should stick with what we arrived at in an ad hoc manner?

If we believe the former, as I now do, then the current state is only relevant when we come to implement the decision.


* But given how peripheral and inherently ideological this decision is, and how meandering the discussion was with no clear consensus, it seemed to need a vote in the near future.  The prospect of a vote seems to have brought some healthy debate forward too, which is great, but I apologise if this somehow came across as presumptuous.


> On 22 Nov 2018, at 09:26, Sylvain Lebresne <le...@gmail.com> wrote:
> 
> I'm not saying "let's not do this no matter what and ever fix technical
> debt", nor am I fearing decision.
> 
> But I *do* think decisions, technical ones at least, should be fact and
> data driven. And I'm not even sure why we're talking of having a vote here.
> The Apache Way is *not* meant to be primarily vote-driven, votes are
> supposed to be a last resort when, after having debated facts and data, no
> consensus can be reached. Can we have the debate on facts and data first?
> Please.
> 
> At the end of the day, I object to: "There are still a number of unresolved
> issues, but to make progress I wonder if it would first be helpful to have
> a vote on ensuring we are ANSI SQL 92 compliant for our arithmetic?". More
> specifically, I disagree that such vote is a good starting point. Let's
> identify and discuss the unresolved issues first. Let's check precisely
> what getting our arithmetic ANSI SQL 92 compliant means and how we can get
> it. I do support the idea of making such analysis btw, it would be good
> data, but no vote is needed whatsoever to make it. Again, I object to
> voting first and doing the analysis second.
> 
> --
> Sylvain
> 
> 
> On Thu, Nov 22, 2018 at 1:25 AM Jonathan Haddad <jo...@jonhaddad.com> wrote:
> 
>> I can’t agree more. We should be able to make changes in a manner that
>> improves the DB in the long term, rather than live with the technical debt
>> of arbitrary decisions made by a handful of people.
>> 
>> I also agree that putting a knob in place to let people migrate over is a
>> reasonable decision.
>> 
>> Jon
>> 
>> On Wed, Nov 21, 2018 at 4:54 PM Benedict Elliott Smith <
>> benedict@apache.org>
>> wrote:
>> 
>>> The goal is simply to agree on a set of well-defined principles for how
>> we
>>> should behave.  If we don’t like the implications that arise, we’ll have
>>> another vote?  A democracy cannot bind itself, so I never understood this
>>> fear of a decision.
>>> 
>>> A database also has a thousand toggles.  If we absolutely need to, we can
>>> introduce one more.
>>> 
>>> We should be doing this upfront a great deal more often.  Doing it
>>> retrospectively sucks, but in my opinion it's a bad reason to bind
>>> ourselves to whatever made it in.
>>> 
>>> Do we anywhere define the principles of our current behaviour?  I
>> couldn’t
>>> find it.
>>> 
>>> 
>>>> On 21 Nov 2018, at 21:08, Sylvain Lebresne <le...@gmail.com> wrote:
>>>> 
>>>> On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith <
>>> benedict@apache.org>
>>>> wrote:
>>>> 
>>>>> FWIW, my meaning of arithmetic in this context extends to any features
>>> we
>>>>> have already released (such as aggregates, and perhaps other built-in
>>>>> functions) that operate on the same domain.  We should be consistent,
>>> after
>>>>> all.
>>>>> 
>>>>> Whether or not we need to revisit any existing functionality we can
>>> figure
>>>>> out after the fact, once we have agreed what our behaviour should be.
>>>>> 
>>>> 
>>>> I'm not sure I correctly understand the process suggested, but I don't
>>>> particularly like/agree with what I understand. What I understand is a
>>>> suggestion for voting on agreeing to be ANSI SQL 92 compliant, with no
>>> real
>>>> evaluation of what that entails (at least I haven't seen one), and that
>>>> this vote, if passed, would imply we'd then make any backward
>>> incompatible
>>>> change necessary to achieve compliance ("my meaning of arithmetic in
>> this
>>>> context extends to any features we have already released" and "Whether
>> or
>>>> not we need to revisit any existing functionality we can figure out
>> after
>>>> the fact, once we have agreed what our behaviour should be").
>>>> 
>>>> This might make sense for a new product, but at our stage that seems
>>>> backward to me. I think we owe our users to first make the effort of
>>>> identifying what "inconsistencies" our existing arithmetic has[1] and
>>>> _then_ consider what options we have to fix those, with their pros and
>>> cons
>>>> (including how bad they break backward compatibility). And if _then_
>>>> getting ANSI SQL 92 compliant proves to not be disruptive (or at least
>>>> acceptably so), then sure, that's great.
>>>> 
>>>> [1]: one possibly efficient way to do that could actually be to compare
>>> our
>>>> arithmetic to ANSI SQL 92. Not that all differences found would imply
>>>> inconsistencies/wrongness of our arithmetic, but still, it should be
>>>> helpful. And I guess my whole point is that we should do that analysis
>>> first,
>>>> and then maybe decide that being ANSI SQL 92 is a reasonable option,
>> not
>>>> decide first and live with the consequences no matter what they are.
>>>> 
>>>> --
>>>> Sylvain
>>>> 
>>>> 
>>>>> I will make this more explicit for the vote, but just to clarify the
>>>>> intention so that we are all discussing the same thing.
>>>>> 
>>>>> 
>>>>>> On 20 Nov 2018, at 14:18, Ariel Weisberg <ad...@fastmail.fm>
>> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> +1
>>>>>> 
>>>>>> This is a public API so we will be much better off if we get it right
>>>>> the first time.
>>>>>> 
>>>>>> Ariel
>>>>>> 
>>>>>>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <jo...@jonhaddad.com>
>>>>> wrote:
>>>>>>> 
>>>>>>> Sounds good to me.
>>>>>>> 
>>>>>>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <
>>>>> benedict@apache.org>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> So, this thread somewhat petered out.
>>>>>>>> 
>>>>>>>> There are still a number of unresolved issues, but to make
>> progress I
>>>>>>>> wonder if it would first be helpful to have a vote on ensuring we
>> are
>>>>> ANSI
>>>>>>>> SQL 92 compliant for our arithmetic?  This seems like a sensible
>>>>> baseline,
>>>>>>>> since we will hopefully minimise surprise to operators this way.
>>>>>>>> 
>>>>>>>> If people largely agree, I will call a vote, and we can pick up a
>>>>> couple
>>>>>>>> of more focused discussions afterwards on how we interpret the
>> leeway
>>>>> it
>>>>>>>> gives.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws>
>> wrote:
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> From reading the spec, precision is always implementation-defined.
>>>>>>>>> The spec specifies scale in several cases, but never precision for
>>>>>>>>> any type or operation (addition/subtraction, multiplication,
>>>>>>>>> division).
>>>>>>>>> 
>>>>>>>>> So we don't implement anything remotely approaching precision and
>>>>> scale
>>>>>>>> in CQL when it comes to numbers I think? So we aren't going to
>> follow
>>>>> the
>>>>>>>> spec for scale. We are already pretty far down that road so I would
>>>>> leave
>>>>>>>> it alone.
>>>>>>>>> 
>>>>>>>>> I don't think the spec is asking for the most approximate type.
>> It's
>>>>>>>> just saying the result is approximate, and the precision is
>>>>> implementation
>>>>>>>> defined. We could return either float or double. I think if one of
>>> the
>>>>>>>> operands is a double we should return a double because clearly the
>>>>> schema
>>>>>>>> thought a double was required to represent that number. I would
>> also
>>>>> be in
>>>>>>>> favor of returning a double all the time so that people can expect
>> a
>>>>>>>> consistent type from expressions involving approximate numbers.
>>>>>>>>> 
>>>>>>>>> I am a big fan of widening for arithmetic expressions in a
>> database
>>> to
>>>>>>>> avoid having to error on overflow. You can go to the trouble of
>> only
>>>>>>>> widening the minimum amount, but I think it's simpler if we always
>>>>> widen to
>>>>>>>> bigint and double. This would be something the spec allows.
>>>>>>>>> 
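
For a concrete sense of what is at stake here, a minimal sketch in Postgres
syntax (Postgres itself does not widen, so the narrow addition errors; the
explicit casts show what always-widening semantics would return instead):

SELECT 32767::smallint + 1::smallint;
-- ERROR: smallint out of range (no widening; the result type stays smallint)
SELECT 32767::smallint::bigint + 1::smallint::bigint;
-- 32768: the value an implicit widening to bigint would produce
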
>>>>>>>>> Definitely if we can make overflow not occur we should and the
>> spec
>>>>>>>> allows that. We should also not return different types for the same
>>>>> operand
>>>>>>>> types just to work around overflow if we detect we need more
>>> precision.
>>>>>>>>> 
>>>>>>>>> Ariel
>>>>>>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
>>>>>>>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for
>> digging
>>>>> this
>>>>>>>>>> out (and Mike for getting some empirical examples).
>>>>>>>>>> 
>>>>>>>>>> We still have to decide on the approximate data type to return;
>>> right
>>>>>>>>>> now, we have float+bigint=double, but float+int=float.  I think
>>> this
>>>>> is
>>>>>>>>>> fairly inconsistent, and either the approximate type should
>> always
>>>>> win,
>>>>>>>>>> or we should always upgrade to double for mixed operands.
>>>>>>>>>> 
>>>>>>>>>> The quoted spec also suggests that decimal+float=float, and
>> decimal
>>>>>>>>>> +double=double, whereas we currently have decimal+float=decimal,
>>> and
>>>>>>>>>> decimal+double=decimal.
>>>>>>>>>> 
>>>>>>>>>> If we’re going to go with an approximate operand implying an
>>>>>>>> approximate
>>>>>>>>>> result, I think we should do it consistently (and consistent with
>>> the
>>>>>>>>>> SQL92 spec), and have the type of the approximate operand always
>> be
>>>>> the
>>>>>>>>>> return type.
>>>>>>>>>> 
>>>>>>>>>> This would still leave a decision for float+double, though.  The
>>> most
>>>>>>>>>> consistent behaviour with that stated above would be to always
>> take
>>>>> the
>>>>>>>>>> most approximate type to return (i.e. float), but this would seem
>>> to
>>>>> me
>>>>>>>>>> to be fairly unexpected for the user.
>>>>>>>>>> 
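
For reference, Postgres resolves this particular case the other way, widening
the float operand to double rather than narrowing; the extra digits below are
just the float value 2.65 viewed at double precision (formatted as Postgres
12+ prints it):

SELECT 2.65::real + 0::float8;  -- 2.6500000953674316, of type double precision
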
>>>>>>>>>> 
>>>>>>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws>
>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> I agree with what's been said about expectations regarding
>>>>> expressions
>>>>>>>> involving floating point numbers. I think that if one of the inputs
>>> is
>>>>>>>> approximate then the result should be approximate.
>>>>>>>>>>> 
>>>>>>>>>>> One thing we could look at for inspiration is the SQL spec. Not
>> to
>>>>>>>> follow dogmatically necessarily.
>>>>>>>>>>> 
>>>>>>>>>>> From the SQL 92 spec regarding assignment
>>>>>>>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section
>>> 4.6:
>>>>>>>>>>> "
>>>>>>>>>>>    Values of the data types NUMERIC, DECIMAL, INTEGER,
>> SMALLINT,
>>>>>>>>>>>    FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
>>>>>>>> mutually
>>>>>>>>>>>    comparable and mutually assignable. If an assignment would
>>>>>>>> result
>>>>>>>>>>>    in a loss of the most significant digits, an exception
>>>>> condition
>>>>>>>>>>>    is raised. If least significant digits are lost,
>>>>> implementation-
>>>>>>>>>>>    defined rounding or truncating occurs with no exception
>>>>>>>> condition
>>>>>>>>>>>    being raised. The rules for arithmetic are generally
>> governed
>>>>> by
>>>>>>>>>>>    Subclause 6.12, "<numeric value expression>".
>>>>>>>>>>> "
>>>>>>>>>>> 
>>>>>>>>>>> Section 6.12 numeric value expressions:
>>>>>>>>>>> "
>>>>>>>>>>>    1) If the data type of both operands of a dyadic arithmetic
>>>>>>>> opera-
>>>>>>>>>>>       tor is exact numeric, then the data type of the result is
>>>>>>>> exact
>>>>>>>>>>>       numeric, with precision and scale determined as follows:
>>>>>>>>>>> ...
>>>>>>>>>>>    2) If the data type of either operand of a dyadic arithmetic
>>>>> op-
>>>>>>>>>>>       erator is approximate numeric, then the data type of the
>>> re-
>>>>>>>>>>>       sult is approximate numeric. The precision of the result
>> is
>>>>>>>>>>>       implementation-defined.
>>>>>>>>>>> "
>>>>>>>>>>> 
>>>>>>>>>>> And this makes sense to me. I think we should only return an
>> exact
>>>>>>>> result if both of the inputs are exact.
>>>>>>>>>>> 
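
To make rules 1) and 2) concrete, a small sketch in Postgres syntax
(pg_typeof simply reports the type the expression resolves to; SQL 92 leaves
the precision of the approximate result implementation-defined):

SELECT pg_typeof(1 + 2);            -- integer: exact + exact stays exact
SELECT pg_typeof(1 + 2.5);          -- numeric: exact + exact stays exact
SELECT pg_typeof(1 + 2.5::float8);  -- double precision: one approximate
                                    -- operand makes the result approximate
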
>>>>>>>>>>> I think we might want to look closely at the SQL spec, and
>>>>>>>>>>> especially at when the spec requires an error to be generated.
>>>>>>>>>>> Those are sometimes in the spec to prevent subtle paths to wrong
>>>>>>>>>>> answers. Any time we deviate from the spec we should be asking why
>>>>>>>>>>> it is in the spec and why we are deviating.
>>>>>>>>>>> 
>>>>>>>>>>> Another issue besides overflow handling is how we determine
>>>>> precision
>>>>>>>> and scale for expressions involving two exact types.
>>>>>>>>>>> 
>>>>>>>>>>> Ariel
>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>> I'm not sure if I would prefer the Postgres way of doing things,
>>>>>>>>>>>> which is returning just about any type depending on the order of
>>>>>>>>>>>> the operands, especially considering the docs mention that using
>>>>>>>>>>>> numeric/decimal is slow, and multiple times that floating points
>>>>>>>>>>>> are inexact. So doing some math with Postgres (9.6.5):
>>>>>>>>>>>>
>>>>>>>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double
>>>>>>>>>>>> precision 2147483647
>>>>>>>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
>>>>>>>>>>>> SELECT 2147483647::bigint*1.0::real returns double
>>>>>>>>>>>> SELECT 2147483647::double precision*1::bigint returns double
>>>>>>>> 2147483647
>>>>>>>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double
>>>>>>>> 2147483647
>>>>>>>>>>>> 
>>>>>>>>>>>> With + and - we can get the same mixture of returned types.
>>>>>>>>>>>> There's no difference in those calculations, just some casting.
>>>>>>>>>>>> To me, floating-point math indicates inexactness and has errors,
>>>>>>>>>>>> and whoever mixes up two different types should understand that.
>>>>>>>>>>>> If one didn't use an exact numeric type, why would the server
>>>>>>>>>>>> return one? The floating point value itself could be wrong already
>>>>>>>>>>>> before the calculation - claiming we do it losslessly is just
>>>>>>>>>>>> wrong.
>>>>>>>>>>>> 
>>>>>>>>>>>> Fun with 2.65:
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
>>>>>>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT round(2.65) returns numeric 3
>>>>>>>>>>>> SELECT round(2.65::double precision) returns double 3
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT 2.65 * 1 returns double 2.65
>>>>>>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
>>>>>>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
>>>>>>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT round(2.65) * 1 returns numeric 3
>>>>>>>>>>>> SELECT round(2.65) * round(1) returns double 3
>>>>>>>>>>>> 
>>>>>>>>>>>> So as we're going to have silly values in any case, why pretend
>>>>>>>>>>>> otherwise? Also, exact calculations are slow if we crunch large
>>>>>>>>>>>> amounts of numbers. I guess I slightly deviated towards Postgres'
>>>>>>>>>>>> implementation here, but I wish it wasn't used as a benchmark in
>>>>>>>>>>>> this case. And most importantly, I would definitely want the exact
>>>>>>>>>>>> same type returned each time I do a calculation.
>>>>>>>>>>>> 
>>>>>>>>>>>> - Micke
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
>>>>>>>> benedict@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> As far as I can tell we reached a relatively strong consensus
>>>>> that we
>>>>>>>>>>>>> should implement lossless casts by default?  Does anyone have
>>>>>>>> anything more
>>>>>>>>>>>>> to add?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Looking at the emails, everyone who participated and
>> expressed a
>>>>>>>>>>>>> preference was in favour of the “Postgres approach” of
>> upcasting
>>>>> to
>>>>>>>> decimal
>>>>>>>>>>>>> for mixed float/int operands?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I’d like to get a clear-cut decision on this, so we know what
>>>>> we’re
>>>>>>>> doing
>>>>>>>>>>>>> for 4.0.  Then hopefully we can move on to a collective
>> decision
>>>>> on
>>>>>>>> Ariel’s
>>>>>>>>>>>>> concerns about overflow, which I think are also pressing -
>>>>>>>> particularly for
>>>>>>>>>>>>> tinyint and smallint.  This does also impact implicit casts
>> for
>>>>> mixed
>>>>>>>>>>>>> integer type operations, but an approach for these will
>> probably
>>>>>>>> fall out
>>>>>>>>>>>>> of any decision on overflow.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
>>>>>>>> murukesh.mohanan@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I think you're conflating two things here. There's the loss
>>>>>>>> resulting
>>>>>>>>>>>>> from
>>>>>>>>>>>>>> using some operators, and loss involved in casting. Dividing
>> an
>>>>>>>> integer
>>>>>>>>>>>>> by
>>>>>>>>>>>>>> another integer to obtain an integer result can result in
>> loss,
>>>>> but
>>>>>>>>>>>>> there's
>>>>>>>>>>>>>> no implicit casting there and no loss due to casting.
>> Casting
>>> an
>>>>>>>> integer
>>>>>>>>>>>>>> to a float can also result in loss. So dividing an integer
>> by a
>>>>>>>> float,
>>>>>>>>>>>>> for
>>>>>>>>>>>>>> example, with an implicit cast has an additional avenue for
>>> loss:
>>>>>>>> the
>>>>>>>>>>>>>> implicit cast for the operands so that they're of the same
>>> type.
>>>>> I
>>>>>>>>>>>>> believe
>>>>>>>>>>>>>> this discussion so far has been about the latter, not the
>> loss
>>>>> from
>>>>>>>> the
>>>>>>>>>>>>>> operations themselves.
>>>>>>>>>>>>>> 
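
The two kinds of loss are easy to separate with a couple of Postgres
one-liners (plain integer division and IEEE-754 single precision, nothing
Cassandra-specific):

SELECT 7 / 2;                         -- 3: loss from integer division, no cast involved
SELECT 16777217::int4::float4::int4;  -- 16777216: loss purely from the int -> float cast
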
>>>>>>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
>>>>>>>> benjamin.lerer@datastax.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I would like to try to clarify things a bit to help people
>> to
>>>>>>>> understand
>>>>>>>>>>>>>>> the true complexity of the problem.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The *float *and *double *types are inexact numeric types.
>> Not
>>>>> only
>>>>>>>> at
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> operation level.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> If you insert 676543.21 in a *float* column and then read
>> it,
>>>>> you
>>>>>>>> will
>>>>>>>>>>>>>>> realize that the value has been truncated to 676543.2.
>>>>>>>>>>>>>>> 
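
That truncation is easy to reproduce in Postgres, whose real is the same
32-bit IEEE-754 type as CQL's float (output shown as Postgres 12+ prints it,
with shortest round-trip formatting):

SELECT 676543.21::real;              -- 676543.2
SELECT 676543.21::double precision;  -- 676543.21
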
>>>>>>>>>>>>>>> If you want accuracy the only way is to avoid those inexact
>>>>> types.
>>>>>>>>>>>>>>> Using *decimals
>>>>>>>>>>>>>>> *during operations will mitigate the problem but will not
>>> remove
>>>>>>>> it.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I do not recall PostgreSQL behaving as described. If I am not
>>>>>>>>>>>>>>> mistaken, in PostgreSQL *SELECT 3/2* will return *1*, which is
>>>>>>>>>>>>>>> similar to what MS SQL server and Oracle do. So all those
>>>>>>>>>>>>>>> databases will lose precision if you are not careful.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> If you truly need precision you can have it by using exact
>>>>> numeric
>>>>>>>> types
>>>>>>>>>>>>>>> for your data types. Of course it has a cost in performance,
>>>>>>>>>>>>>>> memory and disk usage.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The advantage of the current approach is that it gives you the
>>>>>>>>>>>>>>> choice. It is up to you to decide what you need for your
>>>>>>>>>>>>>>> application. It is also in line with the way CQL behaves
>>>>>>>>>>>>>>> everywhere else.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Muru
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>>>> For additional commands, e-mail:
>> dev-help@cassandra.apache.org
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>> 
>>>>>>>> --
>>>>>>> Jon Haddad
>>>>>>> http://www.rustyrazorblade.com
>>>>>>> twitter: rustyrazorblade
>>>>>> 
>>>>>> 
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>> 
>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>> 
>>>>> 
>>> 
>>> --
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> twitter: rustyrazorblade
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Implicit Casts for Arithmetic Operators

Posted by Sylvain Lebresne <le...@gmail.com>.
I'm not saying "let's not do this no matter what, nor ever fix technical
debt", nor am I fearing a decision.

But I *do* think decisions, technical ones at least, should be fact and
data driven. And I'm not even sure why we're talking of having a vote here.
The Apache Way is *not* meant to be primarily vote-driven, votes are
supposed to be a last resort when, after having debated facts and data, no
consensus can be reached. Can we have the debate on facts and data first?
Please.

At the end of the day, I object to: "There are still a number of unresolved
issues, but to make progress I wonder if it would first be helpful to have
a vote on ensuring we are ANSI SQL 92 compliant for our arithmetic?". More
specifically, I disagree that such vote is a good starting point. Let's
identify and discuss the unresolved issues first. Let's check precisely
what getting our arithmetic ANSI SQL 92 compliant means and how we can get
it. I do support the idea of making such analysis btw, it would be good
data, but no vote is needed whatsoever to make it. Again, I object to
voting first and doing the analysis second.

--
Sylvain



Re: Implicit Casts for Arithmetic Operators

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
I can’t agree more. We should be able to make changes in a manner that
improves the DB in the long term, rather than live with the technical debt
of arbitrary decisions made by a handful of people.

I also agree that putting a knob in place to let people migrate over is a
reasonable decision.

Jon

On Wed, Nov 21, 2018 at 4:54 PM Benedict Elliott Smith <be...@apache.org>
wrote:

> The goal is simply to agree on a set of well-defined principles for how we
> should behave.  If we don’t like the implications that arise, we’ll have
> another vote?  A democracy cannot bind itself, so I never understood this
> fear of a decision.
>
> A database also has a thousand toggles.  If we absolutely need to, we can
> introduce one more.
>
> We should be doing this upfront a great deal more often.  Doing it
> retrospectively sucks, but in my opinion it's a bad reason to bind
> ourselves to whatever made it in.
>
> Do we anywhere define the principles of our current behaviour?  I couldn’t
> find it.
>
>
> > On 21 Nov 2018, at 21:08, Sylvain Lebresne <le...@gmail.com> wrote:
> >
> > On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith <
> benedict@apache.org>
> > wrote:
> >
> >> FWIW, my meaning of arithmetic in this context extends to any features
> we
> >> have already released (such as aggregates, and perhaps other built-in
> >> functions) that operate on the same domain.  We should be consistent,
> after
> >> all.
> >>
> >> Whether or not we need to revisit any existing functionality we can
> figure
> >> out after the fact, once we have agreed what our behaviour should be.
> >>
> >
> > I'm not sure I correctly understand the process suggested, but I don't
> > particularly like/agree with what I understand. What I understand is a
> > suggestion for voting on agreeing to be ANSI SQL 92 compliant, with no
> real
> > evaluation of what that entails (at least I haven't seen one), and that
> > this vote, if passed, would imply we'd then make any backward
> incompatible
> > change necessary to achieve compliance ("my meaning of arithmetic in this
> > context extends to any features we have already released" and "Whether or
> > not we need to revisit any existing functionality we can figure out after
> > the fact, once we have agreed what our behaviour should be").
> >
> > This might make sense of a new product, but at our stage that seems
> > backward to me. I think we owe our users to first make the effort of
> > identifying what "inconsistencies" our existing arithmetic has[1] and
> > _then_ consider what options we have to fix those, with their pros and
> cons
> > (including how bad they break backward compatibility). And if _then_
> > getting ANSI SQL 92 compliant proves to not be disruptive (or at least
> > acceptably so), then sure, that's great.
> >
> > [1]: one possibly efficient way to do that could actually be to compare
> our
> > arithmetic to ANSI SQL 92. Not that all differences found would imply
> > inconsistencies/wrongness of our arithmetic, but still, it should be
> > helpful. And I guess my whole point is that we should that analysis
> first,
> > and then maybe decide that being ANSI SQL 92 is a reasonable option, not
> > decide first and live with the consequences no matter what they are.
> >
> > --
> > Sylvain
> >
> >
> >> I will make this more explicit for the vote, but just to clarify the
> >> intention so that we are all discussing the same thing.
> >>
> >>
> >>> On 20 Nov 2018, at 14:18, Ariel Weisberg <ad...@fastmail.fm> wrote:
> >>>
> >>> Hi,
> >>>
> >>> +1
> >>>
> >>> This is a public API so we will be much better off if we get it right
> >> the first time.
> >>>
> >>> Ariel
> >>>
> >>>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <jo...@jonhaddad.com>
> >> wrote:
> >>>>
> >>>> Sounds good to me.
> >>>>
> >>>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <
> >> benedict@apache.org>
> >>>> wrote:
> >>>>
> >>>>> So, this thread somewhat petered out.
> >>>>>
> >>>>> There are still a number of unresolved issues, but to make progress I
> >>>>> wonder if it would first be helpful to have a vote on ensuring we are
> >> ANSI
> >>>>> SQL 92 compliant for our arithmetic?  This seems like a sensible
> >> baseline,
> >>>>> since we will hopefully minimise surprise to operators this way.
> >>>>>
> >>>>> If people largely agree, I will call a vote, and we can pick up a
> >> couple
> >>>>> of more focused discussions afterwards on how we interpret the leeway
> >> it
> >>>>> gives.
> >>>>>
> >>>>>
> >>>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> From reading the spec. Precision is always implementation defined.
> The
> >>>>> spec specifies scale in several cases, but never precision for any
> >> type or
> >>>>> operation (addition/subtraction, multiplication, division).
> >>>>>>
> >>>>>> So we don't implement anything remotely approaching precision and
> >> scale
> >>>>> in CQL when it comes to numbers I think? So we aren't going to follow
> >> the
> >>>>> spec for scale. We are already pretty far down that road so I would
> >> leave
> >>>>> it alone.
> >>>>>>
> >>>>>> I don't think the spec is asking for the most approximate type. It's
> >>>>> just saying the result is approximate, and the precision is
> >> implementation
> >>>>> defined. We could return either float or double. I think if one of
> the
> >>>>> operands is a double we should return a double because clearly the
> >> schema
> >>>>> thought a double was required to represent that number. I would also
> >> be in
> >>>>> favor of returning a double all the time so that people can expect a
> >>>>> consistent type from expressions involving approximate numbers.
> >>>>>>
> >>>>>> I am a big fan of widening for arithmetic expressions in a database
> to
> >>>>> avoid having to error on overflow. You can go to the trouble of only
> >>>>> widening the minimum amount, but I think it's simpler if we always
> >> widen to
> >>>>> bigint and double. This would be something the spec allows.
> >>>>>>
> >>>>>> Definitely if we can make overflow not occur we should and the spec
> >>>>> allows that. We should also not return different types for the same
> >> operand
> >>>>> types just to work around overflow if we detect we need more
> precision.
> >>>>>>
> >>>>>> Ariel
> >>>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
> >>>>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging
> >> this
> >>>>>>> out (and Mike for getting some empirical examples).
> >>>>>>>
> >>>>>>> We still have to decide on the approximate data type to return;
> right
> >>>>>>> now, we have float+bigint=double, but float+int=float.  I think
> this
> >> is
> >>>>>>> fairly inconsistent, and either the approximate type should always
> >> win,
> >>>>>>> or we should always upgrade to double for mixed operands.
> >>>>>>>
> >>>>>>> The quoted spec also suggests that decimal+float=float, and decimal
> >>>>>>> +double=double, whereas we currently have decimal+float=decimal,
> and
> >>>>>>> decimal+double=decimal
> >>>>>>>
> >>>>>>> If we’re going to go with an approximate operand implying an
> >>>>> approximate
> >>>>>>> result, I think we should do it consistently (and consistent with
> the
> >>>>>>> SQL92 spec), and have the type of the approximate operand always be
> >> the
> >>>>>>> return type.
> >>>>>>>
> >>>>>>> This would still leave a decision for float+double, though.  The
> most
> >>>>>>> consistent behaviour with that stated above would be to always take
> >> the
> >>>>>>> most approximate type to return (i.e. float), but this would seem
> to
> >> me
> >>>>>>> to be fairly unexpected for the user.
> >>>>>>>
> >>>>>>>
> >>>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws>
> wrote:
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I agree with what's been said about expectations regarding
> >> expressions
> >>>>> involving floating point numbers. I think that if one of the inputs
> is
> >>>>> approximate then the result should be approximate.
> >>>>>>>>
> >>>>>>>> One thing we could look at for inspiration is the SQL spec. Not to
> >>>>> follow dogmatically necessarily.
> >>>>>>>>
> >>>>>>>> From the SQL 92 spec regarding assignment
> >>>>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section
> 4.6:
> >>>>>>>> "
> >>>>>>>>     Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
> >>>>>>>>     FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
> >>>>> mutually
> >>>>>>>>     comparable and mutually assignable. If an assignment would
> >>>>> result
> >>>>>>>>     in a loss of the most significant digits, an exception
> >> condition
> >>>>>>>>     is raised. If least significant digits are lost,
> >> implementation-
> >>>>>>>>     defined rounding or truncating occurs with no exception
> >>>>> condition
> >>>>>>>>     being raised. The rules for arithmetic are generally governed
> >> by
> >>>>>>>>     Subclause 6.12, "<numeric value expression>".
> >>>>>>>> "
> >>>>>>>>
> >>>>>>>> Section 6.12 numeric value expressions:
> >>>>>>>> "
> >>>>>>>>     1) If the data type of both operands of a dyadic arithmetic
> >>>>> opera-
> >>>>>>>>        tor is exact numeric, then the data type of the result is
> >>>>> exact
> >>>>>>>>        numeric, with precision and scale determined as follows:
> >>>>>>>> ...
> >>>>>>>>     2) If the data type of either operand of a dyadic arithmetic
> >> op-
> >>>>>>>>        erator is approximate numeric, then the data type of the
> re-
> >>>>>>>>        sult is approximate numeric. The precision of the result is
> >>>>>>>>        implementation-defined.
> >>>>>>>> "
> >>>>>>>>
> >>>>>>>> And this makes sense to me. I think we should only return an exact
> >>>>> result if both of the inputs are exact.
> >>>>>>>>
> >>>>>>>> I think we might want to look closely at the SQL spec and
> especially
> >>>>> when the spec requires an error to be generated. Those are sometimes
> >> in the
> >>>>> spec to prevent subtle paths to wrong answers. Any time we deviate
> >> from the
> >>>>> spec we should be asking why is it in the spec and why are we
> >> deviating.
> >>>>>>>>
> >>>>>>>> Another issue besides overflow handling is how we determine
> >> precision
> >>>>> and scale for expressions involving two exact types.
> >>>>>>>>
> >>>>>>>> Ariel
> >>>>>>>>
> >>>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I'm not sure if I would prefer the Postgres way of doing things,
> >>>>> which is
> >>>>>>>>> returning just about any type depending on the order of
> operators.
> >>>>>>>>> Considering it actually mentions in the docs that using
> >>>>> numeric/decimal is
> >>>>>>>>> slow and also multiple times that floating points are inexact. So
> >>>>> doing
> >>>>>>>>> some math with Postgres (9.6.5):
> >>>>>>>>>
> >>>>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double
> >>>>>>>>> precision 2147483647
> >>>>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
> >>>>>>>>> SELECT 2147483647::bigint*1.0::real returns double
> >>>>>>>>> SELECT 2147483647::double precision*1::bigint returns double
> >>>>> 2147483647
> >>>>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double
> >>>>> 2147483647
> >>>>>>>>>
> >>>>>>>>> With + - we can get the same amount of mixture of returned types.
> >>>>> There's
> >>>>>>>>> no difference in those calculations, just some casting. To me
> >>>>>>>>> floating-point math indicates inexactness and has errors and
> >> whoever
> >>>>> mixes
> >>>>>>>>> up two different types should understand that. If one didn't want
> >>>>> exact
> >>>>>>>>> numeric type, why would the server return such? The floating
> point
> >>>>> value
> >>>>>>>>> itself could be wrong already before the calculation - trying to
> >> say
> >>>>> we do
> >>>>>>>>> it lossless is just wrong.
> >>>>>>>>>
> >>>>>>>>> Fun with 2.65:
> >>>>>>>>>
> >>>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
> >>>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
> >>>>>>>>>
> >>>>>>>>> SELECT round(2.65) returns numeric 4
> >>>>>>>>> SELECT round(2.65::double precision) returns double 4
> >>>>>>>>>
> >>>>>>>>> SELECT 2.65 * 1 returns double 2.65
> >>>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
> >>>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
> >>>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
> >>>>>>>>>
> >>>>>>>>> SELECT round(2.65) * 1 returns numeric 3
> >>>>>>>>> SELECT round(2.65) * round(1) returns double 3
> >>>>>>>>>
> >>>>>>>>> So as we're going to have silly values in any case, why pretend
> >>>>> something
> >>>>>>>>> else? Also, exact calculations are slow if we crunch large amount
> >> of
> >>>>>>>>> numbers. I guess I slightly deviated towards Postgres'
> implemention
> >>>>> in this
> >>>>>>>>> case, but I wish it wasn't used as a benchmark in this case. And
> >> most
> >>>>>>>>> importantly, I would definitely want the exact same type returned
> >>>>> each time
> >>>>>>>>> I do a calculation.
> >>>>>>>>>
> >>>>>>>>> - Micke
> >>>>>>>>>
> >>>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
> >>>>> benedict@apache.org>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> As far as I can tell we reached a relatively strong consensus
> >> that we
> >>>>>>>>>> should implement lossless casts by default?  Does anyone have
> >>>>> anything more
> >>>>>>>>>> to add?
> >>>>>>>>>>
> >>>>>>>>>> Looking at the emails, everyone who participated and expressed a
> >>>>>>>>>> preference was in favour of the “Postgres approach” of upcasting
> >> to
> >>>>> decimal
> >>>>>>>>>> for mixed float/int operands?
> >>>>>>>>>>
> >>>>>>>>>> I’d like to get a clear-cut decision on this, so we know what
> >> we’re
> >>>>> doing
> >>>>>>>>>> for 4.0.  Then hopefully we can move on to a collective decision
> >> on
> >>>>> Ariel’s
> >>>>>>>>>> concerns about overflow, which I think are also pressing -
> >>>>> particularly for
> >>>>>>>>>> tinyint and smallint.  This does also impact implicit casts for
> >> mixed
> >>>>>>>>>> integer type operations, but an approach for these will probably
> >>>>> fall out
> >>>>>>>>>> of any decision on overflow.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
> >>>>> murukesh.mohanan@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> I think you're conflating two things here. There's the loss
> >>>>>>>>>>> resulting from using some operators, and loss involved in casting.
> >>>>>>>>>>> Dividing an integer by another integer to obtain an integer result
> >>>>>>>>>>> can result in loss, but there's no implicit casting there and no
> >>>>>>>>>>> loss due to casting. Casting an integer to a float can also result
> >>>>>>>>>>> in loss. So dividing an integer by a float, for example, with an
> >>>>>>>>>>> implicit cast has an additional avenue for loss: the implicit cast
> >>>>>>>>>>> for the operands so that they're of the same type. I believe this
> >>>>>>>>>>> discussion so far has been about the latter, not the loss from the
> >>>>>>>>>>> operations themselves.
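
The two kinds of loss being separated here can be shown side by side in a few lines of Java (illustrative only):

    public class TwoKindsOfLoss {
        public static void main(String[] args) {
            // Operator loss: integer division truncates; no cast is involved.
            System.out.println(3 / 2);            // 1
            // Cast loss: int -> float cannot represent every 32-bit integer.
            int i = 16777217;                     // 2^24 + 1, the first int a float cannot hold
            System.out.println((float) i);        // 1.6777216E7 - rounded to the nearest float
            System.out.println((int) (float) i);  // 16777216 - the cast dropped the low bit
            // Both at once: dividing an int by a float implicitly casts the int first.
            System.out.println(i / 1.0f);         // 1.6777216E7, not ...217
        }
    }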
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
> >>>>> benjamin.lerer@datastax.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I would like to try to clarify things a bit to help people to
> >>>>> understand
> >>>>>>>>>>>> the true complexity of the problem.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The *float* and *double* types are inexact numeric types. Not only
> >>>>>>>>>>>> at the operation level.
> >>>>>>>>>>>>
> >>>>>>>>>>>> If you insert 676543.21 in a *float* column and then read it, you
> >>>>>>>>>>>> will realize that the value has been truncated to 676543.2.
> >>>>>>>>>>>>
> >>>>>>>>>>>> If you want accuracy the only way is to avoid those inexact types.
> >>>>>>>>>>>> Using *decimals* during operations will mitigate the problem but
> >>>>>>>>>>>> will not remove it.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> I do not recall PostgreSQL behaving as described. If I am not
> >>>>>>>>>>>> mistaken, in PostgreSQL *SELECT 3/2* will return *1*, which is
> >>>>>>>>>>>> similar to what MS SQL Server and Oracle do. So all those
> >>>>>>>>>>>> databases will lose precision if you are not careful.
> >>>>>>>>>>>>
> >>>>>>>>>>>> If you truly need precision you can have it by using exact
> >>>>>>>>>>>> numeric types for your data types. Of course it has a cost on
> >>>>>>>>>>>> performance, memory and disk usage.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The advantage of the current approach is that it gives you the
> >>>>>>>>>>>> choice. It is up to you to decide what you need for your
> >>>>>>>>>>>> application. It is also in line with the way CQL behaves
> >>>>>>>>>>>> everywhere else.
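
The storage-level point reproduces without a cluster; a Java sketch of the same IEEE-754 rounding a CQL float column performs on write (illustrative, not Cassandra code):

    public class InexactAtRest {
        public static void main(String[] args) {
            float stored = 676543.21f;   // what a float column would end up holding
            System.out.println(stored);  // 676543.2 - the .01 was rounded away on write
            System.out.println(new java.math.BigDecimal(stored)); // 676543.1875 - the exact stored value
        }
    }

No choice of result type for later arithmetic can recover precision that was rounded away on insert; the cast discussion is only about not adding a second rounding step.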
> >>>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>>
> >>>>>>>>>>> Muru
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >> ---------------------------------------------------------------------
> >>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >> ---------------------------------------------------------------------
> >>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> ---------------------------------------------------------------------
> >>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>>
> >>>>>>
> >>>>>>
> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>
> >>>>>
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>
> >>>>> --
> >>>> Jon Haddad
> >>>> http://www.rustyrazorblade.com
> >>>> twitter: rustyrazorblade
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>
> >>
>
> --
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

Re: Implicit Casts for Arithmetic Operators

Posted by Benedict Elliott Smith <be...@apache.org>.
The goal is simply to agree on a set of well-defined principles for how we should behave.  If we don’t like the implications that arise, we’ll have another vote?  A democracy cannot bind itself, so I never understood this fear of a decision.

A database also has a thousand toggles.  If we absolutely need to, we can introduce one more.

We should be doing this upfront a great deal more often.  Doing it retrospectively sucks, but in my opinion it's a bad reason to bind ourselves to whatever made it in.

Do we anywhere define the principles of our current behaviour?  I couldn’t find it.


> On 21 Nov 2018, at 21:08, Sylvain Lebresne <le...@gmail.com> wrote:
> 
> On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith <be...@apache.org>
> wrote:
> 
>> FWIW, my meaning of arithmetic in this context extends to any features we
>> have already released (such as aggregates, and perhaps other built-in
>> functions) that operate on the same domain.  We should be consistent, after
>> all.
>> 
>> Whether or not we need to revisit any existing functionality we can figure
>> out after the fact, once we have agreed what our behaviour should be.
>> 
> 
> I'm not sure I correctly understand the process suggested, but I don't
> particularly like/agree with what I understand. What I understand is a
> suggestion for voting on agreeing to be ANSI SQL 92 compliant, with no real
> evaluation of what that entails (at least I haven't seen one), and that
> this vote, if passed, would imply we'd then make any backward incompatible
> change necessary to achieve compliance ("my meaning of arithmetic in this
> context extends to any features we have already released" and "Whether or
> not we need to revisit any existing functionality we can figure out after
> the fact, once we have agreed what our behaviour should be").
> 
> This might make sense for a new product, but at our stage that seems
> backward to me. I think we owe it to our users to first make the effort of
> identifying what "inconsistencies" our existing arithmetic has[1] and
> _then_ consider what options we have to fix those, with their pros and cons
> (including how badly they break backward compatibility). And if _then_
> getting ANSI SQL 92 compliant proves not to be disruptive (or at least
> acceptably so), then sure, that's great.
> 
> [1]: one possibly efficient way to do that could actually be to compare our
> arithmetic to ANSI SQL 92. Not that all differences found would imply
> inconsistencies/wrongness of our arithmetic, but still, it should be
> helpful. And I guess my whole point is that we should do that analysis first,
> and then maybe decide that being ANSI SQL 92 is a reasonable option, not
> decide first and live with the consequences no matter what they are.
> 
> --
> Sylvain
> 
> 
>> I will make this more explicit for the vote, but just to clarify the
>> intention so that we are all discussing the same thing.
>> 
>> 
>>> On 20 Nov 2018, at 14:18, Ariel Weisberg <ad...@fastmail.fm> wrote:
>>> 
>>> Hi,
>>> 
>>> +1
>>> 
>>> This is a public API so we will be much better off if we get it right
>> the first time.
>>> 
>>> Ariel
>>> 
>>>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <jo...@jonhaddad.com>
>> wrote:
>>>> 
>>>> Sounds good to me.
>>>> 
>>>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <
>> benedict@apache.org>
>>>> wrote:
>>>> 
>>>>> So, this thread somewhat petered out.
>>>>> 
>>>>> There are still a number of unresolved issues, but to make progress I
>>>>> wonder if it would first be helpful to have a vote on ensuring we are
>> ANSI
>>>>> SQL 92 compliant for our arithmetic?  This seems like a sensible
>> baseline,
>>>>> since we will hopefully minimise surprise to operators this way.
>>>>> 
>>>>> If people largely agree, I will call a vote, and we can pick up a
>> couple
>>>>> of more focused discussions afterwards on how we interpret the leeway
>> it
>>>>> gives.
>>>>> 
>>>>> 
>>>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
> >>>>>> From reading the spec: precision is always implementation defined. The
>>>>> spec specifies scale in several cases, but never precision for any
>> type or
>>>>> operation (addition/subtraction, multiplication, division).
>>>>>> 
>>>>>> So we don't implement anything remotely approaching precision and
>> scale
>>>>> in CQL when it comes to numbers I think? So we aren't going to follow
>> the
>>>>> spec for scale. We are already pretty far down that road so I would
>> leave
>>>>> it alone.
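
For reference, java.math.BigDecimal (which, as far as I know, backs CQL's decimal type) does pin down result scales for exact operands, much like the spec's "determined as follows" clauses; a quick sketch of the rules it applies:

    import java.math.BigDecimal;
    import java.math.RoundingMode;

    public class ExactScales {
        public static void main(String[] args) {
            BigDecimal a = new BigDecimal("2.65");  // scale 2
            BigDecimal b = new BigDecimal("1.0");   // scale 1
            System.out.println(a.add(b));           // 3.65  - result scale = max(2, 1)
            System.out.println(a.multiply(b));      // 2.650 - result scale = 2 + 1
            // Division has no finite scale in general, so it must be chosen explicitly:
            System.out.println(BigDecimal.ONE.divide(new BigDecimal("3"), 20, RoundingMode.HALF_UP));
        }
    }

The 2.650 mirrors Micke's Postgres numeric example earlier in the thread.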
>>>>>> 
>>>>>> I don't think the spec is asking for the most approximate type. It's
>>>>> just saying the result is approximate, and the precision is
>> implementation
>>>>> defined. We could return either float or double. I think if one of the
>>>>> operands is a double we should return a double because clearly the
>> schema
>>>>> thought a double was required to represent that number. I would also
>> be in
>>>>> favor of returning a double all the time so that people can expect a
>>>>> consistent type from expressions involving approximate numbers.
>>>>>> 
>>>>>> I am a big fan of widening for arithmetic expressions in a database to
>>>>> avoid having to error on overflow. You can go to the trouble of only
>>>>> widening the minimum amount, but I think it's simpler if we always
>> widen to
>>>>> bigint and double. This would be something the spec allows.
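
A sketch of that always-widen rule, with hypothetical helper names of my own (not a proposed patch): integer operands compute in 64 bits, approximate ones in double:

    public class WidenedOps {
        // Hypothetical rule: results are always bigint (long) or double,
        // regardless of how narrow the operands are.
        static long add(int a, int b)     { return (long) a + b; }    // widen first: no 32-bit wrap
        static double add(float a, int b) { return (double) a + b; }  // approximate operand => double

        public static void main(String[] args) {
            System.out.println(Integer.MAX_VALUE + 1);      // -2147483648: plain int arithmetic wraps
            System.out.println(add(Integer.MAX_VALUE, 1));  //  2147483648: the widened result is exact
        }
    }

The caveat is that bigint + bigint can still overflow 64 bits, so widening shrinks the problem rather than eliminating it.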
>>>>>> 
> >>>>>> If we can make overflow not occur, we definitely should, and the spec
> >>>>>> allows that. We should also not return different types for the same
> >>>>>> operand types just to work around overflow if we detect we need more
> >>>>>> precision.
>>>>>> 
>>>>>> Ariel
>>>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
>>>>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging
>> this
>>>>>>> out (and Mike for getting some empirical examples).
>>>>>>> 
>>>>>>> We still have to decide on the approximate data type to return; right
>>>>>>> now, we have float+bigint=double, but float+int=float.  I think this
>> is
>>>>>>> fairly inconsistent, and either the approximate type should always
>> win,
>>>>>>> or we should always upgrade to double for mixed operands.
>>>>>>> 
> >>>>>>> The quoted spec also suggests that decimal+float=float, and
> >>>>>>> decimal+double=double, whereas we currently have decimal+float=decimal,
> >>>>>>> and decimal+double=decimal.
>>>>>>> 
>>>>>>> If we’re going to go with an approximate operand implying an
>>>>> approximate
>>>>>>> result, I think we should do it consistently (and consistent with the
>>>>>>> SQL92 spec), and have the type of the approximate operand always be
>> the
>>>>>>> return type.
>>>>>>> 
>>>>>>> This would still leave a decision for float+double, though.  The most
>>>>>>> consistent behaviour with that stated above would be to always take
>> the
>>>>>>> most approximate type to return (i.e. float), but this would seem to
>> me
>>>>>>> to be fairly unexpected for the user.
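
Stated as code, the rule being argued for is small; a hypothetical promotion function (my illustration, not current or committed behaviour), with the float+double cell left explicitly open:

    public class Promotion {
        enum Kind { TINYINT, SMALLINT, INT, BIGINT, VARINT, DECIMAL, FLOAT, DOUBLE }

        static boolean approximate(Kind k) { return k == Kind.FLOAT || k == Kind.DOUBLE; }

        // Hypothetical: an approximate operand makes the result approximate, and
        // the approximate operand's own type becomes the result type (SQL92 6.12, rule 2).
        static Kind resultType(Kind a, Kind b) {
            if (approximate(a) && approximate(b)) return Kind.DOUBLE;  // float+double: still open
            if (approximate(a)) return a;
            if (approximate(b)) return b;
            throw new UnsupportedOperationException("exact/exact follows separate precision/scale rules");
        }
    }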
>>>>>>> 
>>>>>>> 
>>>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I agree with what's been said about expectations regarding
>> expressions
>>>>> involving floating point numbers. I think that if one of the inputs is
>>>>> approximate then the result should be approximate.
>>>>>>>> 
>>>>>>>> One thing we could look at for inspiration is the SQL spec. Not to
>>>>> follow dogmatically necessarily.
>>>>>>>> 
>>>>>>>> From the SQL 92 spec regarding assignment
>>>>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
>>>>>>>> "
>>>>>>>>     Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
>>>>>>>>     FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
>>>>> mutually
>>>>>>>>     comparable and mutually assignable. If an assignment would
>>>>> result
>>>>>>>>     in a loss of the most significant digits, an exception
>> condition
>>>>>>>>     is raised. If least significant digits are lost,
>> implementation-
>>>>>>>>     defined rounding or truncating occurs with no exception
>>>>> condition
>>>>>>>>     being raised. The rules for arithmetic are generally governed
>> by
>>>>>>>>     Subclause 6.12, "<numeric value expression>".
>>>>>>>> "
>>>>>>>> 
>>>>>>>> Section 6.12 numeric value expressions:
>>>>>>>> "
>>>>>>>>     1) If the data type of both operands of a dyadic arithmetic
>>>>> opera-
>>>>>>>>        tor is exact numeric, then the data type of the result is
>>>>> exact
>>>>>>>>        numeric, with precision and scale determined as follows:
>>>>>>>> ...
>>>>>>>>     2) If the data type of either operand of a dyadic arithmetic
>> op-
>>>>>>>>        erator is approximate numeric, then the data type of the re-
>>>>>>>>        sult is approximate numeric. The precision of the result is
>>>>>>>>        implementation-defined.
>>>>>>>> "
>>>>>>>> 
>>>>>>>> And this makes sense to me. I think we should only return an exact
>>>>> result if both of the inputs are exact.
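
This is also why a lossless float -> decimal upcast does not manufacture exactness; a Java sketch of what such a cast would produce (illustrative):

    import java.math.BigDecimal;

    public class LosslessUpcast {
        public static void main(String[] args) {
            float f = 2.65f;
            // A lossless cast preserves the binary value exactly, noise included:
            System.out.println(new BigDecimal(f).multiply(BigDecimal.ONE));
            // 2.650000095367431640625 - exact arithmetic on an already-inexact input
            // Recovering "2.65" means going through the float's decimal rendering:
            System.out.println(new BigDecimal(Float.toString(f)));  // 2.65
        }
    }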
>>>>>>>> 
>>>>>>>> I think we might want to look closely at the SQL spec and especially
>>>>> when the spec requires an error to be generated. Those are sometimes
>> in the
>>>>> spec to prevent subtle paths to wrong answers. Any time we deviate
>> from the
>>>>> spec we should be asking why is it in the spec and why are we
>> deviating.
>>>>>>>> 
>>>>>>>> Another issue besides overflow handling is how we determine
>> precision
>>>>> and scale for expressions involving two exact types.
>>>>>>>> 
>>>>>>>> Ariel
>>>>>>>> 
>>>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
>>>>>>>>> Hi,
>>>>>>>>> 
> >>>>>>>>> I'm not sure if I would prefer the Postgres way of doing things,
> >>>>>>>>> which is returning just about any type depending on the order of
> >>>>>>>>> operators, especially considering the docs mention that using
> >>>>>>>>> numeric/decimal is slow, and say multiple times that floating points
> >>>>>>>>> are inexact. So, doing some math with Postgres (9.6.5):
>>>>>>>>> 
>>>>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double
>>>>>>>>> precision 2147483647
>>>>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
>>>>>>>>> SELECT 2147483647::bigint*1.0::real returns double
>>>>>>>>> SELECT 2147483647::double precision*1::bigint returns double
>>>>> 2147483647
>>>>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double
>>>>> 2147483647
>>>>>>>>> 
> >>>>>>>>> With + - we can get the same mixture of returned types. There's
> >>>>>>>>> no difference in those calculations, just some casting. To me
> >>>>>>>>> floating-point math indicates inexactness and has errors, and
> >>>>>>>>> whoever mixes up two different types should understand that. If one
> >>>>>>>>> didn't want an exact numeric type, why would the server return such?
> >>>>>>>>> The floating point value itself could be wrong already before the
> >>>>>>>>> calculation - trying to say we do it losslessly is just wrong.
>>>>>>>>> 
>>>>>>>>> Fun with 2.65:
>>>>>>>>> 
>>>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
>>>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
>>>>>>>>> 
>>>>>>>>> SELECT round(2.65) returns numeric 4
>>>>>>>>> SELECT round(2.65::double precision) returns double 4
>>>>>>>>> 
>>>>>>>>> SELECT 2.65 * 1 returns double 2.65
>>>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
>>>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
>>>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
>>>>>>>>> 
>>>>>>>>> SELECT round(2.65) * 1 returns numeric 3
>>>>>>>>> SELECT round(2.65) * round(1) returns double 3
>>>>>>>>> 
> >>>>>>>>> So as we're going to have silly values in any case, why pretend
> >>>>>>>>> something else? Also, exact calculations are slow if we crunch
> >>>>>>>>> large amounts of numbers. I guess I slightly deviated towards
> >>>>>>>>> Postgres' implementation in this case, but I wish it wasn't used as
> >>>>>>>>> a benchmark in this case. And most importantly, I would definitely
> >>>>>>>>> want the exact same type returned each time I do a calculation.
>>>>>>>>> 
>>>>>>>>> - Micke
>>>>>>>>> 
>>>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
>>>>> benedict@apache.org>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> As far as I can tell we reached a relatively strong consensus
>> that we
>>>>>>>>>> should implement lossless casts by default?  Does anyone have
>>>>> anything more
>>>>>>>>>> to add?
>>>>>>>>>> 
>>>>>>>>>> Looking at the emails, everyone who participated and expressed a
>>>>>>>>>> preference was in favour of the “Postgres approach” of upcasting
>> to
>>>>> decimal
>>>>>>>>>> for mixed float/int operands?
>>>>>>>>>> 
>>>>>>>>>> I’d like to get a clear-cut decision on this, so we know what
>> we’re
>>>>> doing
>>>>>>>>>> for 4.0.  Then hopefully we can move on to a collective decision
>> on
>>>>> Ariel’s
>>>>>>>>>> concerns about overflow, which I think are also pressing -
>>>>> particularly for
>>>>>>>>>> tinyint and smallint.  This does also impact implicit casts for
>> mixed
>>>>>>>>>> integer type operations, but an approach for these will probably
>>>>> fall out
>>>>>>>>>> of any decision on overflow.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
>>>>> murukesh.mohanan@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> I think you're conflating two things here. There's the loss
>>>>> resulting
>>>>>>>>>> from
>>>>>>>>>>> using some operators, and loss involved in casting. Dividing an
>>>>> integer
>>>>>>>>>> by
>>>>>>>>>>> another integer to obtain an integer result can result in loss,
>> but
>>>>>>>>>> there's
>>>>>>>>>>> no implicit casting there and no loss due to casting.  Casting an
>>>>> integer
>>>>>>>>>>> to a float can also result in loss. So dividing an integer by a
>>>>> float,
>>>>>>>>>> for
>>>>>>>>>>> example, with an implicit cast has an additional avenue for loss:
>>>>> the
>>>>>>>>>>> implicit cast for the operands so that they're of the same type.
>> I
>>>>>>>>>> believe
>>>>>>>>>>> this discussion so far has been about the latter, not the loss
>> from
>>>>> the
>>>>>>>>>>> operations themselves.
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
>>>>> benjamin.lerer@datastax.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>> I would like to try to clarify things a bit to help people to
>>>>> understand
>>>>>>>>>>>> the true complexity of the problem.
>>>>>>>>>>>> 
> >>>>>>>>>>>> The *float* and *double* types are inexact numeric types. Not only
> >>>>>>>>>>>> at the operation level.
>>>>>>>>>>>> 
>>>>>>>>>>>> If you insert 676543.21 in a *float* column and then read it,
>> you
>>>>> will
>>>>>>>>>>>> realize that the value has been truncated to 676543.2.
>>>>>>>>>>>> 
> >>>>>>>>>>>> If you want accuracy the only way is to avoid those inexact
> >>>>>>>>>>>> types. Using *decimals* during operations will mitigate the
> >>>>>>>>>>>> problem but will not remove it.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
> >>>>>>>>>>>> I do not recall PostgreSQL behaving as described. If I am not
> >>>>>>>>>>>> mistaken, in PostgreSQL *SELECT 3/2* will return *1*, which is
> >>>>>>>>>>>> similar to what MS SQL Server and Oracle do. So all those
> >>>>>>>>>>>> databases will lose precision if you are not careful.
>>>>>>>>>>>> 
>>>>>>>>>>>> If you truly need precision you can have it by using exact
>> numeric
>>>>> types
>>>>>>>>>>>> for your data types. Of course it has a cost on performance,
>>>>> memory and
>>>>>>>>>>>> disk usage.
>>>>>>>>>>>> 
> >>>>>>>>>>>> The advantage of the current approach is that it gives you the
> >>>>>>>>>>>> choice. It is up to you to decide what you need for your
> >>>>>>>>>>>> application. It is also in line with the way CQL behaves
> >>>>>>>>>>>> everywhere else.
>>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> 
>>>>>>>>>>> Muru
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>> 
>>>>>> 
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>> 
>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>> 
>>>>> --
>>>> Jon Haddad
>>>> http://www.rustyrazorblade.com
>>>> twitter: rustyrazorblade
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>> 
>> 


Re: Implicit Casts for Arithmetic Operators

Posted by Sylvain Lebresne <le...@gmail.com>.
On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith <be...@apache.org>
wrote:

> FWIW, my meaning of arithmetic in this context extends to any features we
> have already released (such as aggregates, and perhaps other built-in
> functions) that operate on the same domain.  We should be consistent, after
> all.
>
> Whether or not we need to revisit any existing functionality we can figure
> out after the fact, once we have agreed what our behaviour should be.
>

I'm not sure I correctly understand the process suggested, but I don't
particularly like/agree with what I understand. What I understand is a
suggestion for voting on agreeing to be ANSI SQL 92 compliant, with no real
evaluation of what that entails (at least I haven't seen one), and that
this vote, if passed, would imply we'd then make any backward incompatible
change necessary to achieve compliance ("my meaning of arithmetic in this
context extends to any features we have already released" and "Whether or
not we need to revisit any existing functionality we can figure out after
the fact, once we have agreed what our behaviour should be").

This might make sense for a new product, but at our stage that seems
backward to me. I think we owe it to our users to first make the effort of
identifying what "inconsistencies" our existing arithmetic has[1] and
_then_ consider what options we have to fix those, with their pros and cons
(including how badly they break backward compatibility). And if _then_
getting ANSI SQL 92 compliant proves not to be disruptive (or at least
acceptably so), then sure, that's great.

[1]: one possibly efficient way to do that could actually be to compare our
arithmetic to ANSI SQL 92. Not that all differences found would imply
inconsistencies/wrongness of our arithmetic, but still, it should be
helpful. And I guess my whole point is that we should do that analysis first,
and then maybe decide that being ANSI SQL 92 is a reasonable option, not
decide first and live with the consequences no matter what they are.

--
Sylvain


> I will make this more explicit for the vote, but just to clarify the
> intention so that we are all discussing the same thing.
>
>
> > On 20 Nov 2018, at 14:18, Ariel Weisberg <ad...@fastmail.fm> wrote:
> >
> > Hi,
> >
> > +1
> >
> > This is a public API so we will be much better off if we get it right
> the first time.
> >
> > Ariel
> >
> >> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <jo...@jonhaddad.com>
> wrote:
> >>
> >> Sounds good to me.
> >>
> >> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <
> benedict@apache.org>
> >> wrote:
> >>
> >>> So, this thread somewhat petered out.
> >>>
> >>> There are still a number of unresolved issues, but to make progress I
> >>> wonder if it would first be helpful to have a vote on ensuring we are
> ANSI
> >>> SQL 92 compliant for our arithmetic?  This seems like a sensible
> baseline,
> >>> since we will hopefully minimise surprise to operators this way.
> >>>
> >>> If people largely agree, I will call a vote, and we can pick up a
> couple
> >>> of more focused discussions afterwards on how we interpret the leeway
> it
> >>> gives.
> >>>
> >>>
> >>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> From reading the spec: precision is always implementation defined. The
> >>> spec specifies scale in several cases, but never precision for any
> type or
> >>> operation (addition/subtraction, multiplication, division).
> >>>>
> >>>> So we don't implement anything remotely approaching precision and
> scale
> >>> in CQL when it comes to numbers I think? So we aren't going to follow
> the
> >>> spec for scale. We are already pretty far down that road so I would
> leave
> >>> it alone.
> >>>>
> >>>> I don't think the spec is asking for the most approximate type. It's
> >>> just saying the result is approximate, and the precision is
> implementation
> >>> defined. We could return either float or double. I think if one of the
> >>> operands is a double we should return a double because clearly the
> schema
> >>> thought a double was required to represent that number. I would also
> be in
> >>> favor of returning a double all the time so that people can expect a
> >>> consistent type from expressions involving approximate numbers.
> >>>>
> >>>> I am a big fan of widening for arithmetic expressions in a database to
> >>> avoid having to error on overflow. You can go to the trouble of only
> >>> widening the minimum amount, but I think it's simpler if we always
> widen to
> >>> bigint and double. This would be something the spec allows.
> >>>>
> >>>> If we can make overflow not occur, we definitely should, and the spec
> >>>> allows that. We should also not return different types for the same
> >>>> operand types just to work around overflow if we detect we need more
> >>>> precision.
> >>>>
> >>>> Ariel
> >>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
> >>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging
> this
> >>>>> out (and Mike for getting some empirical examples).
> >>>>>
> >>>>> We still have to decide on the approximate data type to return; right
> >>>>> now, we have float+bigint=double, but float+int=float.  I think this
> is
> >>>>> fairly inconsistent, and either the approximate type should always
> win,
> >>>>> or we should always upgrade to double for mixed operands.
> >>>>>
> >>>>> The quoted spec also suggests that decimal+float=float, and decimal
> >>>>> +double=double, whereas we currently have decimal+float=decimal, and
> >>>>> decimal+double=decimal.
> >>>>>
> >>>>> If we’re going to go with an approximate operand implying an
> >>> approximate
> >>>>> result, I think we should do it consistently (and consistent with the
> >>>>> SQL92 spec), and have the type of the approximate operand always be
> the
> >>>>> return type.
> >>>>>
> >>>>> This would still leave a decision for float+double, though.  The most
> >>>>> consistent behaviour with that stated above would be to always take
> the
> >>>>> most approximate type to return (i.e. float), but this would seem to
> me
> >>>>> to be fairly unexpected for the user.
> >>>>>
> >>>>>
> >>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I agree with what's been said about expectations regarding
> expressions
> >>> involving floating point numbers. I think that if one of the inputs is
> >>> approximate then the result should be approximate.
> >>>>>>
> >>>>>> One thing we could look at for inspiration is the SQL spec. Not to
> >>> follow dogmatically necessarily.
> >>>>>>
> >>>>>> From the SQL 92 spec regarding assignment
> >>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
> >>>>>> "
> >>>>>>      Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
> >>>>>>      FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
> >>> mutually
> >>>>>>      comparable and mutually assignable. If an assignment would
> >>> result
> >>>>>>      in a loss of the most significant digits, an exception
> condition
> >>>>>>      is raised. If least significant digits are lost,
> implementation-
> >>>>>>      defined rounding or truncating occurs with no exception
> >>> condition
> >>>>>>      being raised. The rules for arithmetic are generally governed
> by
> >>>>>>      Subclause 6.12, "<numeric value expression>".
> >>>>>> "
> >>>>>>
> >>>>>> Section 6.12 numeric value expressions:
> >>>>>> "
> >>>>>>      1) If the data type of both operands of a dyadic arithmetic
> >>> opera-
> >>>>>>         tor is exact numeric, then the data type of the result is
> >>> exact
> >>>>>>         numeric, with precision and scale determined as follows:
> >>>>>> ...
> >>>>>>      2) If the data type of either operand of a dyadic arithmetic
> op-
> >>>>>>         erator is approximate numeric, then the data type of the re-
> >>>>>>         sult is approximate numeric. The precision of the result is
> >>>>>>         implementation-defined.
> >>>>>> "
> >>>>>>
> >>>>>> And this makes sense to me. I think we should only return an exact
> >>> result if both of the inputs are exact.
> >>>>>>
> >>>>>> I think we might want to look closely at the SQL spec and especially
> >>> when the spec requires an error to be generated. Those are sometimes
> in the
> >>> spec to prevent subtle paths to wrong answers. Any time we deviate
> from the
> >>> spec we should be asking why is it in the spec and why are we
> deviating.
> >>>>>>
> >>>>>> Another issue besides overflow handling is how we determine
> precision
> >>> and scale for expressions involving two exact types.
> >>>>>>
> >>>>>> Ariel
> >>>>>>
> >>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I'm not sure if I would prefer the Postgres way of doing things,
> >>>>>>> which is returning just about any type depending on the order of
> >>>>>>> operators, especially considering the docs mention that using
> >>>>>>> numeric/decimal is slow, and say multiple times that floating points
> >>>>>>> are inexact. So, doing some math with Postgres (9.6.5):
> >>>>>>>
> >>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double
> >>>>>>> precision 2147483647
> >>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
> >>>>>>> SELECT 2147483647::bigint*1.0::real returns double
> >>>>>>> SELECT 2147483647::double precision*1::bigint returns double
> >>> 2147483647
> >>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double
> >>> 2147483647
> >>>>>>>
> >>>>>>> With + - we can get the same mixture of returned types. There's
> >>>>>>> no difference in those calculations, just some casting. To me
> >>>>>>> floating-point math indicates inexactness and has errors, and
> >>>>>>> whoever mixes up two different types should understand that. If one
> >>>>>>> didn't want an exact numeric type, why would the server return such?
> >>>>>>> The floating point value itself could be wrong already before the
> >>>>>>> calculation - trying to say we do it losslessly is just wrong.
> >>>>>>>
> >>>>>>> Fun with 2.65:
> >>>>>>>
> >>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
> >>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
> >>>>>>>
> >>>>>>> SELECT round(2.65) returns numeric 4
> >>>>>>> SELECT round(2.65::double precision) returns double 4
> >>>>>>>
> >>>>>>> SELECT 2.65 * 1 returns double 2.65
> >>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
> >>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
> >>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
> >>>>>>>
> >>>>>>> SELECT round(2.65) * 1 returns numeric 3
> >>>>>>> SELECT round(2.65) * round(1) returns double 3
> >>>>>>>
> >>>>>>> So as we're going to have silly values in any case, why pretend
> >>>>>>> something else? Also, exact calculations are slow if we crunch
> >>>>>>> large amounts of numbers. I guess I slightly deviated towards
> >>>>>>> Postgres' implementation in this case, but I wish it wasn't used as
> >>>>>>> a benchmark in this case. And most importantly, I would definitely
> >>>>>>> want the exact same type returned each time I do a calculation.
> >>>>>>>
> >>>>>>> - Micke
> >>>>>>>
> >>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
> >>> benedict@apache.org>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> As far as I can tell we reached a relatively strong consensus
> that we
> >>>>>>>> should implement lossless casts by default?  Does anyone have
> >>> anything more
> >>>>>>>> to add?
> >>>>>>>>
> >>>>>>>> Looking at the emails, everyone who participated and expressed a
> >>>>>>>> preference was in favour of the “Postgres approach” of upcasting
> to
> >>> decimal
> >>>>>>>> for mixed float/int operands?
> >>>>>>>>
> >>>>>>>> I’d like to get a clear-cut decision on this, so we know what
> we’re
> >>> doing
> >>>>>>>> for 4.0.  Then hopefully we can move on to a collective decision
> on
> >>> Ariel’s
> >>>>>>>> concerns about overflow, which I think are also pressing -
> >>> particularly for
> >>>>>>>> tinyint and smallint.  This does also impact implicit casts for
> mixed
> >>>>>>>> integer type operations, but an approach for these will probably
> >>> fall out
> >>>>>>>> of any decision on overflow.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
> >>> murukesh.mohanan@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> I think you're conflating two things here. There's the loss
> >>> resulting
> >>>>>>>> from
> >>>>>>>>> using some operators, and loss involved in casting. Dividing an
> >>> integer
> >>>>>>>> by
> >>>>>>>>> another integer to obtain an integer result can result in loss,
> but
> >>>>>>>> there's
> >>>>>>>>> no implicit casting there and no loss due to casting.  Casting an
> >>> integer
> >>>>>>>>> to a float can also result in loss. So dividing an integer by a
> >>> float,
> >>>>>>>> for
> >>>>>>>>> example, with an implicit cast has an additional avenue for loss:
> >>> the
> >>>>>>>>> implicit cast for the operands so that they're of the same type.
> I
> >>>>>>>> believe
> >>>>>>>>> this discussion so far has been about the latter, not the loss
> from
> >>> the
> >>>>>>>>> operations themselves.
> >>>>>>>>>
> >>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
> >>> benjamin.lerer@datastax.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> I would like to try to clarify things a bit to help people to
> >>> understand
> >>>>>>>>>> the true complexity of the problem.
> >>>>>>>>>>
> >>>>>>>>>> The *float* and *double* types are inexact numeric types. Not only
> >>>>>>>>>> at the operation level.
> >>>>>>>>>>
> >>>>>>>>>> If you insert 676543.21 in a *float* column and then read it,
> you
> >>> will
> >>>>>>>>>> realize that the value has been truncated to 676543.2.
> >>>>>>>>>>
> >>>>>>>>>> If you want accuracy the only way is to avoid those inexact types.
> >>>>>>>>>> Using *decimals* during operations will mitigate the problem but
> >>>>>>>>>> will not remove it.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> I do not recall PostgreSQL behaving as described. If I am not
> >>>>>>>>>> mistaken, in PostgreSQL *SELECT 3/2* will return *1*, which is
> >>>>>>>>>> similar to what MS SQL Server and Oracle do. So all those
> >>>>>>>>>> databases will lose precision if you are not careful.
> >>>>>>>>>>
> >>>>>>>>>> If you truly need precision you can have it by using exact
> numeric
> >>> types
> >>>>>>>>>> for your data types. Of course it has a cost on performance,
> >>> memory and
> >>>>>>>>>> disk usage.
> >>>>>>>>>>
> >>>>>>>>>> The advantage of the current approach is that it gives you the
> >>>>>>>>>> choice. It is up to you to decide what you need for your
> >>>>>>>>>> application. It is also in line with the way CQL behaves
> >>>>>>>>>> everywhere else.
> >>>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>>
> >>>>>>>>> Muru
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> ---------------------------------------------------------------------
> >>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>>
> >>>>>
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>
> >>> --
> >> Jon Haddad
> >> http://www.rustyrazorblade.com
> >> twitter: rustyrazorblade
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: Implicit Casts for Arithmetic Operators

Posted by Benedict Elliott Smith <be...@apache.org>.
FWIW, my meaning of arithmetic in this context extends to any features we have already released (such as aggregates, and perhaps other built-in functions) that operate on the same domain.  We should be consistent, after all.

Whether or not we need to revisit any existing functionality we can figure out after the fact, once we have agreed what our behaviour should be.

I will make this more explicit for the vote, but just to clarify the intention so that we are all discussing the same thing.
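
As a concrete example of why aggregates belong in scope: summing values that individually fit a type can overflow that type, so sum() faces the same widening decision as '+' (a plain Java illustration, not Cassandra's aggregate code):

    import java.util.stream.IntStream;

    public class AggregateWidening {
        public static void main(String[] args) {
            // A million ints that each fit comfortably in 32 bits...
            System.out.println(IntStream.range(0, 1_000_000).map(i -> 3_000).sum());
            // -1294967296: the running int total wrapped long before the end
            System.out.println(IntStream.range(0, 1_000_000).mapToLong(i -> 3_000).sum());
            // 3000000000: widening the accumulator to bigint gives the exact sum
        }
    }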


> On 20 Nov 2018, at 14:18, Ariel Weisberg <ad...@fastmail.fm> wrote:
> 
> Hi,
> 
> +1
> 
> This is a public API so we will be much better off if we get it right the first time.
> 
> Ariel
> 
>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <jo...@jonhaddad.com> wrote:
>> 
>> Sounds good to me.
>> 
>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <be...@apache.org>
>> wrote:
>> 
>>> So, this thread somewhat petered out.
>>> 
>>> There are still a number of unresolved issues, but to make progress I
>>> wonder if it would first be helpful to have a vote on ensuring we are ANSI
>>> SQL 92 compliant for our arithmetic?  This seems like a sensible baseline,
>>> since we will hopefully minimise surprise to operators this way.
>>> 
>>> If people largely agree, I will call a vote, and we can pick up a couple
>>> of more focused discussions afterwards on how we interpret the leeway it
>>> gives.
>>> 
>>> 
>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> From reading the spec: precision is always implementation defined. The
>>> spec specifies scale in several cases, but never precision for any type or
>>> operation (addition/subtraction, multiplication, division).
>>>> 
>>>> So we don't implement anything remotely approaching precision and scale
>>> in CQL when it comes to numbers I think? So we aren't going to follow the
>>> spec for scale. We are already pretty far down that road so I would leave
>>> it alone.
>>>> 
>>>> I don't think the spec is asking for the most approximate type. It's
>>> just saying the result is approximate, and the precision is implementation
>>> defined. We could return either float or double. I think if one of the
>>> operands is a double we should return a double because clearly the schema
>>> thought a double was required to represent that number. I would also be in
>>> favor of returning a double all the time so that people can expect a
>>> consistent type from expressions involving approximate numbers.
>>>> 
>>>> I am a big fan of widening for arithmetic expressions in a database to
>>> avoid having to error on overflow. You can go to the trouble of only
>>> widening the minimum amount, but I think it's simpler if we always widen to
>>> bigint and double. This would be something the spec allows.
>>>> 
>>>> If we can make overflow not occur, we definitely should, and the spec
>>>> allows that. We should also not return different types for the same
>>>> operand types just to work around overflow if we detect we need more
>>>> precision.
>>>> 
>>>> Ariel
>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
>>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging this
>>>>> out (and Mike for getting some empirical examples).
>>>>> 
>>>>> We still have to decide on the approximate data type to return; right
>>>>> now, we have float+bigint=double, but float+int=float.  I think this is
>>>>> fairly inconsistent, and either the approximate type should always win,
>>>>> or we should always upgrade to double for mixed operands.
>>>>> 
>>>>> The quoted spec also suggests that decimal+float=float, and decimal
>>>>> +double=double, whereas we currently have decimal+float=decimal, and
>>>>> decimal+double=decimal.
>>>>> 
>>>>> If we’re going to go with an approximate operand implying an
>>> approximate
>>>>> result, I think we should do it consistently (and consistent with the
>>>>> SQL92 spec), and have the type of the approximate operand always be the
>>>>> return type.
>>>>> 
>>>>> This would still leave a decision for float+double, though.  The most
>>>>> consistent behaviour with that stated above would be to always take the
>>>>> most approximate type to return (i.e. float), but this would seem to me
>>>>> to be fairly unexpected for the user.
>>>>> 
>>>>> 
>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I agree with what's been said about expectations regarding expressions
>>> involving floating point numbers. I think that if one of the inputs is
>>> approximate then the result should be approximate.
>>>>>> 
>>>>>> One thing we could look at for inspiration is the SQL spec. Not to
>>> follow dogmatically necessarily.
>>>>>> 
>>>>>> From the SQL 92 spec regarding assignment
>>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
>>>>>> "
>>>>>>      Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
>>>>>>      FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
>>> mutually
>>>>>>      comparable and mutually assignable. If an assignment would
>>> result
>>>>>>      in a loss of the most significant digits, an exception condition
>>>>>>      is raised. If least significant digits are lost, implementation-
>>>>>>      defined rounding or truncating occurs with no exception
>>> condition
>>>>>>      being raised. The rules for arithmetic are generally governed by
>>>>>>      Subclause 6.12, "<numeric value expression>".
>>>>>> "
>>>>>> 
>>>>>> Section 6.12 numeric value expressions:
>>>>>> "
>>>>>>      1) If the data type of both operands of a dyadic arithmetic
>>> opera-
>>>>>>         tor is exact numeric, then the data type of the result is
>>> exact
>>>>>>         numeric, with precision and scale determined as follows:
>>>>>> ...
>>>>>>      2) If the data type of either operand of a dyadic arithmetic op-
>>>>>>         erator is approximate numeric, then the data type of the re-
>>>>>>         sult is approximate numeric. The precision of the result is
>>>>>>         implementation-defined.
>>>>>> "
>>>>>> 
>>>>>> And this makes sense to me. I think we should only return an exact
>>> result if both of the inputs are exact.
>>>>>> 
>>>>>> I think we might want to look closely at the SQL spec and especially
>>> when the spec requires an error to be generated. Those are sometimes in the
>>> spec to prevent subtle paths to wrong answers. Any time we deviate from the
>>> spec we should be asking why is it in the spec and why are we deviating.
>>>>>> 
>>>>>> Another issue besides overflow handling is how we determine precision
>>> and scale for expressions involving two exact types.
>>>>>> 
>>>>>> Ariel
>>>>>> 
>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I'm not sure if I would prefer the Postgres way of doing things,
>>>>>>> which is returning just about any type depending on the order of
>>>>>>> operators, especially considering the docs mention that using
>>>>>>> numeric/decimal is slow, and say multiple times that floating points
>>>>>>> are inexact. So, doing some math with Postgres (9.6.5):
>>>>>>> 
>>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double
>>>>>>> precision 2147483647
>>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
>>>>>>> SELECT 2147483647::bigint*1.0::real returns double
>>>>>>> SELECT 2147483647::double precision*1::bigint returns double
>>> 2147483647
>>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double
>>> 2147483647
>>>>>>> 
>>>>>>> With + - we can get the same mixture of returned types. There's
>>>>>>> no difference in those calculations, just some casting. To me
>>>>>>> floating-point math indicates inexactness and has errors, and
>>>>>>> whoever mixes up two different types should understand that. If one
>>>>>>> didn't want an exact numeric type, why would the server return such?
>>>>>>> The floating point value itself could be wrong already before the
>>>>>>> calculation - trying to say we do it losslessly is just wrong.
>>>>>>> 
>>>>>>> Fun with 2.65:
>>>>>>> 
>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
>>>>>>> 
>>>>>>> SELECT round(2.65) returns numeric 4
>>>>>>> SELECT round(2.65::double precision) returns double 4
>>>>>>> 
>>>>>>> SELECT 2.65 * 1 returns double 2.65
>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
>>>>>>> 
>>>>>>> SELECT round(2.65) * 1 returns numeric 3
>>>>>>> SELECT round(2.65) * round(1) returns double 3
>>>>>>> 
>>>>>>> So as we're going to have silly values in any case, why pretend
>>>>>>> something else? Also, exact calculations are slow if we crunch
>>>>>>> large amounts of numbers. I guess I slightly deviated towards
>>>>>>> Postgres' implementation in this case, but I wish it wasn't used as
>>>>>>> a benchmark in this case. And most importantly, I would definitely
>>>>>>> want the exact same type returned each time I do a calculation.
>>>>>>> 
>>>>>>> - Micke
>>>>>>> 
>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
>>> benedict@apache.org>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> As far as I can tell we reached a relatively strong consensus that we
>>>>>>>> should implement lossless casts by default?  Does anyone have
>>> anything more
>>>>>>>> to add?
>>>>>>>> 
>>>>>>>> Looking at the emails, everyone who participated and expressed a
>>>>>>>> preference was in favour of the “Postgres approach” of upcasting to
>>> decimal
>>>>>>>> for mixed float/int operands?
>>>>>>>> 
>>>>>>>> I’d like to get a clear-cut decision on this, so we know what we’re
>>> doing
>>>>>>>> for 4.0.  Then hopefully we can move on to a collective decision on
>>> Ariel’s
>>>>>>>> concerns about overflow, which I think are also pressing -
>>> particularly for
>>>>>>>> tinyint and smallint.  This does also impact implicit casts for mixed
>>>>>>>> integer type operations, but an approach for these will probably
>>> fall out
>>>>>>>> of any decision on overflow.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
>>> murukesh.mohanan@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> I think you're conflating two things here. There's the loss
>>> resulting
>>>>>>>> from
>>>>>>>>> using some operators, and loss involved in casting. Dividing an
>>> integer
>>>>>>>> by
>>>>>>>>> another integer to obtain an integer result can result in loss, but
>>>>>>>> there's
>>>>>>>>> no implicit casting there and no loss due to casting.  Casting an
>>> integer
>>>>>>>>> to a float can also result in loss. So dividing an integer by a
>>> float,
>>>>>>>> for
>>>>>>>>> example, with an implicit cast has an additional avenue for loss:
>>> the
>>>>>>>>> implicit cast for the operands so that they're of the same type. I
>>>>>>>> believe
>>>>>>>>> this discussion so far has been about the latter, not the loss from
>>> the
>>>>>>>>> operations themselves.
>>>>>>>>> 
>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
>>> benjamin.lerer@datastax.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> I would like to try to clarify things a bit to help people to
>>> understand
>>>>>>>>>> the true complexity of the problem.
>>>>>>>>>> 
>>>>>>>>>> The *float* and *double* types are inexact numeric types. Not only
>>>>>>>>>> at the operation level.
>>>>>>>>>> 
>>>>>>>>>> If you insert 676543.21 in a *float* column and then read it, you
>>> will
>>>>>>>>>> realize that the value has been truncated to 676543.2.
>>>>>>>>>> 
>>>>>>>>>> If you want accuracy the only way is to avoid those inexact types.
>>>>>>>>>> Using *decimals* during operations will mitigate the problem but
>>>>>>>>>> will not remove it.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> I do not recall PostgreSQL behaving as described. If I am not
>>>>>>>>>> mistaken, in PostgreSQL *SELECT 3/2* will return *1*, which is
>>>>>>>>>> similar to what MS SQL Server and Oracle do. So all those
>>>>>>>>>> databases will lose precision if you are not careful.
>>>>>>>>>> 
>>>>>>>>>> If you truly need precision you can have it by using exact numeric
>>> types
>>>>>>>>>> for your data types. Of course it has a cost on performance,
>>> memory and
>>>>>>>>>> disk usage.
>>>>>>>>>> 
>>>>>>>>>> The advantage of the current approach is that it gives you the
>>>>>>>>>> choice. It is up to you to decide what you need for your
>>>>>>>>>> application. It is also in line with the way CQL behaves
>>>>>>>>>> everywhere else.
>>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> 
>>>>>>>>> Muru
>>>>>>>> 
>>>>>>>> 
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>> 
>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>> 
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>> 
>>> --
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> twitter: rustyrazorblade
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Implicit Casts for Arithmetic Operators

Posted by Ariel Weisberg <ad...@fastmail.fm>.
Hi,

+1

This is a public API so we will be much better off if we get it right the first time.

Ariel

> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <jo...@jonhaddad.com> wrote:
> 
> Sounds good to me.
> 
> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <be...@apache.org>
> wrote:
> 
>> So, this thread somewhat petered out.
>> 
>> There are still a number of unresolved issues, but to make progress I
>> wonder if it would first be helpful to have a vote on ensuring we are ANSI
>> SQL 92 compliant for our arithmetic?  This seems like a sensible baseline,
>> since we will hopefully minimise surprise to operators this way.
>> 
>> If people largely agree, I will call a vote, and we can pick up a couple
>> of more focused discussions afterwards on how we interpret the leeway it
>> gives.
>> 
>> 
>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>> 
>>> Hi,
>>> 
>>> From reading the spec: precision is always implementation defined. The
>> spec specifies scale in several cases, but never precision for any type or
>> operation (addition/subtraction, multiplication, division).
>>> 
>>> So we don't implement anything remotely approaching precision and scale
>> in CQL when it comes to numbers I think? So we aren't going to follow the
>> spec for scale. We are already pretty far down that road so I would leave
>> it alone.
>>> 
>>> I don't think the spec is asking for the most approximate type. It's
>> just saying the result is approximate, and the precision is implementation
>> defined. We could return either float or double. I think if one of the
>> operands is a double we should return a double because clearly the schema
>> thought a double was required to represent that number. I would also be in
>> favor of returning a double all the time so that people can expect a
>> consistent type from expressions involving approximate numbers.
>>> 
>>> I am a big fan of widening for arithmetic expressions in a database to
>> avoid having to error on overflow. You can go to the trouble of only
>> widening the minimum amount, but I think it's simpler if we always widen to
>> bigint and double. This would be something the spec allows.
>>> 
>>> If we can make overflow not occur, we definitely should, and the spec
>>> allows that. We should also not return different types for the same
>>> operand types just to work around overflow if we detect we need more
>>> precision.
>>> 
>>> Ariel
>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging this
>>>> out (and Mike for getting some empirical examples).
>>>> 
>>>> We still have to decide on the approximate data type to return; right
>>>> now, we have float+bigint=double, but float+int=float.  I think this is
>>>> fairly inconsistent, and either the approximate type should always win,
>>>> or we should always upgrade to double for mixed operands.
>>>> 
>>>> The quoted spec also suggests that decimal+float=float, and decimal
>>>> +double=double, whereas we currently have decimal+float=decimal, and
>>>> decimal+double=decimal
>>>> 
>>>> If we’re going to go with an approximate operand implying an
>> approximate
>>>> result, I think we should do it consistently (and consistent with the
>>>> SQL92 spec), and have the type of the approximate operand always be the
>>>> return type.
>>>> 
>>>> This would still leave a decision for float+double, though.  The most
>>>> consistent behaviour with that stated above would be to always take the
>>>> most approximate type to return (i.e. float), but this would seem to me
>>>> to be fairly unexpected for the user.
>>>> 
>>>> 
>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I agree with what's been said about expectations regarding expressions
>> involving floating point numbers. I think that if one of the inputs is
>> approximate then the result should be approximate.
>>>>> 
>>>>> One thing we could look at for inspiration is the SQL spec. Not to
>> follow dogmatically necessarily.
>>>>> 
>>>>> From the SQL 92 spec regarding assignment
>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
>>>>> "
>>>>>       Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
>>>>>       FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
>> mutually
>>>>>       comparable and mutually assignable. If an assignment would
>> result
>>>>>       in a loss of the most significant digits, an exception condition
>>>>>       is raised. If least significant digits are lost, implementation-
>>>>>       defined rounding or truncating occurs with no exception
>> condition
>>>>>       being raised. The rules for arithmetic are generally governed by
>>>>>       Subclause 6.12, "<numeric value expression>".
>>>>> "
>>>>> 
>>>>> Section 6.12 numeric value expressions:
>>>>> "
>>>>>       1) If the data type of both operands of a dyadic arithmetic
>> opera-
>>>>>          tor is exact numeric, then the data type of the result is
>> exact
>>>>>          numeric, with precision and scale determined as follows:
>>>>> ...
>>>>>       2) If the data type of either operand of a dyadic arithmetic op-
>>>>>          erator is approximate numeric, then the data type of the re-
>>>>>          sult is approximate numeric. The precision of the result is
>>>>>          implementation-defined.
>>>>> "
>>>>> 
>>>>> And this makes sense to me. I think we should only return an exact
>> result if both of the inputs are exact.
>>>>> 
>>>>> I think we might want to look closely at the SQL spec and especially
>> when the spec requires an error to be generated. Those are sometimes in the
>> spec to prevent subtle paths to wrong answers. Any time we deviate from the
>> spec we should be asking why is it in the spec and why are we deviating.
>>>>> 
>>>>> Another issue besides overflow handling is how we determine precision
>> and scale for expressions involving two exact types.
>>>>> 
>>>>> Ariel
>>>>> 
>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I'm not sure if I would prefer the Postgres way of doing things,
>>>>>> which is returning just about any type depending on the order of the
>>>>>> operands, considering the docs actually mention that using
>>>>>> numeric/decimal is slow, and also say multiple times that floating
>>>>>> points are inexact. So, doing some math with Postgres (9.6.5):
>>>>>> 
>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double
>>>>>> precision 2147483647
>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
>>>>>> SELECT 2147483647::bigint*1.0::real returns double
>>>>>> SELECT 2147483647::double precision*1::bigint returns double
>> 2147483647
>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double
>> 2147483647
>>>>>> 
>>>>>> With + and - we can get the same mixture of returned types. There's
>>>>>> no difference in those calculations, just some casting. To me,
>>>>>> floating-point math indicates inexactness and has errors, and
>>>>>> whoever mixes up two different types should understand that. If one
>>>>>> didn't want an exact numeric type, why would the server return one?
>>>>>> The floating point value itself could be wrong already before the
>>>>>> calculation - claiming we do it losslessly is just wrong.
>>>>>> 
>>>>>> Fun with 2.65:
>>>>>> 
>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
>>>>>> 
>>>>>> SELECT round(2.65) returns numeric 4
>>>>>> SELECT round(2.65::double precision) returns double 4
>>>>>> 
>>>>>> SELECT 2.65 * 1 returns double 2.65
>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
>>>>>> 
>>>>>> SELECT round(2.65) * 1 returns numeric 3
>>>>>> SELECT round(2.65) * round(1) returns double 3
>>>>>> 
>>>>>> So as we're going to have silly values in any case, why pretend
>>>>>> otherwise? Also, exact calculations are slow if we crunch a large
>>>>>> amount of numbers. I guess I slightly deviated towards Postgres'
>>>>>> implementation in this case, but I wish it wasn't used as a
>>>>>> benchmark here. And most importantly, I would definitely want the
>>>>>> exact same type returned each time I do a calculation.
>>>>>> 
>>>>>> - Micke
>>>>>> 
>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
>> benedict@apache.org>
>>>>>> wrote:
>>>>>> 
>>>>>>> As far as I can tell we reached a relatively strong consensus that we
>>>>>>> should implement lossless casts by default?  Does anyone have
>> anything more
>>>>>>> to add?
>>>>>>> 
>>>>>>> Looking at the emails, everyone who participated and expressed a
>>>>>>> preference was in favour of the “Postgres approach” of upcasting to
>> decimal
>>>>>>> for mixed float/int operands?
>>>>>>> 
>>>>>>> I’d like to get a clear-cut decision on this, so we know what we’re
>> doing
>>>>>>> for 4.0.  Then hopefully we can move on to a collective decision on
>> Ariel’s
>>>>>>> concerns about overflow, which I think are also pressing -
>> particularly for
>>>>>>> tinyint and smallint.  This does also impact implicit casts for mixed
>>>>>>> integer type operations, but an approach for these will probably
>> fall out
>>>>>>> of any decision on overflow.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
>> murukesh.mohanan@gmail.com>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> I think you're conflating two things here. There's the loss
>>>>>>>> resulting from using some operators, and loss involved in casting.
>>>>>>>> Dividing an integer by another integer to obtain an integer result
>>>>>>>> can result in loss, but there's no implicit casting there and no
>>>>>>>> loss due to casting. Casting an integer to a float can also result
>>>>>>>> in loss. So dividing an integer by a float, for example, with an
>>>>>>>> implicit cast has an additional avenue for loss: the implicit cast
>>>>>>>> for the operands so that they're of the same type. I believe this
>>>>>>>> discussion so far has been about the latter, not the loss from the
>>>>>>>> operations themselves.
>>>>>>>> 
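For illustration, the two kinds of loss Murukesh distinguishes can be seen
in Postgres-style SQL (a sketch, separate from the sessions quoted
elsewhere in this thread):

  SELECT 3 / 2;           -- returns 1: integer division, the loss comes
                          -- from the operator itself, no cast involved
  SELECT 16777217::real;  -- returns 16777216: the cast alone changes the
                          -- value, before any arithmetic happens
  SELECT 3 / 2.0;         -- returns 1.5: the int operand is implicitly
                          -- cast so both operands share a type before the
                          -- division is performed
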
>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
>> benjamin.lerer@datastax.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> I would like to try to clarify things a bit to help people
>>>>>>>>> understand the true complexity of the problem.
>>>>>>>>> 
>>>>>>>>> The *float* and *double* types are inexact numeric types, not
>>>>>>>>> only at the operation level.
>>>>>>>>> 
>>>>>>>>> If you insert 676543.21 in a *float* column and then read it, you
>>>>>>>>> will realize that the value has been truncated to 676543.2.
>>>>>>>>> 
>>>>>>>>> If you want accuracy the only way is to avoid those inexact
>>>>>>>>> types. Using *decimals* during operations will mitigate the
>>>>>>>>> problem but will not remove it.
>>>>>>>>> 
>>>>>>>>> I do not recall PostgreSQL behaving as described. If I am not
>>>>>>>>> mistaken, in PostgreSQL *SELECT 3/2* will return *1*, which is
>>>>>>>>> similar to what MS SQL Server and Oracle do. So all those
>>>>>>>>> databases will lose precision if you are not careful.
>>>>>>>>> 
>>>>>>>>> If you truly need precision you can have it by using exact
>>>>>>>>> numeric types for your data types. Of course it has a cost in
>>>>>>>>> performance, memory and disk usage.
>>>>>>>>> 
>>>>>>>>> The advantage of the current approach is that it gives you the
>>>>>>>>> choice. It is up to you to decide what you need for your
>>>>>>>>> application. It is also in line with the way CQL behaves
>>>>>>>>> everywhere else.
>>>>>>>>> 
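The truncation Benjamin describes is easy to reproduce; a minimal CQL
sketch (the keyspace and table names here are hypothetical, not from the
thread):

  CREATE TABLE ks.t (id int PRIMARY KEY, f float, d double);
  INSERT INTO ks.t (id, f, d) VALUES (1, 676543.21, 676543.21);
  SELECT f, d FROM ks.t WHERE id = 1;
  -- f comes back as 676543.2, because a 32-bit float cannot represent
  -- 676543.21 exactly, while the double column d preserves it.
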
>>>>>>>> --
>>>>>>>> 
>>>>>>>> Muru
>>>>>>> 
>>>>>>> 
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>> 
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>> 
>> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Implicit Casts for Arithmetic Operators

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Sounds good to me.

On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <be...@apache.org>
wrote:

> So, this thread somewhat petered out.
>
> There are still a number of unresolved issues, but to make progress I
> wonder if it would first be helpful to have a vote on ensuring we are ANSI
> SQL 92 compliant for our arithmetic?  This seems like a sensible baseline,
> since we will hopefully minimise surprise to operators this way.
>
> If people largely agree, I will call a vote, and we can pick up a couple
> of more focused discussions afterwards on how we interpret the leeway it
> gives.
>
>
> > On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote:
> >
> > Hi,
> >
> > From reading the spec, precision is always implementation defined. The
> spec specifies scale in several cases, but never precision for any type or
> operation (addition/subtraction, multiplication, division).
> >
> > So we don't implement anything remotely approaching precision and scale
> in CQL when it comes to numbers I think? So we aren't going to follow the
> spec for scale. We are already pretty far down that road so I would leave
> it alone.
> >
> > I don't think the spec is asking for the most approximate type. It's
> just saying the result is approximate, and the precision is implementation
> defined. We could return either float or double. I think if one of the
> operands is a double we should return a double because clearly the schema
> thought a double was required to represent that number. I would also be in
> favor of returning a double all the time so that people can expect a
> consistent type from expressions involving approximate numbers.
> >
> > I am a big fan of widening for arithmetic expressions in a database to
> avoid having to error on overflow. You can go to the trouble of only
> widening the minimum amount, but I think it's simpler if we always widen to
> bigint and double. This would be something the spec allows.
> >
> > Definitely if we can make overflow not occur we should and the spec
> allows that. We should also not return different types for the same operand
> types just to work around overflow if we detect we need more precision.
> >
> > Ariel
> > On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
> >> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging this
> >> out (and Mike for getting some empirical examples).
> >>
> >> We still have to decide on the approximate data type to return; right
> >> now, we have float+bigint=double, but float+int=float.  I think this is
> >> fairly inconsistent, and either the approximate type should always win,
> >> or we should always upgrade to double for mixed operands.
> >>
> >> The quoted spec also suggests that decimal+float=float, and decimal
> >> +double=double, whereas we currently have decimal+float=decimal, and
> >> decimal+double=decimal
> >>
> >> If we’re going to go with an approximate operand implying an
> approximate
> >> result, I think we should do it consistently (and consistent with the
> >> SQL92 spec), and have the type of the approximate operand always be the
> >> return type.
> >>
> >> This would still leave a decision for float+double, though.  The most
> >> consistent behaviour with that stated above would be to always take the
> >> most approximate type to return (i.e. float), but this would seem to me
> >> to be fairly unexpected for the user.
> >>
> >>
> >>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I agree with what's been said about expectations regarding expressions
> involving floating point numbers. I think that if one of the inputs is
> approximate then the result should be approximate.
> >>>
> >>> One thing we could look at for inspiration is the SQL spec. Not to
> follow dogmatically necessarily.
> >>>
> >>> From the SQL 92 spec regarding assignment
> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
> >>> "
> >>>        Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
> >>>        FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
> mutually
> >>>        comparable and mutually assignable. If an assignment would
> result
> >>>        in a loss of the most significant digits, an exception condition
> >>>        is raised. If least significant digits are lost, implementation-
> >>>        defined rounding or truncating occurs with no exception
> condition
> >>>        being raised. The rules for arithmetic are generally governed by
> >>>        Subclause 6.12, "<numeric value expression>".
> >>> "
> >>>
> >>> Section 6.12 numeric value expressions:
> >>> "
> >>>        1) If the data type of both operands of a dyadic arithmetic
> opera-
> >>>           tor is exact numeric, then the data type of the result is
> exact
> >>>           numeric, with precision and scale determined as follows:
> >>> ...
> >>>        2) If the data type of either operand of a dyadic arithmetic op-
> >>>           erator is approximate numeric, then the data type of the re-
> >>>           sult is approximate numeric. The precision of the result is
> >>>           implementation-defined.
> >>> "
> >>>
> >>> And this makes sense to me. I think we should only return an exact
> result if both of the inputs are exact.
> >>>
> >>> I think we might want to look closely at the SQL spec and especially
> when the spec requires an error to be generated. Those are sometimes in the
> spec to prevent subtle paths to wrong answers. Any time we deviate from the
> spec we should be asking why is it in the spec and why are we deviating.
> >>>
> >>> Another issue besides overflow handling is how we determine precision
> and scale for expressions involving two exact types.
> >>>
> >>> Ariel
> >>>
> >>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
> >>>> Hi,
> >>>>
> >>>> I'm not sure if I would prefer the Postgres way of doing things,
> >>>> which is returning just about any type depending on the order of
> >>>> the operands, considering the docs actually mention that using
> >>>> numeric/decimal is slow, and also say multiple times that floating
> >>>> points are inexact. So, doing some math with Postgres (9.6.5):
> >>>>
> >>>> SELECT 2147483647::bigint*1.0::double precision returns double
> >>>> precision 2147483647
> >>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
> >>>> SELECT 2147483647::bigint*1.0::real returns double
> >>>> SELECT 2147483647::double precision*1::bigint returns double
> 2147483647
> >>>> SELECT 2147483647::double precision*1.0::bigint returns double
> 2147483647
> >>>>
> >>>> With + and - we can get the same mixture of returned types. There's
> >>>> no difference in those calculations, just some casting. To me,
> >>>> floating-point math indicates inexactness and has errors, and
> >>>> whoever mixes up two different types should understand that. If one
> >>>> didn't want an exact numeric type, why would the server return one?
> >>>> The floating point value itself could be wrong already before the
> >>>> calculation - claiming we do it losslessly is just wrong.
> >>>>
> >>>> Fun with 2.65:
> >>>>
> >>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
> >>>> SELECT 2.65::double precision * 1::int returns double 2.65
> >>>>
> >>>> SELECT round(2.65) returns numeric 4
> >>>> SELECT round(2.65::double precision) returns double 4
> >>>>
> >>>> SELECT 2.65 * 1 returns double 2.65
> >>>> SELECT 2.65 * 1::bigint returns numeric 2.65
> >>>> SELECT 2.65 * 1.0 returns numeric 2.650
> >>>> SELECT 2.65 * 1.0::double precision returns double 2.65
> >>>>
> >>>> SELECT round(2.65) * 1 returns numeric 3
> >>>> SELECT round(2.65) * round(1) returns double 3
> >>>>
> >>>> So as we're going to have silly values in any case, why pretend
> >>>> otherwise? Also, exact calculations are slow if we crunch a large
> >>>> amount of numbers. I guess I slightly deviated towards Postgres'
> >>>> implementation in this case, but I wish it wasn't used as a
> >>>> benchmark here. And most importantly, I would definitely want the
> >>>> exact same type returned each time I do a calculation.
> >>>>
> >>>> - Micke
> >>>>
> >>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
> benedict@apache.org>
> >>>> wrote:
> >>>>
> >>>>> As far as I can tell we reached a relatively strong consensus that we
> >>>>> should implement lossless casts by default?  Does anyone have
> anything more
> >>>>> to add?
> >>>>>
> >>>>> Looking at the emails, everyone who participated and expressed a
> >>>>> preference was in favour of the “Postgres approach” of upcasting to
> decimal
> >>>>> for mixed float/int operands?
> >>>>>
> >>>>> I’d like to get a clear-cut decision on this, so we know what we’re
> doing
> >>>>> for 4.0.  Then hopefully we can move on to a collective decision on
> Ariel’s
> >>>>> concerns about overflow, which I think are also pressing -
> particularly for
> >>>>> tinyint and smallint.  This does also impact implicit casts for mixed
> >>>>> integer type operations, but an approach for these will probably
> fall out
> >>>>> of any decision on overflow.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
> murukesh.mohanan@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> I think you're conflating two things here. There's the loss
> >>>>>> resulting from using some operators, and loss involved in casting.
> >>>>>> Dividing an integer by another integer to obtain an integer result
> >>>>>> can result in loss, but there's no implicit casting there and no
> >>>>>> loss due to casting. Casting an integer to a float can also result
> >>>>>> in loss. So dividing an integer by a float, for example, with an
> >>>>>> implicit cast has an additional avenue for loss: the implicit cast
> >>>>>> for the operands so that they're of the same type. I believe this
> >>>>>> discussion so far has been about the latter, not the loss from the
> >>>>>> operations themselves.
> >>>>>>
> >>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
> benjamin.lerer@datastax.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I would like to try to clarify things a bit to help people
> >>>>>>> understand the true complexity of the problem.
> >>>>>>>
> >>>>>>> The *float* and *double* types are inexact numeric types, not
> >>>>>>> only at the operation level.
> >>>>>>>
> >>>>>>> If you insert 676543.21 in a *float* column and then read it, you
> >>>>>>> will realize that the value has been truncated to 676543.2.
> >>>>>>>
> >>>>>>> If you want accuracy the only way is to avoid those inexact
> >>>>>>> types. Using *decimals* during operations will mitigate the
> >>>>>>> problem but will not remove it.
> >>>>>>>
> >>>>>>> I do not recall PostgreSQL behaving as described. If I am not
> >>>>>>> mistaken, in PostgreSQL *SELECT 3/2* will return *1*, which is
> >>>>>>> similar to what MS SQL Server and Oracle do. So all those
> >>>>>>> databases will lose precision if you are not careful.
> >>>>>>>
> >>>>>>> If you truly need precision you can have it by using exact
> >>>>>>> numeric types for your data types. Of course it has a cost in
> >>>>>>> performance, memory and disk usage.
> >>>>>>>
> >>>>>>> The advantage of the current approach is that it gives you the
> >>>>>>> choice. It is up to you to decide what you need for your
> >>>>>>> application. It is also in line with the way CQL behaves
> >>>>>>> everywhere else.
> >>>>>>>
> >>>>>> --
> >>>>>>
> >>>>>> Muru
> >>>>>
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>
> >>>>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
> --
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

Re: Implicit Casts for Arithmetic Operators

Posted by Benedict Elliott Smith <be...@apache.org>.
So, this thread somewhat petered out.

There are still a number of unresolved issues, but to make progress I wonder if it would first be helpful to have a vote on ensuring we are ANSI SQL 92 compliant for our arithmetic?  This seems like a sensible baseline, since we will hopefully minimise surprise to operators this way.

If people largely agree, I will call a vote, and we can pick up a couple of more focused discussions afterwards on how we interpret the leeway it gives.


> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote:
> 
> Hi,
> 
> > From reading the spec, precision is always implementation defined. The spec specifies scale in several cases, but never precision for any type or operation (addition/subtraction, multiplication, division).
> 
> So we don't implement anything remotely approaching precision and scale in CQL when it comes to numbers I think? So we aren't going to follow the spec for scale. We are already pretty far down that road so I would leave it alone. 
> 
> I don't think the spec is asking for the most approximate type. It's just saying the result is approximate, and the precision is implementation defined. We could return either float or double. I think if one of the operands is a double we should return a double because clearly the schema thought a double was required to represent that number. I would also be in favor of returning a double all the time so that people can expect a consistent type from expressions involving approximate numbers.
> 
> I am a big fan of widening for arithmetic expressions in a database to avoid having to error on overflow. You can go to the trouble of only widening the minimum amount, but I think it's simpler if we always widen to bigint and double. This would be something the spec allows.
> 
> Definitely if we can make overflow not occur we should and the spec allows that. We should also not return different types for the same operand types just to work around overflow if we detect we need more precision.
> 
> Ariel
> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging this 
>> out (and Mike for getting some empirical examples).
>> 
>> We still have to decide on the approximate data type to return; right 
>> now, we have float+bigint=double, but float+int=float.  I think this is 
>> fairly inconsistent, and either the approximate type should always win, 
>> or we should always upgrade to double for mixed operands.
>> 
>> The quoted spec also suggests that decimal+float=float, and decimal
>> +double=double, whereas we currently have decimal+float=decimal, and 
>> decimal+double=decimal
>> 
>> If we’re going to go with an approximate operand implying an approximate 
>> result, I think we should do it consistently (and consistent with the 
>> SQL92 spec), and have the type of the approximate operand always be the 
>> return type.
>> 
>> This would still leave a decision for float+double, though.  The most 
>> consistent behaviour with that stated above would be to always take the 
>> most approximate type to return (i.e. float), but this would seem to me 
>> to be fairly unexpected for the user.
>> 
>> 
>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>> 
>>> Hi,
>>> 
>>> I agree with what's been said about expectations regarding expressions involving floating point numbers. I think that if one of the inputs is approximate then the result should be approximate.
>>> 
>>> One thing we could look at for inspiration is the SQL spec. Not to follow dogmatically necessarily.
>>> 
>>> From the SQL 92 spec regarding assignment http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
>>> "
>>>        Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
>>>        FLOAT, REAL, and DOUBLE PRECISION are numbers and are all mutually
>>>        comparable and mutually assignable. If an assignment would result
>>>        in a loss of the most significant digits, an exception condition
>>>        is raised. If least significant digits are lost, implementation-
>>>        defined rounding or truncating occurs with no exception condition
>>>        being raised. The rules for arithmetic are generally governed by
>>>        Subclause 6.12, "<numeric value expression>".
>>> "
>>> 
>>> Section 6.12 numeric value expressions:
>>> "
>>>        1) If the data type of both operands of a dyadic arithmetic opera-
>>>           tor is exact numeric, then the data type of the result is exact
>>>           numeric, with precision and scale determined as follows:
>>> ...
>>>        2) If the data type of either operand of a dyadic arithmetic op-
>>>           erator is approximate numeric, then the data type of the re-
>>>           sult is approximate numeric. The precision of the result is
>>>           implementation-defined.
>>> "
>>> 
>>> And this makes sense to me. I think we should only return an exact result if both of the inputs are exact.
>>> 
>>> I think we might want to look closely at the SQL spec and especially when the spec requires an error to be generated. Those are sometimes in the spec to prevent subtle paths to wrong answers. Any time we deviate from the spec we should be asking why is it in the spec and why are we deviating.
>>> 
>>> Another issue besides overflow handling is how we determine precision and scale for expressions involving two exact types.
>>> 
>>> Ariel
>>> 
>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
>>>> Hi,
>>>> 
> >>>> I'm not sure if I would prefer the Postgres way of doing things, which is
> >>>> returning just about any type depending on the order of the operands,
> >>>> considering the docs actually mention that using numeric/decimal is
> >>>> slow, and also say multiple times that floating points are inexact. So,
> >>>> doing some math with Postgres (9.6.5):
>>>> 
>>>> SELECT 2147483647::bigint*1.0::double precision returns double
>>>> precision 2147483647
>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
>>>> SELECT 2147483647::bigint*1.0::real returns double
>>>> SELECT 2147483647::double precision*1::bigint returns double 2147483647
>>>> SELECT 2147483647::double precision*1.0::bigint returns double 2147483647
>>>> 
> >>>> With + and - we can get the same mixture of returned types. There's
> >>>> no difference in those calculations, just some casting. To me,
> >>>> floating-point math indicates inexactness and has errors, and whoever mixes
> >>>> up two different types should understand that. If one didn't want an exact
> >>>> numeric type, why would the server return one? The floating point value
> >>>> itself could be wrong already before the calculation - claiming we do
> >>>> it losslessly is just wrong.
>>>> 
>>>> Fun with 2.65:
>>>> 
>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
>>>> SELECT 2.65::double precision * 1::int returns double 2.65
>>>> 
>>>> SELECT round(2.65) returns numeric 4
>>>> SELECT round(2.65::double precision) returns double 4
>>>> 
>>>> SELECT 2.65 * 1 returns double 2.65
>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
>>>> SELECT 2.65 * 1.0 returns numeric 2.650
>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
>>>> 
>>>> SELECT round(2.65) * 1 returns numeric 3
>>>> SELECT round(2.65) * round(1) returns double 3
>>>> 
> >>>> So as we're going to have silly values in any case, why pretend
> >>>> otherwise? Also, exact calculations are slow if we crunch a large amount of
> >>>> numbers. I guess I slightly deviated towards Postgres' implementation in this
> >>>> case, but I wish it wasn't used as a benchmark here. And most
> >>>> importantly, I would definitely want the exact same type returned each time
> >>>> I do a calculation.
>>>> 
>>>> - Micke
>>>> 
>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <be...@apache.org>
>>>> wrote:
>>>> 
>>>>> As far as I can tell we reached a relatively strong consensus that we
>>>>> should implement lossless casts by default?  Does anyone have anything more
>>>>> to add?
>>>>> 
>>>>> Looking at the emails, everyone who participated and expressed a
>>>>> preference was in favour of the “Postgres approach” of upcasting to decimal
>>>>> for mixed float/int operands?
>>>>> 
>>>>> I’d like to get a clear-cut decision on this, so we know what we’re doing
>>>>> for 4.0.  Then hopefully we can move on to a collective decision on Ariel’s
>>>>> concerns about overflow, which I think are also pressing - particularly for
>>>>> tinyint and smallint.  This does also impact implicit casts for mixed
>>>>> integer type operations, but an approach for these will probably fall out
>>>>> of any decision on overflow.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <mu...@gmail.com>
>>>>> wrote:
>>>>>> 
> >>>>>> I think you're conflating two things here. There's the loss resulting
> >>>>>> from using some operators, and loss involved in casting. Dividing an
> >>>>>> integer by another integer to obtain an integer result can result in
> >>>>>> loss, but there's no implicit casting there and no loss due to casting.
> >>>>>> Casting an integer to a float can also result in loss. So dividing an
> >>>>>> integer by a float, for example, with an implicit cast has an
> >>>>>> additional avenue for loss: the implicit cast for the operands so that
> >>>>>> they're of the same type. I believe this discussion so far has been
> >>>>>> about the latter, not the loss from the operations themselves.
>>>>>> 
>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <be...@datastax.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
> >>>>>>> I would like to try to clarify things a bit to help people understand
> >>>>>>> the true complexity of the problem.
> >>>>>>> 
> >>>>>>> The *float* and *double* types are inexact numeric types, not only at
> >>>>>>> the operation level.
> >>>>>>> 
> >>>>>>> If you insert 676543.21 in a *float* column and then read it, you will
> >>>>>>> realize that the value has been truncated to 676543.2.
> >>>>>>> 
> >>>>>>> If you want accuracy the only way is to avoid those inexact types.
> >>>>>>> Using *decimals* during operations will mitigate the problem but will
> >>>>>>> not remove it.
> >>>>>>> 
> >>>>>>> I do not recall PostgreSQL behaving as described. If I am not mistaken,
> >>>>>>> in PostgreSQL *SELECT 3/2* will return *1*, which is similar to what
> >>>>>>> MS SQL Server and Oracle do. So all those databases will lose precision
> >>>>>>> if you are not careful.
> >>>>>>> 
> >>>>>>> If you truly need precision you can have it by using exact numeric
> >>>>>>> types for your data types. Of course it has a cost in performance,
> >>>>>>> memory and disk usage.
> >>>>>>> 
> >>>>>>> The advantage of the current approach is that it gives you the choice.
> >>>>>>> It is up to you to decide what you need for your application. It is
> >>>>>>> also in line with the way CQL behaves everywhere else.
>>>>>>> 
>>>>>> --
>>>>>> 
>>>>>> Muru
>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>> 
>>>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Implicit Casts for Arithmetic Operators

Posted by Ariel Weisberg <ar...@weisberg.ws>.
Hi,

From reading the spec, precision is always implementation defined. The spec specifies scale in several cases, but never precision for any type or operation (addition/subtraction, multiplication, division).

So we don't implement anything remotely approaching precision and scale in CQL when it comes to numbers I think? So we aren't going to follow the spec for scale. We are already pretty far down that road so I would leave it alone. 

I don't think the spec is asking for the most approximate type. It's just saying the result is approximate, and the precision is implementation defined. We could return either float or double. I think if one of the operands is a double we should return a double because clearly the schema thought a double was required to represent that number. I would also be in favor of returning a double all the time so that people can expect a consistent type from expressions involving approximate numbers.

I am a big fan of widening for arithmetic expressions in a database to avoid having to error on overflow. You can go to the trouble of only widening the minimum amount, but I think it's simpler if we always widen to bigint and double. This would be something the spec allows.

Definitely if we can make overflow not occur we should and the spec allows that. We should also not return different types for the same operand types just to work around overflow if we detect we need more precision.
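
To make the widening idea concrete, a sketch of the proposed semantics
(hypothetical table and data; this is not current Cassandra behaviour):

  CREATE TABLE ks.t (id int PRIMARY KEY, a int, b int);
  INSERT INTO ks.t (id, a, b) VALUES (1, 2147483647, 2147483647);
  SELECT a + b FROM ks.t WHERE id = 1;
  -- always widening to bigint: returns 4294967294
  -- without widening: 32-bit addition either wraps to -2 or must raise
  -- an overflow error, which is what the widening is meant to avoid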

Ariel
On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging this 
> out (and Mike for getting some empirical examples).
> 
> We still have to decide on the approximate data type to return; right 
> now, we have float+bigint=double, but float+int=float.  I think this is 
> fairly inconsistent, and either the approximate type should always win, 
> or we should always upgrade to double for mixed operands.
> 
> The quoted spec also suggests that decimal+float=float, and decimal
> +double=double, whereas we currently have decimal+float=decimal, and 
> decimal+double=decimal
> 
> If we’re going to go with an approximate operand implying an approximate 
> result, I think we should do it consistently (and consistent with the 
> SQL92 spec), and have the type of the approximate operand always be the 
> return type.
> 
> This would still leave a decision for float+double, though.  The most 
> consistent behaviour with that stated above would be to always take the 
> most approximate type to return (i.e. float), but this would seem to me 
> to be fairly unexpected for the user.
> 
> 
> > On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> wrote:
> > 
> > Hi,
> > 
> > I agree with what's been said about expectations regarding expressions involving floating point numbers. I think that if one of the inputs is approximate then the result should be approximate.
> > 
> > One thing we could look at for inspiration is the SQL spec. Not to follow dogmatically necessarily.
> > 
> > From the SQL 92 spec regarding assignment http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
> > "
> >         Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
> >         FLOAT, REAL, and DOUBLE PRECISION are numbers and are all mutually
> >         comparable and mutually assignable. If an assignment would result
> >         in a loss of the most significant digits, an exception condition
> >         is raised. If least significant digits are lost, implementation-
> >         defined rounding or truncating occurs with no exception condition
> >         being raised. The rules for arithmetic are generally governed by
> >         Subclause 6.12, "<numeric value expression>".
> > "
> > 
> > Section 6.12 numeric value expressions:
> > "
> >         1) If the data type of both operands of a dyadic arithmetic opera-
> >            tor is exact numeric, then the data type of the result is exact
> >            numeric, with precision and scale determined as follows:
> > ...
> >         2) If the data type of either operand of a dyadic arithmetic op-
> >            erator is approximate numeric, then the data type of the re-
> >            sult is approximate numeric. The precision of the result is
> >            implementation-defined.
> > "
> > 
> > And this makes sense to me. I think we should only return an exact result if both of the inputs are exact.
> > 
> > I think we might want to look closely at the SQL spec and especially when the spec requires an error to be generated. Those are sometimes in the spec to prevent subtle paths to wrong answers. Any time we deviate from the spec we should be asking why is it in the spec and why are we deviating.
> > 
> > Another issue besides overflow handling is how we determine precision and scale for expressions involving two exact types.
> > 
> > Ariel
> > 
> > On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
> >> Hi,
> >> 
> >> I'm not sure if I would prefer the Postgres way of doing things, which is
> >> returning just about any type depending on the order of the operands,
> >> considering the docs actually mention that using numeric/decimal is
> >> slow, and also say multiple times that floating points are inexact. So,
> >> doing some math with Postgres (9.6.5):
> >> 
> >> SELECT 2147483647::bigint*1.0::double precision returns double
> >> precision 2147483647
> >> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
> >> SELECT 2147483647::bigint*1.0::real returns double
> >> SELECT 2147483647::double precision*1::bigint returns double 2147483647
> >> SELECT 2147483647::double precision*1.0::bigint returns double 2147483647
> >> 
> >> With + and - we can get the same mixture of returned types. There's
> >> no difference in those calculations, just some casting. To me,
> >> floating-point math indicates inexactness and has errors, and whoever mixes
> >> up two different types should understand that. If one didn't want an exact
> >> numeric type, why would the server return one? The floating point value
> >> itself could be wrong already before the calculation - claiming we do
> >> it losslessly is just wrong.
> >> 
> >> Fun with 2.65:
> >> 
> >> SELECT 2.65::real * 1::int returns double 2.65000009536743
> >> SELECT 2.65::double precision * 1::int returns double 2.65
> >> 
> >> SELECT round(2.65) returns numeric 4
> >> SELECT round(2.65::double precision) returns double 4
> >> 
> >> SELECT 2.65 * 1 returns double 2.65
> >> SELECT 2.65 * 1::bigint returns numeric 2.65
> >> SELECT 2.65 * 1.0 returns numeric 2.650
> >> SELECT 2.65 * 1.0::double precision returns double 2.65
> >> 
> >> SELECT round(2.65) * 1 returns numeric 3
> >> SELECT round(2.65) * round(1) returns double 3
> >> 
> >> So as we're going to have silly values in any case, why pretend
> >> otherwise? Also, exact calculations are slow if we crunch a large amount of
> >> numbers. I guess I slightly deviated towards Postgres' implementation in this
> >> case, but I wish it wasn't used as a benchmark here. And most
> >> importantly, I would definitely want the exact same type returned each time
> >> I do a calculation.
> >> 
> >>  - Micke
> >> 
> >> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <be...@apache.org>
> >> wrote:
> >> 
> >>> As far as I can tell we reached a relatively strong consensus that we
> >>> should implement lossless casts by default?  Does anyone have anything more
> >>> to add?
> >>> 
> >>> Looking at the emails, everyone who participated and expressed a
> >>> preference was in favour of the “Postgres approach” of upcasting to decimal
> >>> for mixed float/int operands?
> >>> 
> >>> I’d like to get a clear-cut decision on this, so we know what we’re doing
> >>> for 4.0.  Then hopefully we can move on to a collective decision on Ariel’s
> >>> concerns about overflow, which I think are also pressing - particularly for
> >>> tinyint and smallint.  This does also impact implicit casts for mixed
> >>> integer type operations, but an approach for these will probably fall out
> >>> of any decision on overflow.
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <mu...@gmail.com>
> >>> wrote:
> >>>> 
> >>>> I think you're conflating two things here. There's the loss resulting
> >>>> from using some operators, and loss involved in casting. Dividing an
> >>>> integer by another integer to obtain an integer result can result in
> >>>> loss, but there's no implicit casting there and no loss due to casting.
> >>>> Casting an integer to a float can also result in loss. So dividing an
> >>>> integer by a float, for example, with an implicit cast has an additional
> >>>> avenue for loss: the implicit cast for the operands so that they're of
> >>>> the same type. I believe this discussion so far has been about the
> >>>> latter, not the loss from the operations themselves.
> >>>> 
> >>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <be...@datastax.com>
> >>>> wrote:
> >>>> 
> >>>>> Hi,
> >>>>> 
> >>>>> I would like to try to clarify things a bit to help people understand
> >>>>> the true complexity of the problem.
> >>>>> 
> >>>>> The *float* and *double* types are inexact numeric types, not only at
> >>>>> the operation level.
> >>>>> 
> >>>>> If you insert 676543.21 in a *float* column and then read it, you will
> >>>>> realize that the value has been truncated to 676543.2.
> >>>>> 
> >>>>> If you want accuracy the only way is to avoid those inexact types.
> >>>>> Using *decimals* during operations will mitigate the problem but will
> >>>>> not remove it.
> >>>>> 
> >>>>> I do not recall PostgreSQL behaving as described. If I am not mistaken,
> >>>>> in PostgreSQL *SELECT 3/2* will return *1*, which is similar to what
> >>>>> MS SQL Server and Oracle do. So all those databases will lose precision
> >>>>> if you are not careful.
> >>>>> 
> >>>>> If you truly need precision you can have it by using exact numeric
> >>>>> types for your data types. Of course it has a cost in performance,
> >>>>> memory and disk usage.
> >>>>> 
> >>>>> The advantage of the current approach is that it gives you the choice.
> >>>>> It is up to you to decide what you need for your application. It is
> >>>>> also in line with the way CQL behaves everywhere else.
> >>>>> 
> >>>> --
> >>>> 
> >>>> Muru
> >>> 
> >>> 
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>> 
> >>> 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> > 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Implicit Casts for Arithmetic Operators

Posted by Benedict Elliott Smith <be...@apache.org>.
If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging this out (and Mike for getting some empirical examples).

We still have to decide on the approximate data type to return; right now, we have float+bigint=double, but float+int=float.  I think this is fairly inconsistent, and either the approximate type should always win, or we should always upgrade to double for mixed operands.

The quoted spec also suggests that decimal+float=float, and decimal+double=double, whereas we currently have decimal+float=decimal, and decimal+double=decimal.

If we’re going to go with an approximate operand implying an approximate result, I think we should do it consistently (and consistent with the SQL92 spec), and have the type of the approximate operand always be the return type.

This would still leave a decision for float+double, though.  The most consistent behaviour with that stated above would be to always take the most approximate type to return (i.e. float), but this would seem to me to be fairly unexpected for the user.
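
For concreteness, the combinations discussed above, assuming hypothetical
columns i int, b bigint, f float and d double:

  f + i   -- currently float; "approximate operand wins" keeps float, while
          -- "always upgrade for mixed operands" would give double
  f + b   -- currently double; both proposals also give double
  f + d   -- the open question: the most approximate operand type is float,
          -- but double is arguably what a user would expect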


> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> wrote:
> 
> Hi,
> 
> I agree with what's been said about expectations regarding expressions involving floating point numbers. I think that if one of the inputs is approximate then the result should be approximate.
> 
> One thing we could look at for inspiration is the SQL spec. Not to follow dogmatically necessarily.
> 
> From the SQL 92 spec regarding assignment http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
> "
>         Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
>         FLOAT, REAL, and DOUBLE PRECISION are numbers and are all mutually
>         comparable and mutually assignable. If an assignment would result
>         in a loss of the most significant digits, an exception condition
>         is raised. If least significant digits are lost, implementation-
>         defined rounding or truncating occurs with no exception condition
>         being raised. The rules for arithmetic are generally governed by
>         Subclause 6.12, "<numeric value expression>".
> "
> 
> Section 6.12 numeric value expressions:
> "
>         1) If the data type of both operands of a dyadic arithmetic opera-
>            tor is exact numeric, then the data type of the result is exact
>            numeric, with precision and scale determined as follows:
> ...
>         2) If the data type of either operand of a dyadic arithmetic op-
>            erator is approximate numeric, then the data type of the re-
>            sult is approximate numeric. The precision of the result is
>            implementation-defined.
> "
> 
> And this makes sense to me. I think we should only return an exact result if both of the inputs are exact.
> 
> I think we might want to look closely at the SQL spec and especially when the spec requires an error to be generated. Those are sometimes in the spec to prevent subtle paths to wrong answers. Any time we deviate from the spec we should be asking why is it in the spec and why are we deviating.
> 
> Another issue besides overflow handling is how we determine precision and scale for expressions involving two exact types.
> 
> Ariel
> 
> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
>> Hi,
>> 
>> I'm not sure if I would prefer the Postgres way of doing things, which is
>> returning just about any type depending on the order of the operands,
>> considering the docs actually mention that using numeric/decimal is
>> slow, and also say multiple times that floating points are inexact. So,
>> doing some math with Postgres (9.6.5):
>> 
>> SELECT 2147483647::bigint*1.0::double precision returns double
>> precision 2147483647
>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
>> SELECT 2147483647::bigint*1.0::real returns double
>> SELECT 2147483647::double precision*1::bigint returns double 2147483647
>> SELECT 2147483647::double precision*1.0::bigint returns double 2147483647
>> 
>> With + and - we can get the same mixture of returned types. There's
>> no difference in those calculations, just some casting. To me,
>> floating-point math indicates inexactness and has errors, and whoever mixes
>> up two different types should understand that. If one didn't want an exact
>> numeric type, why would the server return one? The floating point value
>> itself could be wrong already before the calculation - claiming we do
>> it losslessly is just wrong.
>> 
>> Fun with 2.65:
>> 
>> SELECT 2.65::real * 1::int returns double 2.65000009536743
>> SELECT 2.65::double precision * 1::int returns double 2.65
>> 
>> SELECT round(2.65) returns numeric 4
>> SELECT round(2.65::double precision) returns double 4
>> 
>> SELECT 2.65 * 1 returns double 2.65
>> SELECT 2.65 * 1::bigint returns numeric 2.65
>> SELECT 2.65 * 1.0 returns numeric 2.650
>> SELECT 2.65 * 1.0::double precision returns double 2.65
>> 
>> SELECT round(2.65) * 1 returns numeric 3
>> SELECT round(2.65) * round(1) returns double 3
>> 
>> So as we're going to have silly values in any case, why pretend
>> otherwise? Also, exact calculations are slow if we crunch a large amount of
>> numbers. I guess I slightly deviated towards Postgres' implementation in this
>> case, but I wish it wasn't used as a benchmark here. And most
>> importantly, I would definitely want the exact same type returned each time
>> I do a calculation.
>> 
>>  - Micke
>> 
>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <be...@apache.org>
>> wrote:
>> 
>>> As far as I can tell we reached a relatively strong consensus that we
>>> should implement lossless casts by default?  Does anyone have anything more
>>> to add?
>>> 
>>> Looking at the emails, everyone who participated and expressed a
>>> preference was in favour of the “Postgres approach” of upcasting to decimal
>>> for mixed float/int operands?
>>> 
>>> I’d like to get a clear-cut decision on this, so we know what we’re doing
>>> for 4.0.  Then hopefully we can move on to a collective decision on Ariel’s
>>> concerns about overflow, which I think are also pressing - particularly for
>>> tinyint and smallint.  This does also impact implicit casts for mixed
>>> integer type operations, but an approach for these will probably fall out
>>> of any decision on overflow.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <mu...@gmail.com>
>>> wrote:
>>>> 
>>>> I think you're conflating two things here. There's the loss resulting
>>>> from using some operators, and loss involved in casting. Dividing an
>>>> integer by another integer to obtain an integer result can result in
>>>> loss, but there's no implicit casting there and no loss due to casting.
>>>> Casting an integer to a float can also result in loss. So dividing an
>>>> integer by a float, for example, with an implicit cast has an additional
>>>> avenue for loss: the implicit cast for the operands so that they're of
>>>> the same type. I believe this discussion so far has been about the
>>>> latter, not the loss from the operations themselves.
>>>> 
>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <be...@datastax.com>
>>>> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I would like to try to clarify things a bit to help people understand
>>>>> the true complexity of the problem.
>>>>> 
>>>>> The *float* and *double* types are inexact numeric types, not only at
>>>>> the operation level.
>>>>> 
>>>>> If you insert 676543.21 in a *float* column and then read it, you will
>>>>> realize that the value has been truncated to 676543.2.
>>>>> 
>>>>> If you want accuracy the only way is to avoid those inexact types.
>>>>> Using *decimals* during operations will mitigate the problem but will
>>>>> not remove it.
>>>>> 
>>>>> I do not recall PostgreSQL behaving as described. If I am not mistaken,
>>>>> in PostgreSQL *SELECT 3/2* will return *1*, which is similar to what
>>>>> MS SQL Server and Oracle do. So all those databases will lose precision
>>>>> if you are not careful.
>>>>> 
>>>>> If you truly need precision you can have it by using exact numeric
>>>>> types for your data types. Of course it has a cost in performance,
>>>>> memory and disk usage.
>>>>> 
>>>>> The advantage of the current approach is that it gives you the choice.
>>>>> It is up to you to decide what you need for your application. It is
>>>>> also in line with the way CQL behaves everywhere else.
>>>>> 
>>>> --
>>>> 
>>>> Muru
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>> 
>>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Implicit Casts for Arithmetic Operators

Posted by Ariel Weisberg <ar...@weisberg.ws>.
Hi,

I agree with what's been said about expectations regarding expressions involving floating point numbers. I think that if one of the inputs is approximate then the result should be approximate.

One thing we could look at for inspiration is the SQL spec. Not to follow dogmatically necessarily.

From the SQL 92 spec regarding assignment http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
"
         Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
         FLOAT, REAL, and DOUBLE PRECISION are numbers and are all mutually
         comparable and mutually assignable. If an assignment would result
         in a loss of the most significant digits, an exception condition
         is raised. If least significant digits are lost, implementation-
         defined rounding or truncating occurs with no exception condition
         being raised. The rules for arithmetic are generally governed by
         Subclause 6.12, "<numeric value expression>".
"

Section 6.12 numeric value expressions:
"
         1) If the data type of both operands of a dyadic arithmetic opera-
            tor is exact numeric, then the data type of the result is exact
            numeric, with precision and scale determined as follows:
...
         2) If the data type of either operand of a dyadic arithmetic op-
            erator is approximate numeric, then the data type of the re-
            sult is approximate numeric. The precision of the result is
            implementation-defined.
"

And this makes sense to me. I think we should only return an exact result if both of the inputs are exact.
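
For a concrete feel for the distinction, here is a minimal Java sketch (illustrative only, not Cassandra code; BigDecimal stands in for an exact type, double for an approximate one):

    import java.math.BigDecimal;

    public class ExactVsApproximate {
        public static void main(String[] args) {
            // approximate operand -> approximate result: binary rounding error shows through
            System.out.println(0.1 + 0.2);  // 0.30000000000000004
            // exact operands -> exact result
            System.out.println(new BigDecimal("0.1").add(new BigDecimal("0.2")));  // 0.3
        }
    }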

I think we might want to look closely at the SQL spec, and especially at when the spec requires an error to be generated. Those requirements are sometimes in the spec to prevent subtle paths to wrong answers. Any time we deviate from the spec, we should be asking why it is in the spec and why we are deviating.

Another issue besides overflow handling is how we determine precision and scale for expressions involving two exact types.

Ariel

On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
> Hi,
> 
> I'm not sure if I would prefer the Postgres way of doing things, which is
> returning just about any type depending on the order of operators.
> Consider that the docs actually mention that using numeric/decimal is
> slow, and say multiple times that floating points are inexact. So doing
> some math with Postgres (9.6.5):
> 
> SELECT 2147483647::bigint*1.0::double precision returns double
> precision 2147483647
> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
> SELECT 2147483647::bigint*1.0::real returns double
> SELECT 2147483647::double precision*1::bigint returns double 2147483647
> SELECT 2147483647::double precision*1.0::bigint returns double 2147483647
> 
> With + - we can get the same amount of mixture of returned types. There's
> no difference in those calculations, just some casting. To me
> floating-point math indicates inexactness and has errors, and whoever mixes
> up two different types should understand that. If one didn't ask for an
> exact numeric type, why would the server return one? The floating point value
> itself could already be wrong before the calculation - trying to say we do
> it losslessly is just wrong.
> 
> Fun with 2.65:
> 
> SELECT 2.65::real * 1::int returns double 2.65000009536743
> SELECT 2.65::double precision * 1::int returns double 2.65
> 
> SELECT round(2.65) returns numeric 4
> SELECT round(2.65::double precision) returns double 4
> 
> SELECT 2.65 * 1 returns double 2.65
> SELECT 2.65 * 1::bigint returns numeric 2.65
> SELECT 2.65 * 1.0 returns numeric 2.650
> SELECT 2.65 * 1.0::double precision returns double 2.65
> 
> SELECT round(2.65) * 1 returns numeric 3
> SELECT round(2.65) * round(1) returns double 3
> 
> So as we're going to have silly values in any case, why pretend something
> else? Also, exact calculations are slow if we crunch large amounts of
> numbers. I guess I slightly deviated towards Postgres' implementation here,
> but I wish it weren't used as a benchmark. And most importantly, I would
> definitely want the exact same type returned each time I do a calculation.
> 
>   - Micke
> 
> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <be...@apache.org>
> wrote:
> 
> > As far as I can tell we reached a relatively strong consensus that we
> > should implement lossless casts by default?  Does anyone have anything more
> > to add?
> >
> > Looking at the emails, everyone who participated and expressed a
> > preference was in favour of the “Postgres approach” of upcasting to decimal
> > for mixed float/int operands?
> >
> > I’d like to get a clear-cut decision on this, so we know what we’re doing
> > for 4.0.  Then hopefully we can move on to a collective decision on Ariel’s
> > concerns about overflow, which I think are also pressing - particularly for
> > tinyint and smallint.  This does also impact implicit casts for mixed
> > integer type operations, but an approach for these will probably fall out
> > of any decision on overflow.
> >
> >
> >
> >
> >
> >
> > > On 3 Oct 2018, at 11:38, Murukesh Mohanan <mu...@gmail.com>
> > wrote:
> > >
> > > I think you're conflating two things here. There's the loss resulting
> > from
> > > using some operators, and loss involved in casting. Dividing an integer
> > by
> > > another integer to obtain an integer result can result in loss, but
> > there's
> > > no implicit casting there and no loss due to casting.  Casting an integer
> > > to a float can also result in loss. So dividing an integer by a float,
> > for
> > > example, with an implicit cast has an additional avenue for loss: the
> > > implicit cast for the operands so that they're of the same type. I
> > believe
> > > this discussion so far has been about the latter, not the loss from the
> > > operations themselves.
> > >
> > > On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <be...@datastax.com>
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> I would like to try to clarify things a bit to help people to understand
> > >> the true complexity of the problem.
> > >>
> > >> The *float *and *double *types are inexact numeric types. Not only at
> > the
> > >> operation level.
> > >>
> > >> If you insert 676543.21 in a *float* column and then read it, you will
> > >> realize that the value has been truncated to 676543.2.
> > >>
> > >> If you want accuracy, the only way is to avoid those inexact types. Using
> > >> *decimals* during operations will mitigate the problem but will not remove it.
> > >>
> > >>
> > >> I do not recall PostgreSQL behaving as described. If I am not mistaken,
> > in
> > >> PostgreSQL *SELECT 3/2* will return *1*, which is similar to what MS SQL
> > >> Server and Oracle do. So all those databases will lose precision if you
> > >> are not careful.
> > >>
> > >> If you truly need precision you can have it by using exact numeric types
> > >> for your data types. Of course it has a cost on performance, memory and
> > >> disk usage.
> > >>
> > >> The advantage of the current approach is that it gives you the choice.
> > It is
> > >> up to you to decide what you need for your application. It is also in
> > line
> > >> with the way CQL behaves everywhere else.
> > >>
> > > --
> > >
> > > Muru
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Implicit Casts for Arithmetic Operators

Posted by Michael Burman <ya...@iki.fi>.
Hi,

I'm not sure if I would prefer the Postgres way of doing things, which is
returning just about any type depending on the order of operators.
Consider that the docs actually mention that using numeric/decimal is
slow, and say multiple times that floating points are inexact. So doing
some math with Postgres (9.6.5):

SELECT 2147483647::bigint*1.0::double precision returns double
precision 2147483647
SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
SELECT 2147483647::bigint*1.0::real returns double
SELECT 2147483647::double precision*1::bigint returns double 2147483647
SELECT 2147483647::double precision*1.0::bigint returns double 2147483647

With + - we can get the same amount of mixture of returned types. There's
no difference in those calculations, just some casting. To me
floating-point math indicates inexactness and has errors, and whoever mixes
up two different types should understand that. If one didn't ask for an
exact numeric type, why would the server return one? The floating point value
itself could already be wrong before the calculation - trying to say we do
it losslessly is just wrong.

Fun with 2.65:

SELECT 2.65::real * 1::int returns double 2.65000009536743
SELECT 2.65::double precision * 1::int returns double 2.65

SELECT round(2.65) returns numeric 4
SELECT round(2.65::double precision) returns double 4

SELECT 2.65 * 1 returns double 2.65
SELECT 2.65 * 1::bigint returns numeric 2.65
SELECT 2.65 * 1.0 returns numeric 2.650
SELECT 2.65 * 1.0::double precision returns double 2.65

SELECT round(2.65) * 1 returns numeric 3
SELECT round(2.65) * round(1) returns double 3
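
The 2.65000009536743 above is just the float's true binary value showing through once it is widened. A rough Java analogue, as an illustrative sketch (runnable in jshell):

    float f = 2.65f;
    System.out.println((double) f);  // 2.6500000953674316: what the float actually stores
    System.out.println(2.65d);       // 2.65: double is also inexact, just closer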

So as we're going to have silly values in any case, why pretend something
else? Also, exact calculations are slow if we crunch large amounts of
numbers. I guess I slightly deviated towards Postgres' implementation here,
but I wish it weren't used as a benchmark. And most importantly, I would
definitely want the exact same type returned each time I do a calculation.

  - Micke

On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <be...@apache.org>
wrote:

> As far as I can tell we reached a relatively strong consensus that we
> should implement lossless casts by default?  Does anyone have anything more
> to add?
>
> Looking at the emails, everyone who participated and expressed a
> preference was in favour of the “Postgres approach” of upcasting to decimal
> for mixed float/int operands?
>
> I’d like to get a clear-cut decision on this, so we know what we’re doing
> for 4.0.  Then hopefully we can move on to a collective decision on Ariel’s
> concerns about overflow, which I think are also pressing - particularly for
> tinyint and smallint.  This does also impact implicit casts for mixed
> integer type operations, but an approach for these will probably fall out
> of any decision on overflow.
>
>
>
>
>
>
> > On 3 Oct 2018, at 11:38, Murukesh Mohanan <mu...@gmail.com>
> wrote:
> >
> > I think you're conflating two things here. There's the loss resulting
> from
> > using some operators, and loss involved in casting. Dividing an integer
> by
> > another integer to obtain an integer result can result in loss, but
> there's
> > no implicit casting there and no loss due to casting.  Casting an integer
> > to a float can also result in loss. So dividing an integer by a float,
> for
> > example, with an implicit cast has an additional avenue for loss: the
> > implicit cast for the operands so that they're of the same type. I
> believe
> > this discussion so far has been about the latter, not the loss from the
> > operations themselves.
> >
> > On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <be...@datastax.com>
> > wrote:
> >
> >> Hi,
> >>
> >> I would like to try to clarify things a bit to help people to understand
> >> the true complexity of the problem.
> >>
> >> The *float *and *double *types are inexact numeric types. Not only at
> the
> >> operation level.
> >>
> >> If you insert 676543.21 in a *float* column and then read it, you will
> >> realize that the value has been truncated to 676543.2.
> >>
> > >> If you want accuracy, the only way is to avoid those inexact types. Using
> > >> *decimals* during operations will mitigate the problem but will not remove it.
> >>
> >>
> > >> I do not recall PostgreSQL behaving as described. If I am not mistaken,
> > in
> > >> PostgreSQL *SELECT 3/2* will return *1*, which is similar to what MS SQL
> > >> Server and Oracle do. So all those databases will lose precision if you
> > >> are not careful.
> >>
> >> If you truly need precision you can have it by using exact numeric types
> >> for your data types. Of course it has a cost on performance, memory and
> >> disk usage.
> >>
> > >> The advantage of the current approach is that it gives you the choice.
> > It is
> > >> up to you to decide what you need for your application. It is also in
> > line
> > >> with the way CQL behaves everywhere else.
> >>
> > --
> >
> > Muru
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: Implicit Casts for Arithmetic Operators

Posted by Benedict Elliott Smith <be...@apache.org>.
As far as I can tell we reached a relatively strong consensus that we should implement lossless casts by default?  Does anyone have anything more to add?

Looking at the emails, everyone who participated and expressed a preference was in favour of the “Postgres approach” of upcasting to decimal for mixed float/int operands?

I’d like to get a clear-cut decision on this, so we know what we’re doing for 4.0.  Then hopefully we can move on to a collective decision on Ariel’s concerns about overflow, which I think are also pressing - particularly for tinyint and smallint.  This does also impact implicit casts for mixed integer type operations, but an approach for these will probably fall out of any decision on overflow.
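
To make “lossless” concrete: upcasting to decimal preserves the float's actual stored value instead of rounding it again. A small Java sketch of the idea (illustrative only, runnable in jshell; BigDecimal standing in for CQL decimal):

    import java.math.BigDecimal;

    float f = 2.65f;
    int i = 3;
    // new BigDecimal(double) captures the binary value exactly, so the cast itself
    // loses nothing; any inexactness was already in the float
    System.out.println(new BigDecimal((double) f).add(BigDecimal.valueOf(i)));
    // 5.650000095367431640625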






> On 3 Oct 2018, at 11:38, Murukesh Mohanan <mu...@gmail.com> wrote:
> 
> I think you're conflating two things here. There's the loss resulting from
> using some operators, and loss involved in casting. Dividing an integer by
> another integer to obtain an integer result can result in loss, but there's
> no implicit casting there and no loss due to casting.  Casting an integer
> to a float can also result in loss. So dividing an integer by a float, for
> example, with an implicit cast has an additional avenue for loss: the
> implicit cast for the operands so that they're of the same type. I believe
> this discussion so far has been about the latter, not the loss from the
> operations themselves.
> 
> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <be...@datastax.com>
> wrote:
> 
>> Hi,
>> 
>> I would like to try to clarify things a bit to help people to understand
>> the true complexity of the problem.
>> 
>> The *float *and *double *types are inexact numeric types. Not only at the
>> operation level.
>> 
>> If you insert 676543.21 in a *float* column and then read it, you will
>> realize that the value has been truncated to 676543.2.
>> 
>> If you want accuracy, the only way is to avoid those inexact types. Using
>> *decimals* during operations will mitigate the problem but will not remove it.
>> 
>> 
>> I do not recall PostgreSQL behaving as described. If I am not mistaken, in
>> PostgreSQL *SELECT 3/2* will return *1*, which is similar to what MS SQL
>> Server and Oracle do. So all those databases will lose precision if you
>> are not careful.
>> 
>> If you truly need precision you can have it by using exact numeric types
>> for your data types. Of course it has a cost on performance, memory and
>> disk usage.
>> 
>> The advantage of the current approach is that it gives you the choice. It is
>> up to you to decide what you need for your application. It is also in line
>> with the way CQL behaves everywhere else.
>> 
> -- 
> 
> Muru


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Implicit Casts for Arithmetic Operators

Posted by Murukesh Mohanan <mu...@gmail.com>.
I think you're conflating two things here. There's the loss resulting from
using some operators, and loss involved in casting. Dividing an integer by
another integer to obtain an integer result can result in loss, but there's
no implicit casting there and no loss due to casting.  Casting an integer
to a float can also result in loss. So dividing an integer by a float, for
example, with an implicit cast has an additional avenue for loss: the
implicit cast for the operands so that they're of the same type. I believe
this discussion so far has been about the latter, not the loss from the
operations themselves.
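
A small Java sketch of the two kinds of loss (illustrative only, runnable in jshell):

    System.out.println(7 / 2);        // 3: loss from the operator itself, no cast involved
    int big = 16777217;               // 2^24 + 1: not representable as a float
    System.out.println((float) big);  // 1.6777216E7: loss purely from the int -> float cast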

On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <be...@datastax.com>
wrote:

> Hi,
>
> I would like to try to clarify things a bit to help people to understand
> the true complexity of the problem.
>
> The *float *and *double *types are inexact numeric types. Not only at the
> operation level.
>
> If you insert 676543.21 in a *float* column and then read it, you will
> realize that the value has been truncated to 676543.2.
>
> If you want accuracy, the only way is to avoid those inexact types. Using
> *decimals* during operations will mitigate the problem but will not remove it.
>
>
> I do not recall PostgreSQL behaving as described. If I am not mistaken, in
> PostgreSQL *SELECT 3/2* will return *1*, which is similar to what MS SQL
> Server and Oracle do. So all those databases will lose precision if you
> are not careful.
>
> If you truly need precision you can have it by using exact numeric types
> for your data types. Of course it has a cost on performance, memory and
> disk usage.
>
> The advantage of the current approach is that it gives you the choice. It is
> up to you to decide what you need for your application. It is also in line
> with the way CQL behaves everywhere else.
>
-- 

Muru

Re: Implicit Casts for Arithmetic Operators

Posted by Benjamin Lerer <be...@datastax.com>.
Hi,

I would like to try to clarify things a bit to help people to understand
the true complexity of the problem.

The *float *and *double *types are inexact numeric types. Not only at the
operation level.

If you insert 676543.21 in a *float* column and then read it, you will
realize that the value has been truncated to 676543.2.
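
The same thing happens in plain Java, for example (illustrative sketch, runnable in jshell):

    float f = 676543.21f;
    System.out.println(f);  // 676543.2: rounded to the nearest representable float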

If you want accuracy, the only way is to avoid those inexact types. Using
*decimals* during operations will mitigate the problem but will not remove it.


I do not recall PostgreSQL behaving as described. If I am not mistaken, in
PostgreSQL *SELECT 3/2* will return *1*, which is similar to what MS SQL
Server and Oracle do. So all those databases will lose precision if you
are not careful.

If you truly need precision you can have it by using exact numeric types
for your data types. Of course it has a cost on performance, memory and
disk usage.

The advantage of the current approach is that it gives you the choice. It is
up to you to decide what you need for your application. It is also in line
with the way CQL behaves everywhere else.

Re: Implicit Casts for Arithmetic Operators

Posted by Dinesh Joshi <di...@yahoo.com.INVALID>.
Thanks for starting this discussion. I’m definitely in the lossless camp. I’m curious about the performance impact of choosing lossless vs lossy.

Dinesh

> On Oct 2, 2018, at 10:54 AM, Benedict Elliott Smith <be...@apache.org> wrote:
> 
> I agree, in broad strokes at least.  Interested to hear others’ positions.
> 
> 
> 
>> On 2 Oct 2018, at 16:44, Ariel Weisberg <ar...@weisberg.ws> wrote:
>> 
>> Hi,
>> 
>> I think overflow and the role of widening conversions are pretty linked so I'll continue to inject that into this discussion. Also overflow is much worse since most applications won't be impacted by a loss of precision when an expression involves an int and float, but will care quite a bit if they get some nonsense wrapped number in an integer only expression.
>> 
>> For VoltDB in practice we didn't run into issues with applications not making progress due to exceptions with real data due to the widening conversions. The range of double and long are pretty big and that hides wrap around/infinity. 
>> 
>> I think the proposal of having all operations return a decimal is attractive in that these expressions always result in a consistent type. Two pain points might be whether client languages have decimal support and whether there is a performance issue? The nice thing about always returning decimal is we can sidestep the issue of overflow.
>> 
>> I would start with seeing if that's acceptable, and if it isn't, then look at other approaches, like returning a variety of types: a bigint when doing int + int, or a double for int + float.
>> 
>> If we take an approach that allows overflow, the ideal end state IMO would be to get all users to run Cassandra in a way that overflow results in an error, even in the context of aggregation. The road to get there is tricky, but maybe start by having it as an opt-in tunable in cassandra.yaml. I don't know how/when we could ever change that as a default, and it's unfortunate having an option like this that 99% won't know they should flip.
>> 
>> It seems like having the default throw on overflow is not as bad as it sounds if you do the widening conversions, since most people won't run into them. The change in the column types of result sets actually sounds worse if we want to also improve aggregations. Many applications won't notice if the client library abstracts that away, but I think there are still cases where people would notice the type changing.
>> 
>> Ariel
>> 
>>> On Tue, Oct 2, 2018, at 11:09 AM, Benedict Elliott Smith wrote:
>>> This (overflow) is an excellent point, but this also affects 
>>> aggregations which were introduced a long time ago.  They already 
>>> inherit Java semantics for all of the relevant types (silent wrap 
>>> around).  We probably want to be consistent, meaning either changing 
>>> aggregations (which incurs a cost for changing API) or continuing the 
>>> java semantics here.
>>> 
>>> This is why having these discussions explicitly in the community before 
>>> a release is so critical, in my view.  It’s very easy for these semantic 
>>> changes to go unnoticed on a JIRA, and then ossify.
>>> 
>>> 
>>>> On 2 Oct 2018, at 15:48, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I think we should decide based on what is least surprising as you mention, but isn't overridden by some other concern.
>>>> 
>>>> It seems to me the priorities are
>>>> 
>>>> * Correctness
>>>> * Performance
>>>> * User visible complexity
>>>> * Developer visible complexity
>>>> 
>>>> Defaulting to silent implicit data loss is not ideal from a correctness standpoint.
>>>> 
>>>> Doing something better like using wider types doesn't seem like a performance issue.
>>>> 
>>>> From a user standpoint doing something less lossy doesn't look more complex as long as it's consistent, and documented and doesn't change from version to version.
>>>> 
>>>> There is some developer complexity, but this is a public API and we only get one shot at this. 
>>>> 
>>>> I wonder about how overflow is handled as well. In VoltDB I think we threw on overflow and tended to just do widening conversions to make that less common. We didn't imitate another database (as far as I know); we just went with what was least likely to silently corrupt data.
>>>> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213 <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213>
>>>> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764 <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764>
>>>> 
>>>> Ariel
>>>> 
>>>>> On Tue, Oct 2, 2018, at 7:30 AM, Benedict Elliott Smith wrote:
>>>>> CASSANDRA-11935 introduced arithmetic operators, and alongside these 
>>>>> came implicit casts for their operands.  There is a semantic decision to 
>>>>> be made, and I think the project would do well to explicitly raise this 
>>>>> kind of question for wider input before release, since the project is 
>>>>> bound by them forever more.
>>>>> 
>>>>> In this case, the choice is between lossy and lossless casts for 
>>>>> operations involving integers and floating point numbers.  In essence, 
>>>>> should:
>>>>> 
>>>>> (1) float + int = float, double + bigint = double; or
>>>>> (2) float + int = double, double + bigint = decimal; or
>>>>> (3) float + int = decimal, double + bigint = decimal
>>>>> 
>>>>> Option 1 performs a lossy implicit cast from int -> float, or bigint -> 
>>>>> double.  Simply casting between these types changes the value.  This is 
>>>>> what MS SQL Server does.
>>>>> Options 2 and 3 cast without loss of precision, and 3 (or thereabouts) 
>>>>> is what PostgreSQL does.
>>>>> 
>>>>> The question I’m interested in is not just which is the right decision, 
>>>>> but how the right decision should be arrived at.  My view is that we 
>>>>> should primarily aim for least surprise to the user, but I’m keen to 
>>>>> hear from others.
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org <ma...@cassandra.apache.org>
>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org <ma...@cassandra.apache.org>
>>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org <ma...@cassandra.apache.org>
>>>> For additional commands, e-mail: dev-help@cassandra.apache.org <ma...@cassandra.apache.org>
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Implicit Casts for Arithmetic Operators

Posted by Benedict Elliott Smith <be...@apache.org>.
I agree, in broad strokes at least.  Interested to hear others’ positions.



> On 2 Oct 2018, at 16:44, Ariel Weisberg <ar...@weisberg.ws> wrote:
> 
> Hi,
> 
> I think overflow and the role of widening conversions are pretty linked so I'll continue to inject that into this discussion. Also overflow is much worse since most applications won't be impacted by a loss of precision when an expression involves an int and float, but will care quite a bit if they get some nonsense wrapped number in an integer only expression.
> 
> For VoltDB in practice we didn't run into issues with applications not making progress due to exceptions with real data due to the widening conversions. The range of double and long are pretty big and that hides wrap around/infinity. 
> 
> I think the proposal of having all operations return a decimal is attractive in that these expressions always result in a consistent type. Two pain points might be whether client languages have decimal support and whether there is a performance issue? The nice thing about always returning decimal is we can sidestep the issue of overflow.
> 
> I would start with seeing if that's acceptable, and if it isn't, then look at other approaches, like returning a variety of types: a bigint when doing int + int, or a double for int + float.
> 
> If we take an approach that allows overflow, the ideal end state IMO would be to get all users to run Cassandra in a way that overflow results in an error, even in the context of aggregation. The road to get there is tricky, but maybe start by having it as an opt-in tunable in cassandra.yaml. I don't know how/when we could ever change that as a default, and it's unfortunate having an option like this that 99% won't know they should flip.
> 
> It seems like having the default throw on overflow is not as bad as it sounds if you do the widening conversions, since most people won't run into them. The change in the column types of result sets actually sounds worse if we want to also improve aggregations. Many applications won't notice if the client library abstracts that away, but I think there are still cases where people would notice the type changing.
> 
> Ariel
> 
>> On Tue, Oct 2, 2018, at 11:09 AM, Benedict Elliott Smith wrote:
>> This (overflow) is an excellent point, but this also affects 
>> aggregations which were introduced a long time ago.  They already 
>> inherit Java semantics for all of the relevant types (silent wrap 
>> around).  We probably want to be consistent, meaning either changing 
>> aggregations (which incurs a cost for changing API) or continuing the 
>> java semantics here.
>> 
>> This is why having these discussions explicitly in the community before 
>> a release is so critical, in my view.  It’s very easy for these semantic 
>> changes to go unnoticed on a JIRA, and then ossify.
>> 
>> 
>>> On 2 Oct 2018, at 15:48, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>> 
>>> Hi,
>>> 
>>> I think we should decide based on what is least surprising as you mention, but isn't overridden by some other concern.
>>> 
>>> It seems to me the priorities are
>>> 
>>> * Correctness
>>> * Performance
>>> * User visible complexity
>>> * Developer visible complexity
>>> 
>>> Defaulting to silent implicit data loss is not ideal from a correctness standpoint.
>>> 
>>> Doing something better like using wider types doesn't seem like a performance issue.
>>> 
>>> From a user standpoint doing something less lossy doesn't look more complex as long as it's consistent, and documented and doesn't change from version to version.
>>> 
>>> There is some developer complexity, but this is a public API and we only get one shot at this. 
>>> 
>>> I wonder about how overflow is handled as well. In VoltDB I think we threw on overflow and tended to just do widening conversions to make that less common. We didn't imitate another database (as far as I know); we just went with what was least likely to silently corrupt data.
>>> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213 <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213>
>>> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764 <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764>
>>> 
>>> Ariel
>>> 
>>>> On Tue, Oct 2, 2018, at 7:30 AM, Benedict Elliott Smith wrote:
>>>> CASSANDRA-11935 introduced arithmetic operators, and alongside these 
>>>> came implicit casts for their operands.  There is a semantic decision to 
>>>> be made, and I think the project would do well to explicitly raise this 
>>>> kind of question for wider input before release, since the project is 
>>>> bound by them forever more.
>>>> 
>>>> In this case, the choice is between lossy and lossless casts for 
>>>> operations involving integers and floating point numbers.  In essence, 
>>>> should:
>>>> 
>>>> (1) float + int = float, double + bigint = double; or
>>>> (2) float + int = double, double + bigint = decimal; or
>>>> (3) float + int = decimal, double + bigint = decimal
>>>> 
>>>> Option 1 performs a lossy implicit cast from int -> float, or bigint -> 
>>>> double.  Simply casting between these types changes the value.  This is 
>>>> what MS SQL Server does.
>>>> Options 2 and 3 cast without loss of precision, and 3 (or thereabouts) 
>>>> is what PostgreSQL does.
>>>> 
>>>> The question I’m interested in is not just which is the right decision, 
>>>> but how the right decision should be arrived at.  My view is that we 
>>>> should primarily aim for least surprise to the user, but I’m keen to 
>>>> hear from others.
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org <ma...@cassandra.apache.org>
>>>> For additional commands, e-mail: dev-help@cassandra.apache.org <ma...@cassandra.apache.org>
>>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org <ma...@cassandra.apache.org>
>>> For additional commands, e-mail: dev-help@cassandra.apache.org <ma...@cassandra.apache.org>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Implicit Casts for Arithmetic Operators

Posted by Ariel Weisberg <ar...@weisberg.ws>.
Hi,

I think overflow and the role of widening conversions are pretty linked, so I'll continue to inject that into this discussion. Also, overflow is much worse, since most applications won't be impacted by a loss of precision when an expression involves an int and float, but will care quite a bit if they get some nonsense wrapped number in an integer-only expression.

For VoltDB in practice we didn't run into issues with applications not making progress due to exceptions with real data due to the widening conversions. The range of double and long are pretty big and that hides wrap around/infinity. 
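
A small Java sketch of why widening hides wrap around (illustrative only, runnable in jshell):

    int x = Integer.MAX_VALUE;
    System.out.println(x + x);         // -2: wraps in 32-bit int arithmetic
    System.out.println((long) x + x);  // 4294967294: widening to long avoids the wrap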

I think the proposal of having all operations return a decimal is attractive in that these expressions always result in a consistent type. Two pain points might be whether client languages have decimal support and whether there is a performance issue. The nice thing about always returning decimal is that we can sidestep the issue of overflow.

I would start with seeing if that's acceptable, and if it isn't, then look at other approaches, like returning a variety of types: a bigint when doing int + int, or a double for int + float.

If we take an approach that allows overflow, the ideal end state IMO would be to get all users to run Cassandra in a way that overflow results in an error, even in the context of aggregation. The road to get there is tricky, but maybe start by having it as an opt-in tunable in cassandra.yaml. I don't know how/when we could ever change that as a default, and it's unfortunate having an option like this that 99% won't know they should flip.

It seems like having the default throw on overflow is not as bad as it sounds if you do the widening conversions, since most people won't run into them. The change in the column types of result sets actually sounds worse if we want to also improve aggregations. Many applications won't notice if the client library abstracts that away, but I think there are still cases where people would notice the type changing.

Ariel

On Tue, Oct 2, 2018, at 11:09 AM, Benedict Elliott Smith wrote:
> This (overflow) is an excellent point, but this also affects 
> aggregations which were introduced a long time ago.  They already 
> inherit Java semantics for all of the relevant types (silent wrap 
> around).  We probably want to be consistent, meaning either changing 
> aggregations (which incurs a cost for changing API) or continuing the 
> java semantics here.
> 
> This is why having these discussions explicitly in the community before 
> a release is so critical, in my view.  It’s very easy for these semantic 
> changes to go unnoticed on a JIRA, and then ossify.
> 
> 
> > On 2 Oct 2018, at 15:48, Ariel Weisberg <ar...@weisberg.ws> wrote:
> > 
> > Hi,
> > 
> > I think we should decide based on what is least surprising as you mention, but isn't overridden by some other concern.
> > 
> > It seems to me the priorities are
> > 
> > * Correctness
> > * Performance
> > * User visible complexity
> > * Developer visible complexity
> > 
> > Defaulting to silent implicit data loss is not ideal from a correctness standpoint.
> > 
> > Doing something better like using wider types doesn't seem like a performance issue.
> > 
> > From a user standpoint doing something less lossy doesn't look more complex as long as it's consistent, and documented and doesn't change from version to version.
> > 
> > There is some developer complexity, but this is a public API and we only get one shot at this. 
> > 
> > I wonder about how overflow is handled as well. In VoltDB I think we threw on overflow and tended to just do widening conversions to make that less common. We didn't imitate another database (as far as I know); we just went with what was least likely to silently corrupt data.
> > https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213 <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213>
> > https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764 <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764>
> > 
> > Ariel
> > 
> > On Tue, Oct 2, 2018, at 7:30 AM, Benedict Elliott Smith wrote:
> >> CASSANDRA-11935 introduced arithmetic operators, and alongside these 
> >> came implicit casts for their operands.  There is a semantic decision to 
> >> be made, and I think the project would do well to explicitly raise this 
> >> kind of question for wider input before release, since the project is 
> >> bound by them forever more.
> >> 
> >> In this case, the choice is between lossy and lossless casts for 
> >> operations involving integers and floating point numbers.  In essence, 
> >> should:
> >> 
> >> (1) float + int = float, double + bigint = double; or
> >> (2) float + int = double, double + bigint = decimal; or
> >> (3) float + int = decimal, double + bigint = decimal
> >> 
> >> Option 1 performs a lossy implicit cast from int -> float, or bigint -> 
> >> double.  Simply casting between these types changes the value.  This is 
> >> what MS SQL Server does.
> >> Options 2 and 3 cast without loss of precision, and 3 (or thereabouts) 
> >> is what PostgreSQL does.
> >> 
> >> The question I’m interested in is not just which is the right decision, 
> >> but how the right decision should be arrived at.  My view is that we 
> >> should primarily aim for least surprise to the user, but I’m keen to 
> >> hear from others.
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org <ma...@cassandra.apache.org>
> >> For additional commands, e-mail: dev-help@cassandra.apache.org <ma...@cassandra.apache.org>
> >> 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org <ma...@cassandra.apache.org>
> > For additional commands, e-mail: dev-help@cassandra.apache.org <ma...@cassandra.apache.org>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Implicit Casts for Arithmetic Operators

Posted by Benedict Elliott Smith <be...@apache.org>.
This (overflow) is an excellent point, but this also affects aggregations, which were introduced a long time ago.  They already inherit Java semantics for all of the relevant types (silent wrap around).  We probably want to be consistent, meaning either changing aggregations (which incurs a cost for changing the API) or continuing the Java semantics here.
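
For reference, the Java semantics in question, as an illustrative sketch (runnable in jshell):

    long max = Long.MAX_VALUE;
    System.out.println(max + 1);  // -9223372036854775808: silent wrap around, no error raised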

This is why having these discussions explicitly in the community before a release is so critical, in my view.  It’s very easy for these semantic changes to go unnoticed on a JIRA, and then ossify.


> On 2 Oct 2018, at 15:48, Ariel Weisberg <ar...@weisberg.ws> wrote:
> 
> Hi,
> 
> I think we should decide based on what is least surprising as you mention, but isn't overridden by some other concern.
> 
> It seems to me the priorities are
> 
> * Correctness
> * Performance
> * User visible complexity
> * Developer visible complexity
> 
> Defaulting to silent implicit data loss is not ideal from a correctness standpoint.
> 
> Doing something better like using wider types doesn't seem like a performance issue.
> 
> From a user standpoint doing something less lossy doesn't look more complex as long as it's consistent, and documented and doesn't change from version to version.
> 
> There is some developer complexity, but this is a public API and we only get one shot at this. 
> 
> I wonder about how overflow is handled as well. In VoltDB I think we threw on overflow and tended to just do widening conversions to make that less common. We didn't imitate another database (as far as I know); we just went with what was least likely to silently corrupt data.
> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213 <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213>
> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764 <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764>
> 
> Ariel
> 
> On Tue, Oct 2, 2018, at 7:30 AM, Benedict Elliott Smith wrote:
>> CASSANDRA-11935 introduced arithmetic operators, and alongside these 
>> came implicit casts for their operands.  There is a semantic decision to 
>> be made, and I think the project would do well to explicitly raise this 
>> kind of question for wider input before release, since the project is 
>> bound by them forever more.
>> 
>> In this case, the choice is between lossy and lossless casts for 
>> operations involving integers and floating point numbers.  In essence, 
>> should:
>> 
>> (1) float + int = float, double + bigint = double; or
>> (2) float + int = double, double + bigint = decimal; or
>> (3) float + int = decimal, double + bigint = decimal
>> 
>> Option 1 performs a lossy implicit cast from int -> float, or bigint -> 
>> double.  Simply casting between these types changes the value.  This is 
>> what MS SQL Server does.
>> Options 2 and 3 cast without loss of precision, and 3 (or thereabouts) 
>> is what PostgreSQL does.
>> 
>> The question I’m interested in is not just which is the right decision, 
>> but how the right decision should be arrived at.  My view is that we 
>> should primarily aim for least surprise to the user, but I’m keen to 
>> hear from others.
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org <ma...@cassandra.apache.org>
>> For additional commands, e-mail: dev-help@cassandra.apache.org <ma...@cassandra.apache.org>
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org <ma...@cassandra.apache.org>
> For additional commands, e-mail: dev-help@cassandra.apache.org <ma...@cassandra.apache.org>

Re: Implicit Casts for Arithmetic Operators

Posted by Ariel Weisberg <ar...@weisberg.ws>.
Hi,

I think we should decide based on what is least surprising as you mention, but isn't overridden by some other concern.

It seems to me the priorities are

* Correctness
* Performance
* User visible complexity
* Developer visible complexity

Defaulting to silent implicit data loss is not ideal from a correctness standpoint.

Doing something better like using wider types doesn't seem like a performance issue.

From a user standpoint, doing something less lossy doesn't look more complex as long as it's consistent, documented, and doesn't change from version to version.

There is some developer complexity, but this is a public API and we only get one shot at this. 

I wonder about how overflow is handled as well. In VoltDB I think we threw on overflow and tended to just do widening conversions to make that less common. We didn't imitate another database (as far as I know); we just went with what was least likely to silently corrupt data.
https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213
https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764
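
Java has an analogous throw-on-overflow primitive, for what it's worth (illustrative sketch, runnable in jshell):

    int a = Integer.MAX_VALUE;
    System.out.println(a + 1);                // -2147483648: silent wrap
    System.out.println(Math.addExact(a, 1));  // throws ArithmeticException: integer overflow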

Ariel

On Tue, Oct 2, 2018, at 7:30 AM, Benedict Elliott Smith wrote:
> CASSANDRA-11935 introduced arithmetic operators, and alongside these 
> came implicit casts for their operands.  There is a semantic decision to 
> be made, and I think the project would do well to explicitly raise this 
> kind of question for wider input before release, since the project is 
> bound by them forever more.
> 
> In this case, the choice is between lossy and lossless casts for 
> operations involving integers and floating point numbers.  In essence, 
> should:
> 
> (1) float + int = float, double + bigint = double; or
> (2) float + int = double, double + bigint = decimal; or
> (3) float + int = decimal, double + bigint = decimal
> 
> Option 1 performs a lossy implicit cast from int -> float, or bigint -> 
> double.  Simply casting between these types changes the value.  This is 
> what MS SQL Server does.
> Options 2 and 3 cast without loss of precision, and 3 (or thereabouts) 
> is what PostgreSQL does.
> 
> The question I’m interested in is not just which is the right decision, 
> but how the right decision should be arrived at.  My view is that we 
> should primarily aim for least surprise to the user, but I’m keen to 
> hear from others.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org