You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Rommel Quintanilla <ro...@blazingdb.com> on 2020/06/21 23:00:39 UTC

Question about timestampdiff

Hi, I'm investigating the behavior of the functions timestampdiff with different time-units.

I found that in the case of the time-units: second, minute, hour, and day, the logical plan returned by Calcite is consistent. Since the input is a timestamp in milliseconds, in the same way, the result is in milliseconds. 

For example, for the following query:
"select l_shipdate, l_commitdate, timestampdiff(DAY, l_commitdate, l_shipdate) as diff from lineitem limit 5"

its logical plan produced is:

LogicalSort(fetch=[5])
  LogicalProject(l_shipdate=[$10], l_commitdate=[$11], diff=[CAST(/INT(Reinterpret(-($10, $11)), 86400000)):INTEGER])
    LogicalTableScan(table=[[main, lineitem]])

So far, so good. However, for the month and year case, the output is not what I would expect. For the query:

"select l_shipdate, l_commitdate, timestampdiff(MONTH, l_commitdate, l_shipdate) as diff from lineitem limit 5"

its logical plan produced is:

LogicalSort(fetch=[5])
  LogicalProject(l_shipdate=[$10], l_commitdate=[$11], diff=[CAST(Reinterpret(-($10, $11))):INTEGER])
    LogicalTableScan(table=[[main, lineitem]])

What I expected is that the subtraction is also divided by a month in milliseconds, something like: 
  LogicalProject(l_shipdate=[$10], l_commitdate=[$11], [CAST(/INT(Reinterpret(-($10, $11)), 2592000000)):INTEGER])

Doesn't seem to be a bug in Calcite, because the processing of constants in the unit tests passes OK:

SqlOperatorBaseTest.java:
..
    tester.checkScalar("{fn TIMESTAMPDIFF(HOUR,"
        + " TIMESTAMP '2014-03-29 12:34:56',"
        + " TIMESTAMP '2014-03-29 12:34:56')}", "0", "INTEGER NOT NULL");
    tester.checkScalar("{fn TIMESTAMPDIFF(MONTH,"
        + " TIMESTAMP '2019-09-01 00:00:00',"
        + " TIMESTAMP '2020-03-01 00:00:00')}", "6", "INTEGER NOT NULL");
..

So I wonder if there is something I'm not considering that is making me think wrong. Please, any ideas?

Re: Question about timestampdiff

Posted by Julian Hyde <jh...@apache.org>.
Is your question about semantics or implementation?

If it's semantics, it's useful if you provide an example where Calcite
gives a different result than another DBMS. I find
https://rextester.com/l/mysql_online_compiler useful for that.

If it's implementation, you should know that YEAR and MONTH have very
different semantics and code paths to time units DAY and smaller.
Calcite uses interval arithmetic, e.g.

  (DATE '1971-01-01' - DATE '1970-01-01') MONTH

whose behavior is specified in the SQL standard. For us, TIMESTAMPDIFF
is just syntactic sugar for that.

Julian

On Sun, Jun 21, 2020 at 7:25 PM Rui Wang <am...@apache.org> wrote:
>
> I think it is because there is no clear mapping for how to convert a month
> (and then same to year) in interval to seconds/milliseconds. Does a month
> have 30 days, 31 days, 28 days or 29 days? (Probably can find an answer in
> SQL standard).
>
>
> -Rui
>
>
>
> On Sun, Jun 21, 2020 at 4:00 PM Rommel Quintanilla <ro...@blazingdb.com>
> wrote:
>
> > Hi, I'm investigating the behavior of the functions timestampdiff with
> > different time-units.
> >
> > I found that in the case of the time-units: second, minute, hour, and day,
> > the logical plan returned by Calcite is consistent. Since the input is a
> > timestamp in milliseconds, in the same way, the result is in milliseconds.
> >
> > For example, for the following query:
> > "select l_shipdate, l_commitdate, timestampdiff(DAY, l_commitdate,
> > l_shipdate) as diff from lineitem limit 5"
> >
> > its logical plan produced is:
> >
> > LogicalSort(fetch=[5])
> >   LogicalProject(l_shipdate=[$10], l_commitdate=[$11],
> > diff=[CAST(/INT(Reinterpret(-($10, $11)), 86400000)):INTEGER])
> >     LogicalTableScan(table=[[main, lineitem]])
> >
> > So far, so good. However, for the month and year case, the output is not
> > what I would expect. For the query:
> >
> > "select l_shipdate, l_commitdate, timestampdiff(MONTH, l_commitdate,
> > l_shipdate) as diff from lineitem limit 5"
> >
> > its logical plan produced is:
> >
> > LogicalSort(fetch=[5])
> >   LogicalProject(l_shipdate=[$10], l_commitdate=[$11],
> > diff=[CAST(Reinterpret(-($10, $11))):INTEGER])
> >     LogicalTableScan(table=[[main, lineitem]])
> >
> > What I expected is that the subtraction is also divided by a month in
> > milliseconds, something like:
> >   LogicalProject(l_shipdate=[$10], l_commitdate=[$11],
> > [CAST(/INT(Reinterpret(-($10, $11)), 2592000000)):INTEGER])
> >
> > Doesn't seem to be a bug in Calcite, because the processing of constants
> > in the unit tests passes OK:
> >
> > SqlOperatorBaseTest.java:
> > ..
> >     tester.checkScalar("{fn TIMESTAMPDIFF(HOUR,"
> >         + " TIMESTAMP '2014-03-29 12:34:56',"
> >         + " TIMESTAMP '2014-03-29 12:34:56')}", "0", "INTEGER NOT NULL");
> >     tester.checkScalar("{fn TIMESTAMPDIFF(MONTH,"
> >         + " TIMESTAMP '2019-09-01 00:00:00',"
> >         + " TIMESTAMP '2020-03-01 00:00:00')}", "6", "INTEGER NOT NULL");
> > ..
> >
> > So I wonder if there is something I'm not considering that is making me
> > think wrong. Please, any ideas?
> >

Re: Question about timestampdiff

Posted by Rui Wang <am...@apache.org>.
I think it is because there is no clear mapping for how to convert a month
(and then same to year) in interval to seconds/milliseconds. Does a month
have 30 days, 31 days, 28 days or 29 days? (Probably can find an answer in
SQL standard).


-Rui



On Sun, Jun 21, 2020 at 4:00 PM Rommel Quintanilla <ro...@blazingdb.com>
wrote:

> Hi, I'm investigating the behavior of the functions timestampdiff with
> different time-units.
>
> I found that in the case of the time-units: second, minute, hour, and day,
> the logical plan returned by Calcite is consistent. Since the input is a
> timestamp in milliseconds, in the same way, the result is in milliseconds.
>
> For example, for the following query:
> "select l_shipdate, l_commitdate, timestampdiff(DAY, l_commitdate,
> l_shipdate) as diff from lineitem limit 5"
>
> its logical plan produced is:
>
> LogicalSort(fetch=[5])
>   LogicalProject(l_shipdate=[$10], l_commitdate=[$11],
> diff=[CAST(/INT(Reinterpret(-($10, $11)), 86400000)):INTEGER])
>     LogicalTableScan(table=[[main, lineitem]])
>
> So far, so good. However, for the month and year case, the output is not
> what I would expect. For the query:
>
> "select l_shipdate, l_commitdate, timestampdiff(MONTH, l_commitdate,
> l_shipdate) as diff from lineitem limit 5"
>
> its logical plan produced is:
>
> LogicalSort(fetch=[5])
>   LogicalProject(l_shipdate=[$10], l_commitdate=[$11],
> diff=[CAST(Reinterpret(-($10, $11))):INTEGER])
>     LogicalTableScan(table=[[main, lineitem]])
>
> What I expected is that the subtraction is also divided by a month in
> milliseconds, something like:
>   LogicalProject(l_shipdate=[$10], l_commitdate=[$11],
> [CAST(/INT(Reinterpret(-($10, $11)), 2592000000)):INTEGER])
>
> Doesn't seem to be a bug in Calcite, because the processing of constants
> in the unit tests passes OK:
>
> SqlOperatorBaseTest.java:
> ..
>     tester.checkScalar("{fn TIMESTAMPDIFF(HOUR,"
>         + " TIMESTAMP '2014-03-29 12:34:56',"
>         + " TIMESTAMP '2014-03-29 12:34:56')}", "0", "INTEGER NOT NULL");
>     tester.checkScalar("{fn TIMESTAMPDIFF(MONTH,"
>         + " TIMESTAMP '2019-09-01 00:00:00',"
>         + " TIMESTAMP '2020-03-01 00:00:00')}", "6", "INTEGER NOT NULL");
> ..
>
> So I wonder if there is something I'm not considering that is making me
> think wrong. Please, any ideas?
>