You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "Jefffrey (via GitHub)" <gi...@apache.org> on 2023/12/31 09:33:52 UTC

Re: [I] Extract from interval type failed [arrow-datafusion]

Jefffrey commented on issue #6327:
URL: https://github.com/apache/arrow-datafusion/issues/6327#issuecomment-1872904401

   Here's some example output from postgres:
   
   ```
   postgres=# select extract(year from '111 year 111 month 111 day 111 hour 111 minute 111 second'::interval);
    extract
   ---------
        120
   (1 row)
   
   postgres=# select extract(month from '111 year 111 month 111 day 111 hour 111 minute 111 second'::interval);
    extract
   ---------
          3
   (1 row)
   
   postgres=# select extract(day from '111 year 111 month 111 day 111 hour 111 minute 111 second'::interval);
    extract
   ---------
        111
   (1 row)
   
   postgres=# select extract(hour from '111 year 111 month 111 day 111 hour 111 minute 111 second'::interval);
    extract
   ---------
        112
   (1 row)
   
   postgres=# select extract(minute from '111 year 111 month 111 day 111 hour 111 minute 111 second'::interval);
    extract
   ---------
         52
   (1 row)
   
   postgres=# select extract(second from '111 year 111 month 111 day 111 hour 111 minute 111 second'::interval);
     extract
   -----------
    51.000000
   (1 row)
   ```
   
   Note how `month` contributes to `year` when it overflows, but `day` doesn't do the same for `month`
   
   And how `second` and `minute` can overflow as well, but `hour` doesn't overflow into day.
   
   Duckdb shows the same results:
   
   ```
   D select extract(year from '111 year 111 month 111 day 111 hour 111 minute 111 second'::interval);
   ┌───────────────────────────────────────────────────────────────────────────────────────────────────────┐
   │ main.date_part('year', CAST('111 year 111 month 111 day 111 hour 111 minute 111 second' AS INTERVAL)) │
   │                                                 int64                                                 │
   ├───────────────────────────────────────────────────────────────────────────────────────────────────────┤
   │                                                                                                   120 │
   └───────────────────────────────────────────────────────────────────────────────────────────────────────┘
   D select extract(month from '111 year 111 month 111 day 111 hour 111 minute 111 second'::interval);
   ┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐
   │ main.date_part('month', CAST('111 year 111 month 111 day 111 hour 111 minute 111 second' AS INTERVAL)) │
   │                                                 int64                                                  │
   ├────────────────────────────────────────────────────────────────────────────────────────────────────────┤
   │                                                                                                      3 │
   └────────────────────────────────────────────────────────────────────────────────────────────────────────┘
   D select extract(day from '111 year 111 month 111 day 111 hour 111 minute 111 second'::interval);
   ┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
   │ main.date_part('day', CAST('111 year 111 month 111 day 111 hour 111 minute 111 second' AS INTERVAL)) │
   │                                                int64                                                 │
   ├──────────────────────────────────────────────────────────────────────────────────────────────────────┤
   │                                                                                                  111 │
   └──────────────────────────────────────────────────────────────────────────────────────────────────────┘
   D select extract(hour from '111 year 111 month 111 day 111 hour 111 minute 111 second'::interval);
   ┌───────────────────────────────────────────────────────────────────────────────────────────────────────┐
   │ main.date_part('hour', CAST('111 year 111 month 111 day 111 hour 111 minute 111 second' AS INTERVAL)) │
   │                                                 int64                                                 │
   ├───────────────────────────────────────────────────────────────────────────────────────────────────────┤
   │                                                                                                   112 │
   └───────────────────────────────────────────────────────────────────────────────────────────────────────┘
   D select extract(minute from '111 year 111 month 111 day 111 hour 111 minute 111 second'::interval);
   ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────┐
   │ main.date_part('minute', CAST('111 year 111 month 111 day 111 hour 111 minute 111 second' AS INTERVAL)) │
   │                                                  int64                                                  │
   ├─────────────────────────────────────────────────────────────────────────────────────────────────────────┤
   │                                                                                                      52 │
   └─────────────────────────────────────────────────────────────────────────────────────────────────────────┘
   D select extract(second from '111 year 111 month 111 day 111 hour 111 minute 111 second'::interval);
   ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────┐
   │ main.date_part('second', CAST('111 year 111 month 111 day 111 hour 111 minute 111 second' AS INTERVAL)) │
   │                                                  int64                                                  │
   ├─────────────────────────────────────────────────────────────────────────────────────────────────────────┤
   │                                                                                                      51 │
   └─────────────────────────────────────────────────────────────────────────────────────────────────────────┘
   ```
   
   Can use these as guidance for implementation for Datafusion, but keeping in mind there are three different interval types to support:
   
   - `YearMonth` - stores number of months
     - probably should only be able to extract `century`, `decade`, `year`, `month` where month overflows into year as necessary
   - `DayTime` - stores number of days and number of milliseconds
     - probably should only extract `week`, `day`, `hour`, and lower units, as conversion from day -> month is not well defined (see discussion #8468)
   - `MonthDayNano` - stores number of months, number of days, and number of nanoseconds
     - probably will be the closest to behaviour with postgres/duckdb as it stores the most base units


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org