You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/03/20 11:05:04 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue, #5651: Support `::interval` casting

alamb opened a new issue, #5651:
URL: https://github.com/apache/arrow-datafusion/issues/5651

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   In DataFusion, we support SQL syntax like `INTERVAL '3 months'` that is then converted into Intervals (I think originally contributed by @ovr).
   
   This has a few limitations:
   1. It can't be 
   
   
   **Describe the solution you'd like**
   I would like to be able to use the interval type from sql as normal:
   
   ```sql
   select '1 day`::interval
   select cast('1 day' as interval);
   select arrow_cast('1 day', 'Interval(MonthDayNano)');
   ```
   
   Here is what happens today
   ```
   ❯ select '1 day'::interval;
   This feature is not implemented: Unsupported SQL type Interval
   ❯ select cast('1 day' as interval);
   This feature is not implemented: Unsupported SQL type Interval
   ❯ select arrow_cast('1 day', 'Interval(MonthDayNano)');
   Error during planning: Cannot automatically convert Utf8 to Interval(MonthDayNano)
   ❯ select arrow_cast('1 day', 'Interval(YearMonth)');
   Error during planning: Cannot automatically convert Utf8 to Interval(YearMonth)
   ❯ select arrow_cast('1 day', 'Interval(DayTime)');
   Error during planning: Cannot automatically convert Utf8 to Interval(DayTime)
   ```
   
   
   **Describe alternatives you've considered**
   One thing we need to figure out is what Arrow interval type will map to the sql interval type.
   
   Intervals can be either YearMonth or DayTime or MonthDayNano
   
   I think in order to support '1 day' and '1 minute' with the same type, we will need to use IntervalMonthDayNano:
   
   https://docs.rs/arrow/35.0.0/arrow/datatypes/struct.IntervalMonthDayNanoType.html
   
   
   **Additional context**
   I believe the casting support is added upstream in arrow-rs by @doki23  -- https://github.com/apache/arrow-rs/pull/3762 in arrow 35.0.0.
   
   However, that upgrade is blocked (see https://github.com/apache/arrow-datafusion/pull/5441) so we may have to wait for arrow-36.0.0 to implement this feature


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waitingkuo commented on issue #5651: Support `::interval` casting / `INTERVAL` SQL datatype

Posted by "waitingkuo (via GitHub)" <gi...@apache.org>.
waitingkuo commented on issue #5651:
URL: https://github.com/apache/arrow-datafusion/issues/5651#issuecomment-1482241768

   hi @alamb this is great, `arrow_cast` to specific `IntervalUnit` is quite helpful.
   
   I'd like to know when the source doesn't feed into the interval , e.g.
   ```bash
   select arrow_cast('1 second', 'Interval(YearMonth)');
   ```
   do we just simply return a `0` or we should raise some warning or even err?
   
   or when there're some ambiguities
   ```bash
   select arrow_cast('1 month', 'Interval(DayTime)');
   ```
   do we return `30 days` or we should raise an ambiguity error?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #5651: Support `::interval` casting / `INTERVAL` SQL datatype

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5651:
URL: https://github.com/apache/arrow-datafusion/issues/5651#issuecomment-1490426450

   After upgrade to arrow 36:
   ```sql
   select arrow_cast('1 second', 'Interval(YearMonth)');DataFusion CLI v21.0.0
   ❯ select arrow_cast('1 second', 'Interval(YearMonth)');
   Arrow error: Cast error: Cannot cast 1 second to IntervalYearMonth. Only year and month fields are allowed.
   ❯ select arrow_cast('1 month', 'Interval(DayTime)');
   
   +--------------------------------------------------+
   | Utf8("1 month")                                  |
   +--------------------------------------------------+
   | 0 years 0 mons 30 days 0 hours 0 mins 0.000 secs |
   +--------------------------------------------------+
   ```
   
   Here is a small PR that adds basic support for the SQL interval type:  https://github.com/apache/arrow-datafusion/pull/5792
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #5651: Support `::interval` casting / `INTERVAL` SQL datatype

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5651:
URL: https://github.com/apache/arrow-datafusion/issues/5651#issuecomment-1479621455

   @waitingkuo  I would be curious to know what you think about this proposal 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #5651: Support `::interval` casting / `INTERVAL` SQL datatype

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5651:
URL: https://github.com/apache/arrow-datafusion/issues/5651#issuecomment-1485852781

   
   ```
   I'd like to know when the source doesn't feed into the interval , e.g.
   
   select arrow_cast('1 second', 'Interval(YearMonth)');
   do we just simply return a 0 or we should raise some warning or even err?
   ```
   
   I think the best way to think of `arrow_cast` is as a way to invoke the arrow `cast` kernel
   
   So in your examples 
   ```sql
   select arrow_cast('1 second', 'Interval(YearMonth)');
   ```
   
   That would invoke `cast(StringArray, IntervalYearMonthArray)`
   
   ```sql
   select arrow_cast('1 month', 'Interval(DayTime)');
   ```
   
   Basically this should be whatever the arrow kernel does (probably throw an ambiguity error, but I am not sure)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb closed issue #5651: Support `::interval` casting / `INTERVAL` SQL datatype

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #5651: Support `::interval` casting / `INTERVAL` SQL datatype
URL: https://github.com/apache/arrow-datafusion/issues/5651


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] doki23 commented on issue #5651: Support `::interval` casting / `INTERVAL` SQL datatype

Posted by "doki23 (via GitHub)" <gi...@apache.org>.
doki23 commented on issue #5651:
URL: https://github.com/apache/arrow-datafusion/issues/5651#issuecomment-1479760382

   > I think in order to support '1 day' and '1 minute' with the same type, we will need to use IntervalMonthDayNano
   
   To keep data not damaged, we must use IntervalMonthDayNano to store. But I think maybe we could provide some optional output formats like [postgresql](https://www.postgresql.org/docs/15/datatype-datetime.html#DATATYPE-INTERVAL-OUTPUT)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #5651: Support `::interval` casting / `INTERVAL` SQL datatype

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5651:
URL: https://github.com/apache/arrow-datafusion/issues/5651#issuecomment-1481136525

   > To avoid data compatibility problem we must use IntervalMonthDayNano to store. But I think maybe we could provide some optional output formats like [postgresql](https://www.postgresql.org/docs/15/datatype-datetime.html#DATATYPE-INTERVAL-OUTPUT)?
   
   I agree this would be a good idea. 
   
   Ideally we could add some options to the arrow display module to support different output formats. I plan to file some tickets about interval support shortly, and I will include more display as well


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org