Posted to dev@parquet.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/01/23 10:57:00 UTC

[jira] [Commented] (PARQUET-675) Add INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME types

    [ https://issues.apache.org/jira/browse/PARQUET-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270620#comment-17270620 ] 

ASF GitHub Bot commented on PARQUET-675:
----------------------------------------

nevi-me opened a new pull request #165:
URL: https://github.com/apache/parquet-format/pull/165


   I am working on the Parquet Rust implementation, specifically conversion with Arrow.
   One of the outstanding items in the Parquet types is how to deal with interval types.
   
   This PR proposes adding `LogicalType::Interval(IntervalType)`, which is compatible with Arrow. I have only made changes to the thrift file, as I'd like to get feedback on viability, before documenting the behaviour in the LogicalTypes.md.
   Most of the detail is, however, given below in this message.
   
   The legacy `ConvertedType` has `INTERVAL`, but this interval is ambiguous for Arrow, because Arrow defines:
   1. Interval::YearMonth: i32 representing the number of elapsed whole months
   2. Interval::DayTime: i64 stored as 2 contiguous 32-bit integers, representing the number of elapsed days and milliseconds, respectively
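   
   To make the two layouts concrete, here is a minimal sketch in Rust. The function names are hypothetical (they are not part of the Arrow or Parquet crates), and little-endian byte order is an assumption of the sketch. The days-then-milliseconds order follows the description above.
   
   ```rust
   /// Interval::YearMonth: a single i32 counting whole months.
   /// Little-endian byte order is an assumption of this sketch.
   pub fn year_month_to_bytes(months: i32) -> [u8; 4] {
       months.to_le_bytes()
   }
   
   /// Interval::DayTime: two contiguous 32-bit integers,
   /// days first, then milliseconds (8 bytes in total).
   pub fn day_time_to_bytes(days: i32, millis: i32) -> [u8; 8] {
       let mut out = [0u8; 8];
       out[..4].copy_from_slice(&days.to_le_bytes());
       out[4..].copy_from_slice(&millis.to_le_bytes());
       out
   }
   
   /// Split the 8-byte DayTime value back into (days, milliseconds).
   pub fn day_time_from_bytes(bytes: [u8; 8]) -> (i32, i32) {
       let days = i32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
       let millis = i32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
       (days, millis)
   }
   ```
   
   Neither layout fills all 12 bytes of the legacy `INTERVAL`, which is why a reader cannot tell which Arrow type a legacy value maps to.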
   
   @julienledem initially suggested deprecating `INTERVAL` by replacing it with 2 converted types in #43, but given that Parquet has gained logical types since then, we could offer a better alternative by using `LogicalType::Interval(IntervalUnit)`.
   
   This would either be 32-bit or 64-bit based on the interval unit.
   On the 64-bit representation, I'm not opinionated on whether we should use an INT64 or FIXED_LEN_BYTE_ARRAY(8). I suspect though that we initially used FIXED_LEN_BYTE_ARRAY(12) because there's no 96-bit primitive.
   
   # Backward Compatibility with ConvertedType
   
   We do not deprecate the `INTERVAL` converted type, as one could always convert the LogicalType value to either the first 4 bytes, or the last 8, depending on the IntervalUnit; and so write only those bytes.
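   
   A rough sketch of that conversion, under stated assumptions: the `Interval` enum and `to_legacy_interval` function are hypothetical names for illustration, not the thrift definition in this PR. The legacy `INTERVAL` is a FIXED_LEN_BYTE_ARRAY(12) holding months, days and milliseconds as three little-endian 32-bit integers. The legacy fields are unsigned, so negative values would need separate handling that this sketch does not attempt.
   
   ```rust
   /// The two interval units discussed here (hypothetical enum for illustration).
   pub enum Interval {
       YearMonth(i32),
       DayTime { days: i32, millis: i32 },
   }
   
   /// Write a logical interval into the 12-byte legacy INTERVAL layout:
   /// bytes 0..4 = months, 4..8 = days, 8..12 = milliseconds.
   /// Only the bytes belonging to the given unit are populated; the rest stay zero.
   pub fn to_legacy_interval(value: &Interval) -> [u8; 12] {
       let mut out = [0u8; 12];
       match value {
           // YearMonth fills the first 4 bytes.
           Interval::YearMonth(months) => out[..4].copy_from_slice(&months.to_le_bytes()),
           // DayTime fills the last 8 bytes.
           Interval::DayTime { days, millis } => {
               out[4..8].copy_from_slice(&days.to_le_bytes());
               out[8..].copy_from_slice(&millis.to_le_bytes());
           }
       }
       out
   }
   ```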
   
   We would mark converting from ConvertedType to LogicalType as undefined behaviour, because without any additional information on which of the 12 bytes are populated, readers could lose information (what Rust is currently doing).
   
   Implementations that rely on the old behaviour still have the option of populating both the converted type and the logical type, so they should not populate the logical type in this instance.
   




> Add INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME types
> ---------------------------------------------------
>
>                 Key: PARQUET-675
>                 URL: https://issues.apache.org/jira/browse/PARQUET-675
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-format
>            Reporter: Julien Le Dem
>            Assignee: Julien Le Dem
>            Priority: Major
>
> For completeness and compatibility with Arrow and SQL types.
> Those are related to the existing INTERVAL type.
> Some references:
>  - https://msdn.microsoft.com/en-us/library/ms716506(v=vs.85).aspx
>  - http://www.techrepublic.com/article/sql-basics-datetime-and-interval-data-types/
>  - https://www.postgresql.org/docs/9.3/static/datatype-datetime.html
>  - https://docs.oracle.com/html/E26088_01/sql_elements001.htm
>  - http://www.ibm.com/support/knowledgecenter/SSGU8G_12.1.0/com.ibm.sqlr.doc/ids_sqr_123.htm


