Posted to dev@parquet.apache.org by "Zoltan Ivanfi (JIRA)" <ji...@apache.org> on 2017/11/27 17:51:00 UTC

[jira] [Commented] (PARQUET-1064) Deprecate type-defined sort ordering for INTERVAL type

    [ https://issues.apache.org/jira/browse/PARQUET-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267128#comment-16267128 ] 

Zoltan Ivanfi commented on PARQUET-1064:
----------------------------------------

On second thought, even though the comparison order does not make much sense to users, it does provide a consistent ordering for statistics and column indexes and thus allows efficient data retrieval. For this reason, I think that instead of deprecating the comparison in the specification, we should implement it in parquet-mr.

[~rdblue], [~julienledem], [~gszadovszky] what do you think?

> Deprecate type-defined sort ordering for INTERVAL type
> ------------------------------------------------------
>
>                 Key: PARQUET-1064
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1064
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Zoltan Ivanfi
>            Assignee: Zoltan Ivanfi
>            Priority: Minor
>
> [LogicalTypes.md in parquet-format|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md] defines the sort order for INTERVAL to be produced by sorting by the value of months, then days, then milliseconds, each with unsigned comparison.
> According to these rules, 1d0h0s > 0d48h0s, which is counter-intuitive and does not seem to have any practical uses. Unless somebody is aware of an actual use-case in which this makes sense, I think the sort order should be undefined instead. The [reference implementation in parquet-mr|https://github.com/apache/parquet-mr/blob/352b906996f392030bfd53b93e3cf4adb78d1a55/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L459] already considers the ordering to be unknown.
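The ordering described above can be illustrated with a minimal Python sketch. It assumes the INTERVAL layout from LogicalTypes.md (a 12-byte value holding three little-endian unsigned 32-bit integers: months, days, milliseconds). The helper names decode_interval and interval_sort_key are hypothetical and are not part of parquet-mr or any other Parquet library.

```python
import struct
from typing import Tuple


def decode_interval(raw: bytes) -> Tuple[int, int, int]:
    """Decode a 12-byte INTERVAL value into (months, days, milliseconds).

    Assumes three little-endian unsigned 32-bit integers, as described
    in LogicalTypes.md. Hypothetical helper, for illustration only.
    """
    if len(raw) != 12:
        raise ValueError("INTERVAL values must be exactly 12 bytes")
    return struct.unpack("<III", raw)


def interval_sort_key(raw: bytes) -> Tuple[int, int, int]:
    """Sort key for the type-defined order: months, then days, then millis.

    Python compares tuples element by element, and the decoded integers
    are non-negative, so this matches an unsigned field-wise comparison.
    """
    return decode_interval(raw)


# The example from the issue description: 1 day versus 48 hours.
one_day = struct.pack("<III", 0, 1, 0)
forty_eight_hours = struct.pack("<III", 0, 0, 48 * 60 * 60 * 1000)

# The days field is compared before milliseconds, so 1 day sorts higher.
print(interval_sort_key(one_day) > interval_sort_key(forty_eight_hours))

# Min/max statistics for a column chunk would follow the same order.
values = [forty_eight_hours, one_day]
print(decode_interval(min(values, key=interval_sort_key)))
print(decode_interval(max(values, key=interval_sort_key)))
```

Under this order the 48-hour value becomes the minimum and the 1-day value the maximum, even though 48 hours is the longer duration. Because the fields are stored little-endian, a plain lexicographic comparison of the raw bytes would generally not reproduce this order, so the fields have to be decoded before comparing.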


