Posted to dev@parquet.apache.org by "Ryan Blue (JIRA)" <ji...@apache.org> on 2015/02/25 00:03:05 UTC

[jira] [Commented] (PARQUET-188) Parquet writes columns out of order (compared to the schema)

    [ https://issues.apache.org/jira/browse/PARQUET-188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335625#comment-14335625 ] 

Ryan Blue commented on PARQUET-188:
-----------------------------------

I don't think there is a requirement in the format spec ([docs here|https://parquet.incubator.apache.org/documentation/latest/]) to write columns in a specific order. I agree that it would ideally match the schema, but I'm surprised that this is causing problems, because the column chunk metadata contains the offset where each column chunk starts.
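
To illustrate the point about offsets, here is a minimal sketch of a reader that locates each column chunk through its metadata instead of assuming the physical order matches the schema. This is not parquet-mr or Impala code: `ColumnChunkMeta`, its fields, and `read_chunks_by_schema` are hypothetical names, and the "file" is just a byte string standing in for one row group.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ColumnChunkMeta:
    # Hypothetical stand-in for the per-chunk metadata kept in the file footer.
    path: str     # column name, as it appears in the schema
    offset: int   # byte position where this column chunk starts
    length: int   # number of bytes in this column chunk


def read_chunks_by_schema(
    data: bytes, schema: List[str], chunks: List[ColumnChunkMeta]
) -> Dict[str, bytes]:
    """Return chunk bytes keyed by column, in schema order.

    Chunks are looked up by column path and sliced out by offset,
    so the order in which they were physically written is irrelevant.
    """
    by_path = {chunk.path: chunk for chunk in chunks}
    result = {}
    for name in schema:
        meta = by_path[name]
        result[name] = data[meta.offset:meta.offset + meta.length]
    return result


# A row group whose columns were written as b, a, c while the schema is a, b, c.
data = b"BBBBAAACC"
schema = ["a", "b", "c"]
chunks = [
    ColumnChunkMeta("b", 0, 4),
    ColumnChunkMeta("a", 4, 3),
    ColumnChunkMeta("c", 7, 2),
]
result = read_chunks_by_schema(data, schema, chunks)
```

A reader built this way returns the columns in schema order even though they sit in a different order on disk. A reader that instead walks the chunks sequentially and pairs them with schema fields by position would mix up `a` and `b` here, which is the kind of assumption the report describes.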

> Parquet writes columns out of order (compared to the schema)
> ------------------------------------------------------------
>
>                 Key: PARQUET-188
>                 URL: https://issues.apache.org/jira/browse/PARQUET-188
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>            Reporter: Colin Marc
>
> When building from master, parquet seems to write row groups with the columns in arbitrary orders, not in the same order as the schema. This appears to happen regardless of the OutputFormat or WriteSupport used.
> This breaks implementations that assume the columns will be in a specific order, in particular impala.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)