Posted to dev@parquet.apache.org by "J Y (Jira)" <ji...@apache.org> on 2022/08/30 20:15:00 UTC

[jira] [Updated] (PARQUET-2180) make the default behavior for proto writing not-backwards compatible

     [ https://issues.apache.org/jira/browse/PARQUET-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J Y updated PARQUET-2180:
-------------------------
    Description: 
https://issues.apache.org/jira/browse/PARQUET-968 introduced support for writing maps and lists in a spec-compliant way. However, to avoid breaking existing libraries, it also introduced a flag, and the default write behavior was left as NOT spec-compliant.
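
To make the difference concrete, take a protobuf field such as `repeated int32 nums = 1;`. The legacy writer emits it as a bare repeated Parquet field. The spec-compliant layout wraps it in the three-level LIST structure from the Parquet format's logical-types spec. The Java code here is a toy renderer and not parquet-protobuf code. The class and method names are hypothetical and field ids are omitted. The repetitions chosen (optional outer group, required element) are one valid choice under the spec, not a claim about exactly what the proto schema converter emits. The boolean mirrors the writer flag, which is assumed here to be the `parquet.proto.writeSpecsCompliant` setting.

```java
/**
 * Toy illustration (hypothetical, not parquet-protobuf code): renders the
 * Parquet schema text for a repeated protobuf primitive in the legacy layout
 * and in the spec-compliant three-level LIST layout.
 */
final class ProtoListLayout {

    /** Legacy layout: the repeated protobuf field becomes a bare repeated Parquet field. */
    static String legacyList(String name, String type) {
        return "repeated " + type + " " + name + ";";
    }

    /** Spec-compliant layout: LIST-annotated group -> repeated "list" group -> "element" field. */
    static String specCompliantList(String name, String type) {
        return String.join("\n",
            "optional group " + name + " (LIST) {",
            "  repeated group list {",
            "    required " + type + " element;",
            "  }",
            "}");
    }

    /**
     * Picks a layout the way the writer flag does. The real setting is assumed
     * to be parquet.proto.writeSpecsCompliant; this method only models its effect.
     */
    static String render(String name, String type, boolean specCompliant) {
        return specCompliant ? specCompliantList(name, type) : legacyList(name, type);
    }

    public static void main(String[] args) {
        // Old default (flag off): bare repeated field.
        System.out.println(render("nums", "int32", false));
        // Proposed default (flag on): three-level LIST structure.
        System.out.println(render("nums", "int32", true));
    }
}
```

Maps follow the same idea: the spec-compliant form is a MAP-annotated group wrapping a repeated `key_value` group that holds the `key` and `value` fields. A reader that expects one layout can fail on the other, which matches the parquet-cli failure described in this issue.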

It's been over 5 years, and people should really be off the old format by now. So much so that the new parquet-cli tool cannot read Parquet files generated by Flink, because it is hard-coded to never allow the old style. The deprecated parquet-tools reads these files fine because it is older and still accepts the old style.

I started coding up a workaround in flink-parquet and parquet-cli, but stopped. IMHO, we should just move on at this point: protobufs now often have repeated primitives and maps. We should still keep the flag around, though, so people can override the default and go back to the backwards-compatible behavior.

I have the code written and can submit a PR if you'd like.

I'm not a Parquet expert, though, so I'm unclear about the deeper downstream ramifications of this change and would love feedback in this area.

  was:
https://issues.apache.org/jira/browse/PARQUET-968 introduced supporting maps and lists in a spec compliant way.  however, to not break existing libraries, a flag was introduced and defaulted the write behavior to NOT use the specs compliant writes.

it's been over 5 years, and people should be really off of it.  so much so, that trying to use the new parquet-cli tool to read parquet files generated by flink using doesn't work b/c it's hard coded to never allow the old style.  the deprecated parquet-tools reads these files fine b/c it's the older style.

i started coding up a workaround in flink-parquet and parquet-cli, but stopped.  we really should just move on at this point, imho.  protobufs often have repeated primitives and maps now, so it just makes sense to move on at this point.  we should keep the flag around and let people override it back to being backwards compatible though.

i have the code written and can submit a PR if you'd like.

i'm not an expert in parquet though, so i'm unclear as to the deep downstream ramifications of this change, so i would love to get feedback in this area.


> make the default behavior for proto writing not-backwards compatible
> --------------------------------------------------------------------
>
>                 Key: PARQUET-2180
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2180
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-protobuf
>            Reporter: J Y
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.20.10#820010)