Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/02/08 20:50:46 UTC

[GitHub] [iceberg] rdblue commented on pull request #2167: Fix for Conversion of Parquet ByteArray to Iceberg Schema

rdblue commented on pull request #2167:
URL: https://github.com/apache/iceberg/pull/2167#issuecomment-775451501

I've been thinking about this more, and I'm leaning toward working around it instead. I think the problem is that the Parquet/Avro writer uses the old (2-level) list format by default to avoid breaking existing pipelines. But there should be an easy way to make it produce data that Iceberg accepts: set `parquet.avro.write-old-list-structure=false`. That property defaults to `true`.
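
To make the difference concrete, here is a rough sketch of the two layouts parquet-avro writes for a non-nullable Avro `array<int>` field, depending on that property. The schema text is written by hand to mirror the library's output; it is not generated by parquet-avro, and the class and method names are hypothetical. In real writer code the flag is usually set on the Hadoop `Configuration` passed to the writer builder (for example, using the `AvroWriteSupport.WRITE_OLD_LIST_STRUCTURE` key).

```java
// Hypothetical illustration: hand-written Parquet schema text that mirrors
// what parquet-avro emits for a non-nullable Avro array<int> field.
final class ListLayouts {

    // Returns the Parquet schema for a list column `name` of required int32
    // elements, in either the old or the standard list layout.
    static String intListSchema(String name, boolean writeOldListStructure) {
        if (writeOldListStructure) {
            // Old 2-level layout (the parquet-avro default): the repeated
            // field is the element itself and is named "array".
            return "required group " + name + " (LIST) {\n"
                 + "  repeated int32 array;\n"
                 + "}";
        }
        // Standard 3-level layout: a repeated group named "list" wraps a
        // single field named "element".
        return "required group " + name + " (LIST) {\n"
             + "  repeated group list {\n"
             + "    required int32 element;\n"
             + "  }\n"
             + "}";
    }

    public static void main(String[] args) {
        System.out.println(intListSchema("numbers", true));   // old layout
        System.out.println(intListSchema("numbers", false));  // 3-level layout
    }
}
```

Only the second shape is the 3-level structure that Iceberg reads, which is why turning the old-list flag off avoids the conversion problem rather than requiring Iceberg to handle 2-level lists.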

If we can fix it that way, then I think we should. Otherwise, we would be implementing only part of the [backward-compatibility rules from Parquet](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists). I'm not sure how partially implementing those rules would affect compatibility, so I think the safer option would be to implement all of them. But that's a bigger change and more to maintain (which is why we don't support 2-level lists in the first place). So I think the preferred solution is to avoid the problem instead.
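
For a sense of what "all of the backward-compatibility rules" involves, here is a simplified sketch of how a reader might decide whether the repeated field inside a `LIST`-annotated group is itself the element (2-level data) or just a wrapper around it (standard 3-level data). It follows the rules as summarized in the linked Parquet document; the method and its parameters are hypothetical, it models fields only by name, kind, and child count, and the linked spec remains the authority on the exact rules.

```java
// Hypothetical, simplified model of Parquet's backward-compatibility rules
// for LIST-annotated groups. Not taken from Iceberg or parquet-mr.
final class ListCompat {

    // Returns true when the repeated field is the list element (legacy
    // 2-level data), false when its single child is the element (3-level).
    static boolean repeatedFieldIsElement(String listGroupName,
                                          String repeatedName,
                                          boolean repeatedIsGroup,
                                          int childCount) {
        // A primitive repeated field is the element; elements are required.
        if (!repeatedIsGroup) {
            return true;
        }
        // A repeated group with several fields is the element (a struct).
        if (childCount > 1) {
            return true;
        }
        // A one-field group named "array" or "<list name>_tuple" is also
        // treated as the element, for compatibility with older writers.
        if (repeatedName.equals("array")
                || repeatedName.equals(listGroupName + "_tuple")) {
            return true;
        }
        // Otherwise, assume the standard 3-level layout: the group is a
        // wrapper and its only child is the element.
        return false;
    }

    public static void main(String[] args) {
        // Old parquet-avro output: "repeated int32 array" -> element.
        System.out.println(repeatedFieldIsElement("numbers", "array", false, 0));
        // Standard layout: "repeated group list { ... element; }" -> wrapper.
        System.out.println(repeatedFieldIsElement("numbers", "list", true, 1));
    }
}
```

Each branch is another case that has to be supported and tested for every reader path, which illustrates why implementing the full set is a larger maintenance commitment than asking writers to emit 3-level lists.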
