Posted to dev@parquet.apache.org by Antoine Pitrou <an...@python.org> on 2023/01/04 16:17:13 UTC

IMPORTANT: specification bugs around v2 data pages

Hello,

I would like to bring this list's attention to two alleged bugs in the
specification around v2 data pages:

- https://issues.apache.org/jira/browse/PARQUET-2221: Encoding spec
  incorrect for dictionary fallback

- https://issues.apache.org/jira/browse/PARQUET-2222: RLE encoding spec
  incorrect for v2 data pages

Regards

Antoine.



Re: IMPORTANT: specification bugs around v2 data pages

Posted by Micah Kornfield <em...@gmail.com>.
>
> - https://issues.apache.org/jira/browse/PARQUET-2221: Encoding spec
>   incorrect for dictionary fallback

The way I've always interpreted the encodings on the writer's side is that
any fallback (or series of fallbacks) should be considered valid, even
though that isn't how the spec reads, and there are probably other edge
cases to consider as well.
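For a reader, this interpretation means never assuming the single fallback path the spec describes. Instead, each data page is decoded with whatever encoding its own header declares. The Python sketch below illustrates that idea under simplifying assumptions. `DECODERS` and `decode_column_chunk` are hypothetical names, the decoders are stubs that only report what they received, and pages are modeled as `(encoding, payload)` pairs rather than real Thrift page headers. The encoding names follow Parquet's `Encoding` enum.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in decoders keyed by Parquet encoding name. A real reader
# would decode values; these stubs only report how many bytes they were handed.
DECODERS: Dict[str, Callable[[bytes], str]] = {
    "RLE_DICTIONARY": lambda data: f"dict-indices({len(data)} bytes)",
    "PLAIN": lambda data: f"plain({len(data)} bytes)",
    "DELTA_BINARY_PACKED": lambda data: f"delta({len(data)} bytes)",
}


def decode_column_chunk(pages: List[Tuple[str, bytes]]) -> List[str]:
    """Decode each data page using the encoding declared for that page.

    Nothing here assumes which encoding follows dictionary encoding, so any
    fallback (or series of fallbacks) chosen by the writer is accepted as
    long as each individual encoding is supported.
    """
    decoded = []
    for encoding, payload in pages:
        decoder = DECODERS.get(encoding)
        if decoder is None:
            raise ValueError(f"unsupported encoding: {encoding}")
        decoded.append(decoder(payload))
    return decoded


# Dictionary-encoded page followed by a non-PLAIN fallback page.
pages = [("RLE_DICTIONARY", b"\x01\x02"), ("DELTA_BINARY_PACKED", b"\x00" * 5)]
print(decode_column_chunk(pages))  # ['dict-indices(2 bytes)', 'delta(5 bytes)']
```

Per-page dispatch keeps readers agnostic to the writer's fallback policy. The only hard failure is an encoding the reader does not implement at all.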

> - https://issues.apache.org/jira/browse/PARQUET-2222: RLE encoding spec
>   incorrect for v2 data pages

This seems more serious if the encodings actually differ based on which
version of the data page is used. V2 (especially in C++) has never really
been production ready. We should try to resolve this bug one way or another,
but I think the way forward probably depends on what various
implementations are doing here.
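Our understanding of PARQUET-2222, which is not spelled out in this thread, is that the discrepancy concerns the 4-byte little-endian length the spec places in front of RLE/bit-packing hybrid data. For repetition and definition levels, that prefix appears in v1 data pages. In v2 data pages, the level byte lengths are carried in the page header (for example `DataPageHeaderV2.definition_levels_byte_length`), and the levels are not prefixed. The Python sketch below shows how a reader might split off one level stream under that assumption. `split_level_bytes` is a hypothetical helper, and the sketch ignores the case where levels are absent because the maximum level is zero.

```python
import struct
from typing import Optional, Tuple


def split_level_bytes(
    page_body: bytes, page_version: int, levels_byte_length: Optional[int] = None
) -> Tuple[bytes, bytes]:
    """Return (rle_encoded_levels, remaining_bytes) for one level stream.

    Assumes RLE/bit-packing hybrid levels that are actually present.
    - v1 data page: the level stream starts with a 4-byte little-endian length.
    - v2 data page: no length prefix; the byte length comes from the page header.
    """
    if page_version == 1:
        (length,) = struct.unpack_from("<I", page_body, 0)
        start = 4
    elif page_version == 2:
        if levels_byte_length is None:
            raise ValueError("v2 pages need the level byte length from the page header")
        length, start = levels_byte_length, 0
    else:
        raise ValueError(f"unknown data page version: {page_version}")
    end = start + length
    if end > len(page_body):
        raise ValueError("level data runs past the end of the page")
    return page_body[start:end], page_body[end:]


levels = b"\x04\x01"  # one RLE run: two repeated levels of 1 at bit width 1
v1_body = struct.pack("<I", len(levels)) + levels + b"values"
v2_body = levels + b"values"
print(split_level_bytes(v1_body, 1))                                 # (b'\x04\x01', b'values')
print(split_level_bytes(v2_body, 2, levels_byte_length=len(levels)))  # (b'\x04\x01', b'values')
```

If a reader applied the v1 rule to a v2 page, it would interpret the first four bytes of level data as a length, which is exactly the kind of incompatibility that makes the spec wording matter.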
