Posted to dev@parquet.apache.org by Zoltan Ivanfi <zi...@cloudera.com.INVALID> on 2018/10/01 12:58:00 UTC

Row group layout anomalies

Hi,

PARQUET-1337 describes the problem of ending up with a drastically
different (and worse) row group layout than intended under certain
circumstances.

A few weeks ago I started tweaking the logic that controls this in a
test-driven fashion. I found that fixing one problem repeatedly led to the
discovery of another. After playing this whack-a-mole for a while, I ended
up with a much more fundamental change than I originally intended, and
there is still room (and need) for improvement.

Due to the potential impact of these changes, I have put together a design
doc that describes all the problems I could identify and some possible
fixes for them:

https://docs.google.com/document/d/1FJAVwzszZGkxZa8FtKtSbgBKm7qkS4cXuNW8hl4YKwU/edit#

If you are interested, please review and comment on the document.

Thanks,

Zoltan