Posted to dev@parquet.apache.org by Ryan Blue <bl...@cloudera.com> on 2015/05/30 00:58:13 UTC
Row group alignment?
Hi everyone,
I have a question for anyone with large datasets in HDFS about row group
and HDFS block alignment: how well are row groups and HDFS blocks
aligned in practice?
Say your row group size equals the HDFS block size: how many blocks
does it take, on average, before a row group is significantly split
between two blocks? Put differently, how much shorter than the planned
row group size are row groups, on average?
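
To make the question concrete, here is a minimal Python sketch of the
drift being asked about. It assumes row groups are written back to back
starting at offset 0 (ignoring the file header and footer), and the
function name and the sizes are hypothetical, not measurements from a
real dataset.

```python
def straddling_row_groups(row_group_sizes, block_size):
    """Return indices of row groups whose bytes span more than one block.

    Assumption: row groups are laid out contiguously from offset 0.
    """
    straddlers = []
    offset = 0
    for i, size in enumerate(row_group_sizes):
        start, end = offset, offset + size  # end is exclusive
        # A row group straddles when its first and last byte fall in
        # different fixed-size blocks.
        if size > 0 and start // block_size != (end - 1) // block_size:
            straddlers.append(i)
        offset = end
    return straddlers


MB = 1024 * 1024
block = 128 * MB
# Hypothetical sizes: most row groups land slightly under the 128 MB target.
sizes = [126 * MB, 127 * MB, 125 * MB, 128 * MB]
print(straddling_row_groups(sizes, block))  # prints [1, 2, 3]
```

In this made-up case a single row group that comes in 2 MB short is
enough to push every later row group across a block boundary, which is
the effect the question tries to quantify on real data.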
I'm trying to find out whether it would be a significant benefit to use
something like variable-length HDFS blocks, added in HDFS-3689 [1], to
keep the two aligned.
Thanks!
rb
[1]: https://issues.apache.org/jira/browse/HDFS-3689
--
Ryan Blue
Software Engineer
Cloudera, Inc.