Posted to dev@parquet.apache.org by Ryan Blue <bl...@cloudera.com> on 2015/05/30 00:58:13 UTC

Row group alignment?

Hi everyone,

I have a question for anyone with large datasets in HDFS about row group 
and HDFS block alignment: how well are row groups and HDFS blocks 
aligned in practice?

Say your row group size is equal to the HDFS block size: how many 
blocks does it take, on average, before a row group is significantly 
split between two blocks? Put differently, how much shorter than the 
planned row group size are row groups, on average?
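
To make the question concrete, here is a rough Python sketch of the effect 
I have in mind. It is only a toy model, not a measurement: the function 
name simulate_alignment and the assumption that each row group comes out 
0-5% shorter than the target are hypothetical, chosen for illustration. 
Real numbers from real files are what I am after.

```python
import random


def simulate_alignment(block_size, target_size, num_groups,
                       max_shortfall=0.05, seed=0):
    """Lay row groups end to end in one file and count block-boundary splits.

    Assumption (not measured): each row group is shorter than the target
    by a uniformly random fraction between 0 and max_shortfall.
    """
    rng = random.Random(seed)
    offset = 0
    split = 0
    total_size = 0
    for _ in range(num_groups):
        size = int(target_size * (1 - rng.uniform(0, max_shortfall)))
        start, end = offset, offset + size  # end is exclusive
        # A row group is split when its first and last bytes fall in
        # different fixed-size blocks.
        if start // block_size != (end - 1) // block_size:
            split += 1
        total_size += size
        offset = end
    return {
        "split_fraction": split / num_groups,
        "mean_size_ratio": total_size / num_groups / target_size,
    }


if __name__ == "__main__":
    block = 128 * 1024 * 1024  # 128 MB blocks, row group target set to match
    print(simulate_alignment(block, block, num_groups=1000))
```

In this model, once the first row group comes up short, nearly every 
later row group straddles a block boundary, because the drift only 
accumulates. Whether real writers behave this way is exactly the question.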

I'm trying to find out whether it would be a significant benefit to use 
something like variable-length HDFS blocks, added in HDFS-3689 [1], to 
keep the two aligned.

Thanks!

rb

[1]: https://issues.apache.org/jira/browse/HDFS-3689


-- 
Ryan Blue
Software Engineer
Cloudera, Inc.