Posted to dev@parquet.apache.org by Skye Wanderman-Milne <sk...@cloudera.com> on 2014/11/24 23:04:19 UTC

Re: Control Parquet file size in reducer

Redirecting to dev@parquet

On Mon, Nov 24, 2014 at 7:07 AM, Pengcheng Liu <ze...@gmail.com> wrote:

> Hello guys,
>
> Is there an efficient way to control the size of Parquet files written with
> ParquetFileOutputFormat?
> Currently our solution is to randomize our mapper keys so that records are
> sent to different reducers. That way we can control how many records each
> reducer writes, and thus get a rough file size for each reducer.
>
> But this is not a long-term solution: if we don't know the distribution of
> the data, we don't know how to randomize the key.
>
> Other than writing our own custom ParquetFileOutputFormat, is there any
> other way we could handle this problem?
>
> Any suggestions would be appreciated!
>
> Thanks in advance.
>
> Pengcheng
>
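For illustration, the key-randomizing workaround Pengcheng describes can be sketched as below. This is a hypothetical, self-contained Java helper: `RandomBucketSketch`, `bucketsFor`, and `randomKey` are illustrative names, not Parquet or Hadoop APIs. It assumes the total record count is known up front and that record count is a reasonable proxy for file size (it is only approximate when record sizes vary).

```java
import java.util.Random;

// Hypothetical sketch of the "randomized reducer key" workaround.
// Not part of Parquet or Hadoop; names are illustrative only.
public class RandomBucketSketch {

    // Number of buckets (reducers, hence output files) needed so that each
    // receives roughly targetRecordsPerFile records. Uses ceiling division
    // and always returns at least 1.
    public static int bucketsFor(long totalRecords, long targetRecordsPerFile) {
        if (targetRecordsPerFile <= 0) {
            throw new IllegalArgumentException("targetRecordsPerFile must be positive");
        }
        long buckets = (totalRecords + targetRecordsPerFile - 1) / targetRecordsPerFile;
        return (int) Math.max(1L, buckets);
    }

    // The key a mapper would emit instead of the record's natural key:
    // a uniformly random bucket id in [0, numBuckets).
    public static int randomKey(Random rng, int numBuckets) {
        return rng.nextInt(numBuckets);
    }

    public static void main(String[] args) {
        long total = 1_000_000L;   // assumed known in advance
        long target = 250_000L;    // assumed desired records per file
        int buckets = bucketsFor(total, target);

        // Simulate the shuffle: count how many records land in each bucket.
        Random rng = new Random(42);
        long[] counts = new long[buckets];
        for (long i = 0; i < total; i++) {
            counts[randomKey(rng, buckets)]++;
        }
        for (int b = 0; b < buckets; b++) {
            System.out.println("reducer " + b + ": " + counts[b] + " records");
        }
    }
}
```

With Hadoop's default `HashPartitioner`, emitting an `IntWritable` key in `[0, numBuckets)` and setting the number of reduce tasks to `numBuckets` sends each bucket to its own reducer, because `IntWritable`'s hash code is its value. The sketch also makes the stated limitation concrete: `bucketsFor` needs `totalRecords` before the job runs, which is exactly the information that is missing when the data distribution is unknown.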