Posted to user@phoenix.apache.org by Krishna <re...@gmail.com> on 2014/09/17 03:33:03 UTC

Bulk loader mapper output (3.1.0)

Hi,

Does the bulk loader compress mapper output? I couldn't find anywhere in
the code where "mapreduce.map.output.compress" is set to true.

Are HFiles compressed only if the Phoenix table (that data is being
imported to) is created with compression parameter (ex: COMPRESSION='GZ')?

Thanks for clarifying.

Re: Bulk loader mapper output (3.1.0)

Posted by Krishna <re...@gmail.com>.
Thanks for clarifying, Gabriel.

Re: Bulk loader mapper output (3.1.0)

Posted by Gabriel Reid <ga...@gmail.com>.
Hi Krishna,

> Does the bulk loader compress mapper output? I couldn't find anywhere in the
> code where "mapreduce.map.output.compress" is set to true.

The bulk loader doesn't explicitly enable compression on the map
output, but if the client Hadoop configuration (i.e. the
mapred-site.xml on the machine where the job is kicked off) or the
mapred-site.xml on the cluster enables it, then it will be used, as
with any other MapReduce job.

The reason for not setting it directly in the code is that doing so
would create a hard dependency on the compression codec(s) being
available on the MapReduce cluster. I suppose these days Snappy and
gzip are both generally available pretty much all the time, but I've
been bitten in the past by a codec that wasn't available on a system
yet was referenced directly from code.

Another option is to supply the compression settings as part of the
job arguments via -D parameters, e.g.
-Dmapreduce.map.output.compress=true.
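
As a sketch, assuming the MapReduce CSV bulk loader entry point (the
jar name, table name, and input path below are hypothetical), a launch
could look like:

  # -D options must come before the tool's own arguments
  hadoop jar phoenix-3.1.0-client.jar \
      org.apache.phoenix.mapreduce.CsvBulkLoadTool \
      -Dmapreduce.map.output.compress=true \
      -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
      --table EXAMPLE \
      --input /data/example.csv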

>
> Are HFiles compressed only if the Phoenix table (that data is being imported
> to) is created with compression parameter (ex: COMPRESSION='GZ')?
>

Yes, I believe this is indeed the case. The default behavior of
HFileOutputFormat (as far as I know) is to take compression settings
from the output table and apply them to the created HFiles.
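
So to get compressed HFiles out of a bulk load, the compression would
be declared when creating the target table, e.g. (a minimal sketch;
the table and column names are made up):

  -- GZ compression is applied to the HFiles written for this table
  CREATE TABLE example (
      id BIGINT NOT NULL PRIMARY KEY,
      name VARCHAR
  ) COMPRESSION='GZ';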

- Gabriel