Posted to user@flume.apache.org by Eran Kutner <er...@gigya.com> on 2011/08/03 14:46:23 UTC

CollectorSink doesn't pass the new format parameter

Just opened bug FLUME-720, but was wondering if anyone had a workaround:

CollectorSink doesn't properly pass the format parameter down to the
EscapedCustomDfs sink.
For example, this is working fine:
collectorSource(54001) | escapedCustomDfs("hdfs://hadoop1-m1:8020/", "test",
seqfile("SnappyCodec") );

However, this one ignores the format parameter and uses the codec defined in
flume-conf.xml instead:
collectorSource(54001) | collectorSink("hdfs://hadoop1-m1:8020/", "test-",
600000, seqfile("SnappyCodec") );

By itself this bug would not be very serious. The real problem is that
escapedCustomDfs/customDfs use the same compressor and apply it to the whole
output file, on top of the compression the sequence file already does
natively, so the resulting sequence file is double compressed and invalid.
As far as I can tell, the only way to get a valid compressed sequence file
is to set flume.collector.dfs.compress.codec to "None" in flume-site.xml
and use the format parameter to specify which compression to use for the
sequence file, except that doesn't work...
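For reference, the override I tried in flume-site.xml looks roughly like this
(standard Hadoop-style property syntax):

<!-- flume-site.xml: turn off the collector-level compressor so only the
     sequence file's own codec (from the format parameter) applies -->
<property>
  <name>flume.collector.dfs.compress.codec</name>
  <value>None</value>
</property>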

Thanks.

-eran

Re: CollectorSink doesn't pass the new format parameter

Posted by Jonathan Hsieh <jo...@cloudera.com>.
Eran,

Thanks for filing the bug and the good description. I've created a patch
for FLUME-720 [1] and added comments there describing a workaround for the
problem.

For those reading along, the problem is that the format argument is not
propagated properly when using collectorSink. The workaround is to use the
collector wrapper with escapedFormatDfs directly, since that path handles the
output format properly.

collectorSink("hdfs://hadoop1-m1:8020/", "test-", 600000,
seqfile("SnappyCodec") );

collector(600000) { escapedFormatDfs("hdfs://hadoop1-m1:8020/",
"test-%{rolltag}", seqfile("SnappyCodec")) }

[1] https://issues.apache.org/jira/browse/FLUME-720

Thanks,
Jon.

-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com