Posted to user@flume.apache.org by Eran Kutner <er...@gigya.com> on 2012/09/11 01:26:16 UTC

Avro files are empty with snappy compression enabled

Hi,
I'm trying to compress Avro files written by the HDFS sink. Everything appears
to work, but the files themselves are mostly empty. It appears that instead
of the actual data, only some kind of header is written for every
data row in the file. This is a hex dump of such a file:
0000000 0000 6100 0000 0900 0061 fe0a 0001 017e
0000010 0000 0000 0064 0000 6409 0a00 01fe 8a00
0000020 0001 0000 6400 0000 0900 0064 fe0a 0001
0000030 018a 0000 0000 0064 0000 6409 0a00 01fe
0000040 8a00 0001 0000 6400 0000 0900 0064 fe0a
0000050 0001 018a 0000 0000 0064 0000 6409 0a00
0000060 01fe 8a00 0001 0000 6400 0000 0900 0064
0000070 fe0a 0001 018a 0000 0000 0064 0000 6409
0000080 0a00 01fe 8a00 0001 0000 6400 0000 0900
0000090 0064 fe0a 0001 018a 0000 0000 0064 0000
00000a0 6409 0a00 01fe 8a00 0001 0000 6400 0000
00000b0 0900 0064 fe0a 0001 018a 0000 0000 0064
00000c0 0000 6409 0a00 01fe 8a00 0001 0000 6400
00000d0 0000 0900 0064 fe0a 0001 018a 0000 0000
00000e0 0064 0000 6409 0a00 01fe 8a00 0001 0000
00000f0 6400 0000 0900 0064 fe0a 0001 018a 0000
0000100 0000 0064 0000 6409 0a00 01fe 8a00 0001
0000110 0000 6400 0000 0900 0064 fe0a 0001 018a
0000120 0000 0000 0064 0000 6409 0a00 01fe 8a00
0000130 0001 0000 6400 0000 0900 0064 fe0a 0001
0000140 018a 0000 0000 0064 0000 6409 0a00 01fe
0000150 8a00 0001 0000 6400 0000 0900 0064 fe0a
0000160 0001 018a 0000 0000 0064 0000 6409 0a00
0000170 01fe 8a00 0001 0000 6400 0000 0900 0064
0000180 fe0a 0001 018a 0000 0000 0064 0000 6409
0000190 0a00 01fe 8a00 0001 0000 6400 0000 0900
00001a0 0064 fe0a 0001 018a 0000 0000 0064 0000
00001b0 6409 0a00 01fe 8a00 0001 0000 6400 0000
00001c0 0900 0064 fe0a 0001 018a 0000 0000 0064
00001d0 0000 6409 0a00 01fe 8a00 0001 0000 6400

Notice the repeating pattern within the data; it looks like empty headers
with no data.
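
The repetition can be checked mechanically. The following is a minimal Python sketch, not part of the original setup. It assumes the dump above came from plain `hexdump` on a little-endian machine, so each 4-digit group is a byte-swapped 16-bit word. The names `words_to_bytes`, `repeat_period` and `SAMPLE` are made up for illustration, and `SAMPLE` holds only the first four lines of the dump. The first 17 bytes differ slightly from the rest, so the search starts after them.

```python
def words_to_bytes(dump_text):
    """Turn default `hexdump` output lines back into the original bytes."""
    out = bytearray()
    for line in dump_text.strip().splitlines():
        fields = line.split()
        # fields[0] is the offset column; the rest are 16-bit words.
        for word in fields[1:]:
            value = int(word, 16)
            # Assumption: little-endian host, so the low byte was stored first.
            out.append(value & 0xFF)
            out.append(value >> 8)
    return bytes(out)


def repeat_period(data, start=0):
    """Smallest p with data[i] == data[i + p] for all i >= start, else None."""
    tail = data[start:]
    for p in range(1, len(tail)):
        if all(tail[i] == tail[i + p] for i in range(len(tail) - p)):
            return p
    return None


# First four lines of the hex dump above.
SAMPLE = """\
0000000 0000 6100 0000 0900 0061 fe0a 0001 017e
0000010 0000 0000 0064 0000 6409 0a00 01fe 8a00
0000020 0001 0000 6400 0000 0900 0064 fe0a 0001
0000030 018a 0000 0000 0064 0000 6409 0a00 01fe
"""

if __name__ == "__main__":
    raw = words_to_bytes(SAMPLE)
    # Show the first record, then the period of everything after it.
    print(raw[:17].hex(" "))  # bytes.hex(sep) needs Python 3.8+
    print("period:", repeat_period(raw, start=17))
```

Under those assumptions the sample repeats every 17 bytes after the first record, which matches the impression of one small fixed-size record per event rather than the event payload itself.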

This is my sink config:
agent.sinks.hdfsSink2.type = hdfs
agent.sinks.hdfsSink2.channel = memoryChannel2
agent.sinks.hdfsSink2.hdfs.path=hdfs://hadoop2-m1:8020/raw-events/%Y-%m-%d
agent.sinks.hdfsSink2.hdfs.filePrefix=load-events.%{hostname}.avro
agent.sinks.hdfsSink2.hdfs.rollInterval=60
agent.sinks.hdfsSink2.hdfs.rollCount=0
agent.sinks.hdfsSink2.hdfs.rollSize=0
agent.sinks.hdfsSink2.hdfs.fileType=CompressedStream
agent.sinks.hdfsSink2.hdfs.codeC=snappy
agent.sinks.hdfsSink2.hdfs.writeFormat=Text
agent.sinks.hdfsSink2.hdfs.batchSize=1000
agent.sinks.hdfsSink2.serializer = avro_event


Any help would be appreciated.

Thanks.

-eran

Re: Avro files are empty with snappy compression enabled

Posted by Eran Kutner <er...@gigya.com>.
Does anyone know why this is happening?

-eran


