Posted to user@flume.apache.org by higkoohk <hi...@gmail.com> on 2013/05/15 12:05:36 UTC

What does the file header mean? Flume always adds a header to the files

My flume.conf

tengine.sources = tengine
> tengine.sources.tengine.type = exec
> tengine.sources.tengine.command = tail -n +0 -F /data/log/tengine/access.log
> tengine.sources.tengine.channels = file4log
> tengine.sinks = hdfs4log
> tengine.sinks.hdfs4log.type = hdfs
> tengine.sinks.hdfs4log.channel = file4log
> tengine.sinks.hdfs4log.serializer = avro_event
> tengine.sinks.hdfs4log.hdfs.path = hdfs://hdfs.kisops.org:8020/flume/tengine
> tengine.sinks.hdfs4log.hdfs.filePrefix = access
> tengine.sinks.hdfs4log.hdfs.fileSuffix = .log
> tengine.sinks.hdfs4log.hdfs.rollInterval = 0
> tengine.sinks.hdfs4log.hdfs.rollCount = 0
> tengine.sinks.hdfs4log.hdfs.rollSize = 134217728
> tengine.sinks.hdfs4log.hdfs.batchSize = 1024
> tengine.sinks.hdfs4log.hdfs.threadsPoolSize = 1
> tengine.sinks.hdfs4log.hdfs.fileType = DataStream
> tengine.sinks.hdfs4log.hdfs.writeFormat = Text
> tengine.channels = file4log
> tengine.channels.file4log.type = file
> tengine.channels.file4log.capacity = 4096
> tengine.channels.file4log.transactionCapacity = 1024
> tengine.channels.file4log.checkpointDir = /data/log/hdfs
> tengine.channels.file4log.dataDirs = /data/log/loadrunner


When I look at the logs in HDFS, there are headers in the files that were
not created by the app:

>
> Objavro.codecnullavro.schema�{"type":"record","name":"Event","fields":[{"name":"headers","type":{"type":"map","values":"string"}},{"name":"body","type":"bytes"}]}�"�,�)��E����5�Y���
> ��agent25.kisops.org|10.20.216.20|1368610557.341|200|207|255|GET
> /status?00000005 HTTP/1.1|0.000|52033467��


See the image :
[image: inline image 1]

What does this header mean? How can I remove it, or when and how should I use this information?

Many thanks !

Re: What does the file header mean? Flume always adds a header to the files

Posted by higkoohk <hi...@gmail.com>.
OK, thank you. I don't think I need it now.



Re: What does the file header mean? Flume always adds a header to the files

Posted by Mike Percy <mp...@apache.org>.
You probably figured this out by now but those are Avro container files :)

see http://avro.apache.org

Regards
Mike
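
For later readers: the bytes at the start of the file quoted above are the standard Avro object container header: the magic `Obj` plus a version byte, then a metadata map carrying `avro.schema` and `avro.codec`, then a 16-byte sync marker. A minimal sketch that builds and decodes such a header using only the Python standard library (byte layout per the Avro 1.x container file spec; this is an illustration, not Flume's own code):

```python
import os

def zigzag_varint(n: int) -> bytes:
    """Encode a long as Avro's zig-zag varint (non-negative here)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        b = z & 0x7F
        z >>= 7
        if z:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def read_varint(buf: bytes, pos: int):
    """Decode a zig-zag varint, returning (value, new_pos)."""
    shift = acc = 0
    while True:
        b = buf[pos]
        pos += 1
        acc |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1), pos

def build_header(meta: dict) -> bytes:
    """Build an Avro object container file header: magic, metadata map, sync."""
    out = bytearray(b"Obj\x01")          # magic: "Obj" + format version 1
    out += zigzag_varint(len(meta))       # one map block with len(meta) entries
    for key, value in meta.items():
        k = key.encode()
        out += zigzag_varint(len(k)) + k
        out += zigzag_varint(len(value)) + value
    out += b"\x00"                        # empty block terminates the map
    out += os.urandom(16)                 # 16-byte sync marker
    return bytes(out)

def parse_header(buf: bytes) -> dict:
    """Parse the metadata map back out of a container header."""
    assert buf[:4] == b"Obj\x01", "not an Avro container file"
    pos, meta = 4, {}
    while True:
        count, pos = read_varint(buf, pos)
        if count == 0:
            break
        for _ in range(count):
            klen, pos = read_varint(buf, pos)
            key = buf[pos:pos + klen].decode()
            pos += klen
            vlen, pos = read_varint(buf, pos)
            meta[key] = buf[pos:pos + vlen]
            pos += vlen
    return meta

# The same two keys that show up in the hex dump from the thread:
header = build_header({
    "avro.codec": b"null",
    "avro.schema": b'{"type":"record","name":"Event","fields":['
                   b'{"name":"headers","type":{"type":"map","values":"string"}},'
                   b'{"name":"body","type":"bytes"}]}',
})
print(header[:3])                            # b'Obj'
print(parse_header(header)["avro.codec"])    # b'null'
```

This is why the HDFS files open with `Obj`, `avro.codec`, and `avro.schema` followed by unprintable bytes: the "garbage" is the binary framing around each batch of events.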




Re: What does the file header mean? Flume always adds a header to the files

Posted by higkoohk <hi...@gmail.com>.
Maybe it is caused by 'tengine.sinks.hdfs4log.serializer = avro_event', but
I still don't know why, or how to change it ...
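
For anyone landing on this thread later: that guess is right. The container framing comes from the `serializer = avro_event` setting on the HDFS sink. If plain text lines are wanted instead, dropping that line (or explicitly setting the serializer to `text`, which is the HDFS sink's default) should produce newline-delimited log lines with no Avro wrapper. A sketch against the config above:

```
# Write plain newline-delimited text instead of Avro container files:
tengine.sinks.hdfs4log.serializer = text
# hdfs.fileType = DataStream and hdfs.writeFormat = Text can stay as they are.
```

Keep `avro_event` only if downstream consumers actually read the files as Avro (e.g. via the avro.apache.org tooling), since the header map and per-event `headers`/`body` record are what make the files self-describing.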

