Posted to user@flume.apache.org by David Capwell <dc...@gmail.com> on 2012/08/21 04:36:13 UTC

Can HDFSSink write headers as well?

I was wondering if I pass random data to an event's header, can the
HDFSSink write it to HDFS?  I know it can use the headers to split the data
into different paths, but what about writing the data to HDFS itself?

thanks for your time reading this email.

Re: Can HDFSSink write headers as well?

Posted by Denny Ye <de...@gmail.com>.
Hi David. Normally, Flume doesn't write event headers to HDFS, but you can
extend the functionality yourself to do what you want.

2012/8/21 David Capwell <dc...@gmail.com>

> I was wondering if I pass random data to an event's header, can the
> HDFSSink write it to HDFS?  I know it can use the headers to split the data
> into different paths, but what about writing the data to HDFS itself?
>
> thanks for your time reading this email.
>

Re: Can HDFSSink write headers as well?

Posted by David Capwell <dc...@gmail.com>.
Thanks for this, it's what I'm looking for. Binary Avro should be good to
use, thanks.

On Wed, Aug 22, 2012 at 7:59 PM, Mohit Anchlia <mo...@gmail.com> wrote:

>
>
> On Tue, Aug 21, 2012 at 8:16 PM, ashutosh (Open Platform Development Team) <
> sharma.ashutosh@kt.com> wrote:
>
>>  Hi All,
>>
>> I am using the “avro_event” serializer with the writable format and the
>> DataStream file type to store events into HDFS.
>>
>> I would like to read the files back for further analysis. I am new to
>> Avro and don't know how to develop a deserializer to read the Flume
>> events written to the HDFS files.
>>
>> If anyone could share a sample or an example, that would be very
>> helpful. Please help….
>>
>
> Look at this test to see how to read data. But in general you would want
> to create your own serializer specific to your schema. Otherwise it makes
> sense to just use sequence files.
>
>
> http://svn.apache.org/repos/asf/flume/trunk/flume-ng-core/src/test/java/org/apache/flume/serialization/TestFlumeEventAvroEventSerializer.java
>
>
>>  Thanks & Regards,
>>
>> Ashutosh Sharma
>>
>> From: Bhaskar V. Karambelkar [mailto:bhaskarvk@gmail.com]
>> Sent: Wednesday, August 22, 2012 12:22 AM
>> To: user@flume.apache.org
>> Subject: Re: Can HDFSSink write headers as well?
>>
>> On Tue, Aug 21, 2012 at 2:25 AM, バーチャル クリストファー <
>> birchall@infoscience.co.jp> wrote:
>>
>> Hi David,
>>
>> Currently there is no way to write headers to HDFS using the built-in
>> Flume functionality.
>>
>> This is not entirely true; the following combination will write headers
>> to HDFS in an Avro data file format (binary):
>>
>> agent.sinks.hdfsBinarySink.hdfs.fileType = DataStream
>> agent.sinks.hdfsBinarySink.serializer = avro_event
>> agent.sinks.hdfsBinarySink.hdfs.writeFormat = writable
>>
>> The serializer used is part of the Flume distribution, viz.
>> flume-ng-core/src/main/java/org/apache/flume/serialization/FlumeEventAvroEventSerializer.java
>>
>> A file thus written can be processed with the Avro MapReduce API found
>> in the Avro distribution.
>>
>> Also note that simply using DataStream doesn't mean the file is text;
>> the serializer and hdfs.writeFormat also determine whether the file is
>> text or binary.
>>
>> I've read the entire HDFS sink code and experimented with it a lot, so
>> if you want more details, let me know.
>>
>> If you are writing to text or binary files on HDFS (i.e. you have set
>> hdfs.fileType = DataStream or CompressedStream in your config), then you
>> can supply your own custom serializer, which will allow you to write
>> headers to HDFS. You will need to write a serializer that implements
>> org.apache.flume.serialization.EventSerializer.
>>
>> If, on the other hand, you are writing to HDFS SequenceFiles, then
>> unfortunately there is no way to customize the way that events are
>> serialized, so you cannot write event headers to HDFS. This is a known
>> issue (FLUME-1100) and I have supplied a patch to fix it.
>>
>> Chris.
>>
>> On 2012/08/21 11:36, David Capwell wrote:
>>
>> I was wondering if I pass random data to an event's header, can the
>> HDFSSink write it to HDFS?  I know it can use the headers to split the data
>> into different paths, but what about writing the data to HDFS itself?
>>
>> thanks for your time reading this email.
>>
>

Re: Can HDFSSink write headers as well?

Posted by Mohit Anchlia <mo...@gmail.com>.
On Tue, Aug 21, 2012 at 8:16 PM, ashutosh (Open Platform Development Team)
<sh...@kt.com> wrote:

>  Hi All,
>
> I am using the “avro_event” serializer with the writable format and the
> DataStream file type to store events into HDFS.
>
> I would like to read the files back for further analysis. I am new to
> Avro and don't know how to develop a deserializer to read the Flume
> events written to the HDFS files.
>
> If anyone could share a sample or an example, that would be very
> helpful. Please help….
>

Look at this test to see how to read data. But in general you would want to
create your own serializer specific to your schema. Otherwise it makes
sense to just use sequence files.

http://svn.apache.org/repos/asf/flume/trunk/flume-ng-core/src/test/java/org/apache/flume/serialization/TestFlumeEventAvroEventSerializer.java
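
To make that concrete, below is a minimal standalone sketch of reading such
a file with the plain Avro API, assuming the file has first been copied out
of HDFS (e.g. with "hadoop fs -get"). The class name is illustrative; the
"headers" map and "body" bytes fields are the ones the avro_event
(FlumeEventAvroEventSerializer) schema defines.

import java.io.File;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class ReadFlumeAvroFile {
  public static void main(String[] args) throws Exception {
    // Local copy of a file the HDFS sink wrote (path supplied on the command line).
    File file = new File(args[0]);
    // A generic reader picks the writer's schema up from the Avro container
    // file itself, so no schema definition is needed here.
    GenericDatumReader<GenericRecord> datumReader =
        new GenericDatumReader<GenericRecord>();
    DataFileReader<GenericRecord> fileReader =
        new DataFileReader<GenericRecord>(file, datumReader);
    try {
      while (fileReader.hasNext()) {
        GenericRecord event = fileReader.next();
        // Each record is one Flume event: a "headers" map plus a "body" of bytes.
        Object headers = event.get("headers");
        ByteBuffer body = (ByteBuffer) event.get("body");
        byte[] bytes = new byte[body.remaining()];
        body.get(bytes);
        System.out.println("headers=" + headers
            + " body=" + new String(bytes, Charset.forName("UTF-8")));
      }
    } finally {
      fileReader.close();
    }
  }
}

Run against a local copy of a sink output file, this prints one line per
event showing its headers and body; from there you can filter on header
values or move to the Avro MapReduce input formats for larger jobs.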


>  Thanks & Regards,
>
> Ashutosh Sharma
>
> From: Bhaskar V. Karambelkar [mailto:bhaskarvk@gmail.com]
> Sent: Wednesday, August 22, 2012 12:22 AM
> To: user@flume.apache.org
> Subject: Re: Can HDFSSink write headers as well?
>
> On Tue, Aug 21, 2012 at 2:25 AM, バーチャル クリストファー <bi...@infoscience.co.jp>
> wrote:
>
> Hi David,
>
> Currently there is no way to write headers to HDFS using the built-in
> Flume functionality.
>
> This is not entirely true; the following combination will write headers to
> HDFS in an Avro data file format (binary):
>
> agent.sinks.hdfsBinarySink.hdfs.fileType = DataStream
> agent.sinks.hdfsBinarySink.serializer = avro_event
> agent.sinks.hdfsBinarySink.hdfs.writeFormat = writable
>
> The serializer used is part of the Flume distribution, viz.
> flume-ng-core/src/main/java/org/apache/flume/serialization/FlumeEventAvroEventSerializer.java
>
> A file thus written can be processed with the Avro MapReduce API found in
> the Avro distribution.
>
> Also note that simply using DataStream doesn't mean the file is text; the
> serializer and hdfs.writeFormat also determine whether the file is text or
> binary.
>
> I've read the entire HDFS sink code and experimented with it a lot, so if
> you want more details, let me know.
>
> If you are writing to text or binary files on HDFS (i.e. you have set
> hdfs.fileType = DataStream or CompressedStream in your config), then you
> can supply your own custom serializer, which will allow you to write
> headers to HDFS. You will need to write a serializer that implements
> org.apache.flume.serialization.EventSerializer.
>
> If, on the other hand, you are writing to HDFS SequenceFiles, then
> unfortunately there is no way to customize the way that events are
> serialized, so you cannot write event headers to HDFS. This is a known
> issue (FLUME-1100) and I have supplied a patch to fix it.
>
> Chris.
>
> On 2012/08/21 11:36, David Capwell wrote:
>
> I was wondering if I pass random data to an event's header, can the
> HDFSSink write it to HDFS?  I know it can use the headers to split the data
> into different paths, but what about writing the data to HDFS itself?
>
> thanks for your time reading this email.
>

RE: Can HDFSSink write headers as well?

Posted by "ashutosh (오픈플랫폼개발팀)" <sh...@kt.com>.
Hi All,

I am using the “avro_event” serializer with the writable format and the DataStream file type to store events into HDFS.
I would like to read the files back for further analysis. I am new to Avro and don't know how to develop a deserializer to read the Flume events written to the HDFS files.

If anyone could share a sample or an example, that would be very helpful. Please help….

Thanks & Regards,
Ashutosh Sharma

From: Bhaskar V. Karambelkar [mailto:bhaskarvk@gmail.com]
Sent: Wednesday, August 22, 2012 12:22 AM
To: user@flume.apache.org
Subject: Re: Can HDFSSink write headers as well?


On Tue, Aug 21, 2012 at 2:25 AM, バーチャル クリストファー <bi...@infoscience.co.jp> wrote:
Hi David,

Currently there is no way to write headers to HDFS using the built-in Flume functionality.

This is not entirely true; the following combination will write headers to HDFS in an Avro data file format (binary):

agent.sinks.hdfsBinarySink.hdfs.fileType = DataStream
agent.sinks.hdfsBinarySink.serializer = avro_event
agent.sinks.hdfsBinarySink.hdfs.writeFormat = writable

The serializer used is part of the Flume distribution, viz.
flume-ng-core/src/main/java/org/apache/flume/serialization/FlumeEventAvroEventSerializer.java

A file thus written can be processed with the Avro MapReduce API found in the Avro distribution.

Also note that simply using DataStream doesn't mean the file is text; the serializer and hdfs.writeFormat also determine whether the file is text or binary.

I've read the entire HDFS sink code and experimented with it a lot, so if you want more details, let me know.



If you are writing to text or binary files on HDFS (i.e. you have set hdfs.fileType = DataStream or CompressedStream in your config), then you can supply your own custom serializer, which will allow you to write headers to HDFS. You will need to write a serializer that implements org.apache.flume.serialization.EventSerializer.

If, on the other hand, you are writing to HDFS SequenceFiles, then unfortunately there is no way to customize the way that events are serialized, so you cannot write event headers to HDFS. This is a known issue (FLUME-1100) and I have supplied a patch to fix it.

Chris.



On 2012/08/21 11:36, David Capwell wrote:
I was wondering if I pass random data to an event's header, can the HDFSSink write it to HDFS?  I know it can use the headers to split the data into different paths, but what about writing the data to HDFS itself?

thanks for your time reading this email.

Re: Can HDFSSink write headers as well?

Posted by "Bhaskar V. Karambelkar" <bh...@gmail.com>.
On Tue, Aug 21, 2012 at 2:25 AM, バーチャル クリストファー
<bi...@infoscience.co.jp> wrote:

> Hi David,
>
> Currently there is no way to write headers to HDFS using the built-in
> Flume functionality.
>

This is not entirely true; the following combination will write headers to
HDFS in an Avro data file format (binary):

agent.sinks.hdfsBinarySink.hdfs.fileType = DataStream
agent.sinks.hdfsBinarySink.serializer = avro_event
agent.sinks.hdfsBinarySink.hdfs.writeFormat = writable

The serializer used is part of the Flume distribution, viz.
flume-ng-core/src/main/java/org/apache/flume/serialization/FlumeEventAvroEventSerializer.java

A file thus written can be processed with the Avro MapReduce API found in
the Avro distribution.

Also note that simply using DataStream doesn't mean the file is text; the
serializer and hdfs.writeFormat also determine whether the file is text or
binary.

I've read the entire HDFS sink code and experimented with it a lot, so if
you want more details, let me know.
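
For reference, a minimal complete sink definition using this combination
might look like the sketch below; the agent name, channel name, and HDFS
path are illustrative assumptions, not taken from this thread.

agent.channels = ch1
agent.channels.ch1.type = memory
agent.sinks = hdfsBinarySink
agent.sinks.hdfsBinarySink.type = hdfs
agent.sinks.hdfsBinarySink.channel = ch1
agent.sinks.hdfsBinarySink.hdfs.path = hdfs://namenode:8020/flume/events
agent.sinks.hdfsBinarySink.hdfs.fileType = DataStream
agent.sinks.hdfsBinarySink.serializer = avro_event
agent.sinks.hdfsBinarySink.hdfs.writeFormat = writable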



>
> If you are writing to text or binary files on HDFS (i.e. you have set
> hdfs.fileType = DataStream or CompressedStream in your config), then you
> can supply your own custom serializer, which will allow you to write
> headers to HDFS. You will need to write a serializer that implements
> org.apache.flume.serialization.EventSerializer.
>
> If, on the other hand, you are writing to HDFS SequenceFiles, then
> unfortunately there is no way to customize the way that events are
> serialized, so you cannot write event headers to HDFS. This is a known
> issue (FLUME-1100) and I have supplied a patch to fix it.
>
> Chris.
>
>
>
> On 2012/08/21 11:36, David Capwell wrote:
>
>> I was wondering if I pass random data to an event's header, can the
>> HDFSSink write it to HDFS?  I know it can use the headers to split the data
>> into different paths, but what about writing the data to HDFS itself?
>>
>> thanks for your time reading this email.
>>
>
>
>

Re: Can HDFSSink write headers as well?

Posted by David Capwell <dc...@gmail.com>.
Thanks. A text file should work for now.
On Aug 20, 2012 11:25 PM, "バーチャル クリストファー" <bi...@infoscience.co.jp>
wrote:

> Hi David,
>
> Currently there is no way to write headers to HDFS using the built-in
> Flume functionality.
>
> If you are writing to text or binary files on HDFS (i.e. you have set
> hdfs.fileType = DataStream or CompressedStream in your config), then you
> can supply your own custom serializer, which will allow you to write
> headers to HDFS. You will need to write a serializer that implements
> org.apache.flume.serialization.EventSerializer.
>
> If, on the other hand, you are writing to HDFS SequenceFiles, then
> unfortunately there is no way to customize the way that events are
> serialized, so you cannot write event headers to HDFS. This is a known
> issue (FLUME-1100) and I have supplied a patch to fix it.
>
> Chris.
>
>
> On 2012/08/21 11:36, David Capwell wrote:
>
>> I was wondering if I pass random data to an event's header, can the
>> HDFSSink write it to HDFS?  I know it can use the headers to split the data
>> into different paths, but what about writing the data to HDFS itself?
>>
>> thanks for your time reading this email.
>>
>
>
>

Re: Can HDFSSink write headers as well?

Posted by バーチャル クリストファー <bi...@infoscience.co.jp>.
Hi David,

Currently there is no way to write headers to HDFS using the built-in 
Flume functionality.

If you are writing to text or binary files on HDFS (i.e. you have set 
hdfs.fileType = DataStream or CompressedStream in your config), then you 
can supply your own custom serializer, which will allow you to write 
headers to HDFS. You will need to write a serializer that implements 
org.apache.flume.serialization.EventSerializer.
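
As a rough illustration of that approach, here is a minimal sketch of a
serializer that writes each event's headers in front of its body as text.
The class name and the one-line output format are assumptions of mine; the
interface methods are those of org.apache.flume.serialization.EventSerializer
in Flume NG.

import java.io.IOException;
import java.io.OutputStream;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.serialization.EventSerializer;

// Hypothetical example class; not part of the Flume distribution.
public class HeaderAndBodyTextSerializer implements EventSerializer {

  private final OutputStream out;

  private HeaderAndBodyTextSerializer(Context context, OutputStream out) {
    this.out = out;
  }

  @Override
  public void afterCreate() throws IOException {
    // No file header is needed for a plain text format.
  }

  @Override
  public void afterReopen() throws IOException {
    // Nothing to do when appending to an existing file.
  }

  @Override
  public void write(Event event) throws IOException {
    // Prepend the headers map to the body, one event per line.
    out.write(event.getHeaders().toString().getBytes("UTF-8"));
    out.write(' ');
    out.write(event.getBody());
    out.write('\n');
  }

  @Override
  public void flush() throws IOException {
    out.flush();
  }

  @Override
  public void beforeClose() throws IOException {
    // Nothing to write at the end of the file.
  }

  @Override
  public boolean supportsReopen() {
    return true;
  }

  public static class Builder implements EventSerializer.Builder {
    @Override
    public EventSerializer build(Context context, OutputStream out) {
      return new HeaderAndBodyTextSerializer(context, out);
    }
  }
}

You would then point the sink at the nested builder class, e.g.
agent.sinks.mySink.serializer = com.example.HeaderAndBodyTextSerializer$Builder
(the package name here is hypothetical).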

If, on the other hand, you are writing to HDFS SequenceFiles, then 
unfortunately there is no way to customize the way that events are 
serialized, so you cannot write event headers to HDFS. This is a known 
issue (FLUME-1100) and I have supplied a patch to fix it.

Chris.


On 2012/08/21 11:36, David Capwell wrote:
> I was wondering if I pass random data to an event's header, can the 
> HDFSSink write it to HDFS?  I know it can use the headers to split the 
> data into different paths, but what about writing the data to HDFS 
> itself?
>
> thanks for your time reading this email.