Posted to user@flume.apache.org by zzz <sq...@gmail.com> on 2014/09/18 01:59:42 UTC

getting Avro into Flume

I am using Cloudera CDH 5.1 and running a Flume agent configured by
Cloudera manager.

I would like to send Avro data to Flume, and I was assuming the Avro Source
would be the appropriate way to do so.

However, the examples of Java clients that send data via the Avro Source
send simple strings, not Avro objects to be serialized, e.g. the example
here: https://flume.apache.org/FlumeDeveloperGuide.html

And the examples of Avro serialization all seem to be about serializing to
disk.

In my use case, I am basically receiving a real-time stream of JSON
documents, which I am able to convert to Avro objects, and would like to
put them into Flume. I would then like to be able to index this Avro data
in Solr via the Solr sink, and convert it to Parquet format in HDFS using
the HDFS sink.

Is this possible, or am I going about this the wrong way?

Re: getting Avro into Flume

Posted by Hari Shreedharan <hs...@cloudera.com>.
Yes, to the Avro Source. The RPC client sends it to the Avro Source (unless you use the Thrift source).
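
Something along these lines should work (a rough, untested sketch -- the
schema, host, and port below are placeholders, not anything from your setup):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.EncoderFactory;
    import org.apache.flume.Event;
    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.api.RpcClient;
    import org.apache.flume.api.RpcClientFactory;
    import org.apache.flume.event.EventBuilder;

    public class AvroToFlume {
        public static void main(String[] args)
                throws IOException, EventDeliveryException {
            // Placeholder schema -- substitute the real schema of your JSON docs.
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Doc\",\"fields\":"
                + "[{\"name\":\"body\",\"type\":\"string\"}]}");

            GenericRecord record = new GenericData.Record(schema);
            record.put("body", "hello");

            // Serialize the record to byte[] using Avro's binary encoding.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
            encoder.flush();

            // Ship the bytes to the agent's Avro Source as the Flume event body.
            // "agent-host" and 41414 are placeholders for wherever it listens.
            RpcClient client =
                RpcClientFactory.getDefaultInstance("agent-host", 41414);
            try {
                Event event = EventBuilder.withBody(out.toByteArray());
                client.append(event);
            } finally {
                client.close();
            }
        }
    }

Note that the Avro Source treats the event body as opaque bytes; it is the
serializer on the sink side (e.g. AVRO_EVENT on the HDFS sink) that decides
how those bytes get written out.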


Thanks,
Hari


Re: getting Avro into Flume

Posted by zzz <sq...@gmail.com>.
Thanks for the quick reply Hari.

When you say send data to Flume using the RPC Client API, do you mean send
it to the Avro Source? If not, which source? Because that is currently what
I am trying to do. I wasn't sure if encoding Avro data as byte[] and
sending it to the Avro Source was a valid approach, but from what you are
saying there is a way for sinks (at least the HDFS sink) to recognize
the encoded Avro data. I hope the Solr sink can be made to be similarly
aware.

Would encoding the Avro data as byte[] and sending it to Flume via the HTTP
interface also work?

I was actually having trouble converting an Avro object to a byte[]
to start with... but I will try that again.


Re: getting Avro into Flume

Posted by Hari Shreedharan <hs...@cloudera.com>.
No, the Avro Source is an RPC source. To send data to Flume, use the RPC client API (https://flume.apache.org/FlumeDeveloperGuide.html#client). Just encode your Avro data as byte[] and use the AVRO_EVENT serializer when writing to HDFS.
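
For the HDFS side, the relevant bits of the agent config would look roughly
like this (a sketch only -- the agent/component names, port, and path are
placeholders):

    agent.sources = avroSrc
    agent.channels = memCh
    agent.sinks = hdfsSink

    agent.sources.avroSrc.type = avro
    agent.sources.avroSrc.bind = 0.0.0.0
    agent.sources.avroSrc.port = 41414
    agent.sources.avroSrc.channels = memCh

    agent.channels.memCh.type = memory

    agent.sinks.hdfsSink.type = hdfs
    agent.sinks.hdfsSink.channel = memCh
    agent.sinks.hdfsSink.hdfs.path = /flume/avro
    agent.sinks.hdfsSink.hdfs.fileType = DataStream
    agent.sinks.hdfsSink.serializer = avro_event

hdfs.fileType = DataStream matters here: it keeps the sink from wrapping the
serializer's output in a SequenceFile, so you end up with plain Avro
container files.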


Thanks,
Hari
