Posted to user@flume.apache.org by Ed Judge <ej...@gmail.com> on 2014/09/03 03:36:59 UTC

Avro source and sink

Does anyone know of any good documentation that talks about the protocol/negotiation used between an Avro source and sink?

Thanks,
Ed


Re: Avro source and sink

Posted by Ed Judge <ej...@gmail.com>.
Joey,
Thank you very much for the info. Let me look this over and digest it. 

Thanks,
Ed. 

Sent from my iPhone



Re: Avro source and sink

Posted by Joey Echeverria <jo...@cloudera.com>.
Hi Ed!

This is definitely doable. What you want is an interceptor on source A that will do the conversion from log lines to Avro. The easiest way to do this would probably be to use Morphlines[1] from the Kite SDK[2]. This will let you put all of your transformation logic into a configuration file that's easy to change at runtime.

In particular, check out the example of parsing syslog lines[3]. The documentation for configuring the Morphline Interceptor is available on the Flume docs site[4]. You also have to use the Static Interceptor[5] to add the Avro schema to the Flume event header; a sketch of both follows.
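
For concreteness, the interceptor chain on source A might look like the agent config below. The agent/component names, file paths, and schema URL are all invented; the property names are the ones documented in [4] and [5], but double-check them against your Flume version:

    # Run each event body through a morphline, then stamp the schema URL header
    a1.sources.r1.interceptors = morph schema
    a1.sources.r1.interceptors.morph.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
    a1.sources.r1.interceptors.morph.morphlineFile = /etc/flume-ng/conf/morphline.conf
    a1.sources.r1.interceptors.morph.morphlineId = morphline1

    # Static interceptor: attach the Avro schema location to every event header
    a1.sources.r1.interceptors.schema.type = static
    a1.sources.r1.interceptors.schema.key = flume.avro.schema.url
    a1.sources.r1.interceptors.schema.value = hdfs://namenode/schemas/logline.avsc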

Also, you might consider using the Kite DatasetSink[6] rather than the default Avro serializer. Don't let the "experimental" label scare you; it's quite stable when writing to an Avro dataset (the default), and it will have support for Parquet in the next release of Flume.
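
A minimal DatasetSink config might look like this (repository URI and dataset name are placeholders; see [6] for the full property list):

    # Kite DatasetSink: write events straight into a Kite dataset on HDFS
    a1.sinks.k1.type = org.apache.flume.sink.kite.DatasetSink
    a1.sinks.k1.channel = c1
    a1.sinks.k1.kite.repo.uri = repo:hdfs://namenode:8020/data
    a1.sinks.k1.kite.dataset.name = logs
    a1.sinks.k1.kite.batchSize = 100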

There is a rather complete example of this available in the Kite Examples
project[7], though with JSON as the source and using the regular HDFS sink.

I've been working to update the Kite examples to use the DatasetSink[8],
but I haven't gotten around to the JSON example yet. Let me know if you
think it will be useful and I'll post an update in the same branch.

Cheers,

-Joey


[1] http://kitesdk.org/docs/current/kite-morphlines/index.html
[2] http://kitesdk.org/docs/current/
[3] http://kitesdk.org/docs/current/kite-morphlines/index.html#/SyslogUseCase-Requirements
[4] http://flume.apache.org/FlumeUserGuide.html#morphline-interceptor
[5] http://flume.apache.org/FlumeUserGuide.html#static-interceptor
[6] http://flume.apache.org/FlumeUserGuide.html#kite-dataset-sink-experimental
[7] https://github.com/kite-sdk/kite-examples/tree/master/json
[8] https://github.com/joey/kite-examples/tree/cdk-647-datasetsink-flume-log4j-appender


-- 
Joey Echeverria

Re: Avro source and sink

Posted by Ed Judge <ej...@gmail.com>.
Thanks for the reply. My understanding of the current Avro sink/source is that the schema is just a simple header/body schema. What I ultimately want to do is read a log file and write it into an Avro file on HDFS at sink B. Before that, however, I would like to parse the log file at source A and come up with a more detailed schema, which might include, for example, date, priority, subsystem, and log message fields. I am trying to "normalize" the events so that we could also potentially filter certain log file fields at the source. Does this make sense?
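
For illustration, a per-line record along those lines might be sketched in Avro IDL as follows. The field names and types are invented, and a real .avdl file would declare the record inside a protocol:

    // Hypothetical schema for one parsed log line
    record LogEvent {
      string date;        // could equally be a long epoch timestamp
      string priority;
      string subsystem;
      string message;     // the remaining free-form log text
    }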



Re: Avro source and sink

Posted by Ashish <pa...@gmail.com>.
I am not sure I understand the question correctly, so let me try to answer based on my understanding.

source A -> channel A -> sink A ———> source B -> channel B -> sink B

For this scenario, Sink A has to be an Avro sink and Source B has to be an Avro source for the flow to work. Flume uses Avro for RPC (look at flume-ng-sdk/src/main/avro/flume.avdl); that file defines how Flume sends Event(s) across using Avro RPC.
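
From memory, the heart of that IDL file looks approximately like this; consult the actual flume.avdl in the Flume source tree for the authoritative definition:

    @namespace("org.apache.flume.source.avro")
    protocol AvroSourceProtocol {
      enum Status { OK, FAILED, UNKNOWN }

      // The on-the-wire form of a Flume event
      record AvroFlumeEvent {
        map<string> headers;   // event headers travel alongside the body
        bytes body;            // opaque event body
      }

      Status append(AvroFlumeEvent event);
      Status appendBatch(array<AvroFlumeEvent> events);
    }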

1. Source A (spooling directory) would read files, create Events from them, and insert them into channel A
2. Sink A (Avro sink) would read from channel A and translate each Event into an AvroFlumeEvent for sending to Source B (Avro source)
3. Source B would read the AvroFlumeEvent, create an Event, and insert it into channel B, which shall be processed by Sink B
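
In configuration terms, that two-agent flow might look like the sketch below. Agent, component, host, and path names are all made up; the property names are the standard ones from the Flume user guide:

    # Agent A: spool a directory, forward events over Avro RPC
    agentA.sources = spool
    agentA.channels = chA
    agentA.sinks = avroOut
    agentA.sources.spool.type = spooldir
    agentA.sources.spool.spoolDir = /var/log/inbox
    agentA.sources.spool.channels = chA
    agentA.channels.chA.type = memory
    agentA.sinks.avroOut.type = avro
    agentA.sinks.avroOut.hostname = collector.example.com
    agentA.sinks.avroOut.port = 41414
    agentA.sinks.avroOut.channel = chA

    # Agent B: receive Avro RPC, hand events to Sink B (HDFS here)
    agentB.sources = avroIn
    agentB.channels = chB
    agentB.sinks = hdfsOut
    agentB.sources.avroIn.type = avro
    agentB.sources.avroIn.bind = 0.0.0.0
    agentB.sources.avroIn.port = 41414
    agentB.sources.avroIn.channels = chB
    agentB.channels.chB.type = memory
    agentB.sinks.hdfsOut.type = hdfs
    agentB.sinks.hdfsOut.hdfs.path = hdfs://namenode/flume/events
    agentB.sinks.hdfsOut.channel = chB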

It's not that the event would be wrapped in more events as it traverses down the chain. Avro encoding would just exist between Sink A and Source B.

Based on my understanding, you are looking at encoding log file lines using Avro. In that case, the Avro-encoded log file lines would be part of the Event body; the rest would be the same as steps 1-3.

HTH !




-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal

Re: Avro source and sink

Posted by Ed Judge <ej...@gmail.com>.
Ok, I have looked over the source and it is making a little more sense.

I think what I ultimately want to do is this:

source A -> channel A -> sink A ———> source B -> channel B -> sink B

source A will be looking at a log file. Each line of the log file will have a certain format/schema. I would write source A such that it could write the schema/line as an event into the channel and pass that through the system all the way to sink B, so that it would know the schema also.
I was thinking Avro would be a good format for source A to use when writing into its channel. If sink A is an existing Avro sink and source B is an existing Avro source, would this still work? Does this mean I would have two Avro envelopes (one encapsulating the other), which is wasteful, or can the existing Avro source and sink deal with this unmodified? Is there a better way to accomplish what I want to do? Just looking for some guidance.

Thanks,
Ed



Re: Avro source and sink

Posted by Ashish <pa...@gmail.com>.
Avro records have the schema embedded with them. Have a look at the source; that should help a bit.




-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal

Re: Avro source and sink

Posted by Ed Judge <ej...@gmail.com>.
That’s helpful, but isn’t there some type of Avro schema negotiation that occurs?

-Ed



Re: Avro source and sink

Posted by Jeff Lord <jl...@cloudera.com>.
Ed,

Did you take a look at the javadoc in the source?
Basically, the source uses Netty as a server and the sink is just an RPC client.
If you read over the docs at the links below, take a look at the Developer Guide, and still have questions, just ask away and someone will help answer them.
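
That client/server split shows up directly in configuration: the sink only needs a hostname and port to dial, while the source binds a port and listens. A minimal sketch, with invented agent names, host, and port:

    # Avro sink = RPC client: dials the next hop
    agent1.sinks.k1.type = avro
    agent1.sinks.k1.hostname = nexthop.example.com
    agent1.sinks.k1.port = 4545

    # Avro source = Netty-based RPC server: listens for append/appendBatch
    agent2.sources.r1.type = avro
    agent2.sources.r1.bind = 0.0.0.0
    agent2.sources.r1.port = 4545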

https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/source/AvroSource.java

https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/sink/AvroSink.java

https://flume.apache.org/FlumeDeveloperGuide.html#transaction-interface

-Jeff
