Posted to dev@flume.apache.org by Juhani Connolly <ju...@cyberagent.co.jp> on 2012/07/30 10:34:03 UTC

Sending avro data from other languages

I'm playing around with making a standalone tail client in Python (so
that I can access inode data) that tracks its position in a file and
then sends the data across Avro to an Avro sink.

However, I'm having issues with the Avro part of this and am wondering
if anyone more familiar with it could help.

I took the flume.avdl file and converted it using "java -jar 
~/Downloads/avro-tools-1.6.3.jar idl flume.avdl flume.avpr"

I then ran it through a simple test program to see if it's sending the
data correctly. It sends from the Python client fine, but the sink end
OOMs, presumably because the wire format is wrong:

2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, 
/172.22.114.32:55671 => /172.28.19.112:41414] OPEN
2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, 
/172.22.114.32:55671 => /172.28.19.112:41414] BOUND: /172.28.19.112:41414
2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, 
/172.22.114.32:55671 => /172.28.19.112:41414] CONNECTED: 
/172.22.114.32:55671
2012-07-30 17:22:57,646 WARN ipc.NettyServer: Unexpected exception from 
downstream.
java.lang.OutOfMemoryError: Java heap space
         at java.util.ArrayList.<init>(ArrayList.java:112)
         at 
org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decodePackHeader(NettyTransportCodec.java:154)
         at 
org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decode(NettyTransportCodec.java:131)
         at 
org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:282)
         at 
org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
         at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
         at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
         at 
org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:351)
         at 
org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:282)
         at 
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:202)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:619)
2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, 
/172.22.114.32:55671 :> /172.28.19.112:41414] DISCONNECTED
2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, 
/172.22.114.32:55671 :> /172.28.19.112:41414] UNBOUND
2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, 
/172.22.114.32:55671 :> /172.28.19.112:41414] CLOSED

I've dumped the test program and its output

http://pastebin.com/1DtXZyTu
http://pastebin.com/T9kaqKHY
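
For reference, the stack trace above dies in
NettyFrameDecoder.decodePackHeader which, going by the Avro source,
reads an 8-byte packet header (a 4-byte serial followed by a 4-byte
frame count) and pre-allocates a list of that many frames. If the bytes
on the wire aren't laid out that way (for example, an HTTP request from
the stock Python transceiver), whatever lands in the count field can be
huge, hence the OutOfMemoryError. Below is a rough sketch of that
framing in Python; it is a hypothetical helper, not part of the pasted
test program, and the framed buffers would still have to contain Avro's
handshake and call payload, which is the harder part.

import struct

def frame_for_netty(serial, buffers):
    """Frame already-serialized Avro RPC bytes the way the Netty
    transport appears to expect: [4-byte serial][4-byte buffer count],
    then each buffer as [4-byte length][bytes], all big-endian."""
    parts = [struct.pack('!ii', serial, len(buffers))]
    for buf in buffers:
        parts.append(struct.pack('!i', len(buf)))
        parts.append(buf)
    return b''.join(parts)

# Anything else on the wire (an HTTP request line, an unframed Avro
# payload, ...) gets read as a bogus header, and the decoder tries to
# allocate an ArrayList sized by whatever those bytes happen to be.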

Re: Sending avro data from other languages

Posted by Brock Noland <br...@cloudera.com>.
This also came up here:
http://mail-archives.apache.org/mod_mbox/flume-user/201207.mbox/%3CCAFukC=6C=jBayVSVAQdiJA+6kWUQ46_uXbDWBbUjDxb+U=aEzQ@mail.gmail.com%3E

I think it would be great if we could get a standalone Python client working.

Brock

On Mon, Jul 30, 2012 at 3:34 AM, Juhani Connolly
<ju...@cyberagent.co.jp> wrote:
> I'm playing around with making a standalone tail client in python(so that I
> can access inode data) that tracks position in a file and then sends it
> across avro to an avro sink.
>
> However I'm having issues with the avro part of this and wondering if anyone
> more familiar with it could help.
>
> I took the flume.avdl file and converted it using "java -jar
> ~/Downloads/avro-tools-1.6.3.jar idl flume.avdl flume.avpr"
>
> I then run it through a simple test program to see if its sending the data
> correctly and it sends from the python client fine, but the sink end OOM's
> because presumably the wire format is wrong:
>
> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818,
> /172.22.114.32:55671 => /172.28.19.112:41414] OPEN
> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818,
> /172.22.114.32:55671 => /172.28.19.112:41414] BOUND: /172.28.19.112:41414
> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818,
> /172.22.114.32:55671 => /172.28.19.112:41414] CONNECTED:
> /172.22.114.32:55671
> 2012-07-30 17:22:57,646 WARN ipc.NettyServer: Unexpected exception from
> downstream.
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.ArrayList.<init>(ArrayList.java:112)
>         at
> org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decodePackHeader(NettyTransportCodec.java:154)
>         at
> org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decode(NettyTransportCodec.java:131)
>         at
> org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:282)
>         at
> org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
>         at
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
>         at
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
>         at
> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:351)
>         at
> org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:282)
>         at
> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:202)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818,
> /172.22.114.32:55671 :> /172.28.19.112:41414] DISCONNECTED
> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818,
> /172.22.114.32:55671 :> /172.28.19.112:41414] UNBOUND
> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818,
> /172.22.114.32:55671 :> /172.28.19.112:41414] CLOSED
>
> I've dumped the test program and its output
>
> http://pastebin.com/1DtXZyTu
> http://pastebin.com/T9kaqKHY



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

Re: Sending avro data from other languages

Posted by Arvind Prabhakar <ar...@apache.org>.
Another alternative to consider for cross-platform/language support
would be Protocol Buffers. It has comparatively better tooling and
integration than other similar systems and is used by other projects as
well.

Regards,
Arvind Prabhakar

On Thu, Aug 2, 2012 at 7:01 AM, Brock Noland <br...@cloudera.com> wrote:

> I cannot answer what made us move to Avro. However, I prefer Avro because
> you don't have to build the thrift compiler and you aren't required to do
> code generation.
>
> On Wed, Aug 1, 2012 at 11:06 PM, Juhani Connolly <
> juhani_connolly@cyberagent.co.jp> wrote:
>
> > It looks to me like this was because of the transceiver I was using.
> >
> > Unfortunately it seems like avro doesn't have a python implementation of
> a
> > transceiver that fits the format expected by netty/avro(in fact it only
> has
> > one transceiver... HTTPTransceiver).
> >
> > To address this, I'm thinking of putting together a thrift source(the
> > legacy source doesn't seem to be usable as it returns nothing, and lacks
> > batching). Does this seem like a reasonable solution to making it
> possible
> > to send data to flume from other languages(and allowing backoff on
> > failure?). Historically, what made us move away from thrift to avro?
> >
> >
> > On 07/30/2012 05:34 PM, Juhani Connolly wrote:
> >
> >> I'm playing around with making a standalone tail client in python(so
> that
> >> I can access inode data) that tracks position in a file and then sends
> it
> >> across avro to an avro sink.
> >>
> >> However I'm having issues with the avro part of this and wondering if
> >> anyone more familiar with it could help.
> >>
> >> I took the flume.avdl file and converted it using "java -jar
> >> ~/Downloads/avro-tools-1.6.3.jar idl flume.avdl flume.avpr"
> >>
> >> I then run it through a simple test program to see if its sending the
> >> data correctly and it sends from the python client fine, but the sink
> end
> >> OOM's because presumably the wire format is wrong:
> >>
> >> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
> >> 172.22.114.32:55671 => /172.28.19.112:41414] OPEN
> >> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
> >> 172.22.114.32:55671 => /172.28.19.112:41414] BOUND: /
> 172.28.19.112:41414
> >> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
> >> 172.22.114.32:55671 => /172.28.19.112:41414] CONNECTED: /
> >> 172.22.114.32:55671
> >> 2012-07-30 17:22:57,646 WARN ipc.NettyServer: Unexpected exception from
> >> downstream.
> >> java.lang.OutOfMemoryError: Java heap space
> >>         at java.util.ArrayList.<init>(ArrayList.java:112)
> >>         at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decodePackHeader(NettyTransportCodec.java:154)
> >>         at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decode(NettyTransportCodec.java:131)
> >>         at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:282)
> >>         at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
> >>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
> >>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
> >>         at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:351)
> >>         at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:282)
> >>         at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:202)
> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >>         at java.lang.Thread.run(Thread.java:619)
> >> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, /
> >> 172.22.114.32:55671 :> /172.28.19.112:41414] DISCONNECTED
> >> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, /
> >> 172.22.114.32:55671 :> /172.28.19.112:41414] UNBOUND
> >> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, /
> >> 172.22.114.32:55671 :> /172.28.19.112:41414] CLOSED
> >>
> >> I've dumped the test program and its output
> >>
> >> http://pastebin.com/1DtXZyTu
> >> http://pastebin.com/T9kaqKHY
> >>
> >>
> >
>
>
> --
> Apache MRUnit - Unit testing MapReduce -
> http://incubator.apache.org/mrunit/
>

Re: Sending avro data from other languages

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.
Sure, I was just leaving it alone as Danny had tagged you, assuming 
there was some reason.

On 08/06/2012 10:31 AM, Hari Shreedharan wrote:
> Juhani,
>
> Since you are familiar with Scribe, it would be great if you could
> review and commit the Scribe source.
>
> I took a look, but I am not familiar enough with the way Scribe works
> or its protocol to justify a review.
>
>
> Thanks
> Hari
>
> On Sunday, August 5, 2012, Juhani Connolly wrote:
>
>> That would be awesome. I was going to write a thrift source, but Danny's
>> contribution of the scribe source covers our use case really well(since our
>> current data source feeds data in the scribe format. I was in the process
>> of writing a flume transport for it when I realized the avro shortcomings).
>>
>> On 08/03/2012 09:49 PM, Brock Noland wrote:
>>
>> Yeah I agree. FWIW, I am hoping in few weeks I will have a little more
>> spare time and I was planning on writing the Avro patches to ensure
>> languages such as Python, C#, etc could write messages to Flume.
>>
>> On Fri, Aug 3, 2012 at 1:30 AM, Juhani Connolly <
>> juhani_connolly@cyberagent.co.jp> wrote:
>>
>>   On paper it certainly seems like a good solution, it's just unfortunate
>> that some "supported" languages can't actually interface to it. I
>> understand that thrift can be quite a nuisance to deal with at times.
>>
>>
>> On 08/02/2012 11:01 PM, Brock Noland wrote:
>>
>>   I cannot answer what made us move to Avro. However, I prefer Avro because
>> you don't have to build the thrift compiler and you aren't required to do
>> code generation.
>>
>> On Wed, Aug 1, 2012 at 11:06 PM, Juhani Connolly <
>> juhani_connolly@cyberagent.co.jp <ju...@cyberagent.co.jp>>
>> wrote:
>>
>>    It looks to me like this was because of the transceiver I was using.
>>
>> Unfortunately it seems like avro doesn't have a python implementation of
>> a
>> transceiver that fits the format expected by netty/avro(in fact it only
>> has
>> one transceiver... HTTPTransceiver).
>>
>> To address this, I'm thinking of putting together a thrift source(the
>> legacy source doesn't seem to be usable as it returns nothing, and lacks
>> batching). Does this seem like a reasonable solution to making it
>> possible
>> to send data to flume from other languages(and allowing backoff on
>> failure?). Historically, what made us move away from thrift to avro?
>>
>>
>> On 07/30/2012 05:34 PM, Juhani Connolly wrote:
>>
>>    I'm playing around with making a standalone tail client in python(so
>>
>> that
>> I can access inode data) that tracks position in a file and then sends
>> it
>> across avro to an avro sink.
>>
>> However I'm having issues with the avro part of this and wondering if
>> anyone more familiar with it could help.
>>
>> I took the flume.avdl file and converted it using "java -jar
>> ~/Downloads/avro-tools-1.6.3.jar idl flume.avdl flume.avpr"
>>
>>
>> I then run it through a simple test program to see if its sending the
>> data correctly and it sends from the python client fine, but the sink
>> end
>> OOM's because presumably the wire format is wrong:
>>
>> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>> 172.22.114.32:55671 => /172.28.19.112:41414] OPEN
>> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>> 172.22.114.32:55671 => /172.28.19.112:41414] BOUND: /
>> 172.28.19.112:41414
>> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>> 172.22.114.32:55671 => /172.28.19.112:41414] CONNECTED: /
>> 172.22.114.32:55671
>> 2012-07-30 17:22:57,646 WARN ipc.NettyServer: Unexpected exception from
>> downstream.
>> java.lang.OutOfMemoryError: Java heap space
>>            at java.util.ArrayList.<init>(ArrayList.java:112)
>>            at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decodePackHeader(NettyTransportCodec.java:154)
>>            at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decode(NettyTransportCodec.java:131)
>>            at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:282)
>>            at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
>>            at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
>>            at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:26
>>
>>


Re: Sending avro data from other languages

Posted by Hari Shreedharan <hs...@cloudera.com>.
Juhani,

Since you are familiar with Scribe, it would be great if you could
review and commit the Scribe source.

I took a look, but I am not familiar enough with the way Scribe works or
its protocol to justify a review.


Thanks
Hari

On Sunday, August 5, 2012, Juhani Connolly wrote:

> That would be awesome. I was going to write a thrift source, but Danny's
> contribution of the scribe source covers our use case really well(since our
> current data source feeds data in the scribe format. I was in the process
> of writing a flume transport for it when I realized the avro shortcomings).
>
> On 08/03/2012 09:49 PM, Brock Noland wrote:
>
> Yeah I agree. FWIW, I am hoping in few weeks I will have a little more
> spare time and I was planning on writing the Avro patches to ensure
> languages such as Python, C#, etc could write messages to Flume.
>
> On Fri, Aug 3, 2012 at 1:30 AM, Juhani Connolly <
> juhani_connolly@cyberagent.co.jp> wrote:
>
>  On paper it certainly seems like a good solution, it's just unfortunate
> that some "supported" languages can't actually interface to it. I
> understand that thrift can be quite a nuisance to deal with at times.
>
>
> On 08/02/2012 11:01 PM, Brock Noland wrote:
>
>  I cannot answer what made us move to Avro. However, I prefer Avro because
> you don't have to build the thrift compiler and you aren't required to do
> code generation.
>
> On Wed, Aug 1, 2012 at 11:06 PM, Juhani Connolly <
> juhani_connolly@cyberagent.co.jp <ju...@cyberagent.co.jp>>
> wrote:
>
>   It looks to me like this was because of the transceiver I was using.
>
> Unfortunately it seems like avro doesn't have a python implementation of
> a
> transceiver that fits the format expected by netty/avro(in fact it only
> has
> one transceiver... HTTPTransceiver).
>
> To address this, I'm thinking of putting together a thrift source(the
> legacy source doesn't seem to be usable as it returns nothing, and lacks
> batching). Does this seem like a reasonable solution to making it
> possible
> to send data to flume from other languages(and allowing backoff on
> failure?). Historically, what made us move away from thrift to avro?
>
>
> On 07/30/2012 05:34 PM, Juhani Connolly wrote:
>
>   I'm playing around with making a standalone tail client in python(so
>
> that
> I can access inode data) that tracks position in a file and then sends
> it
> across avro to an avro sink.
>
> However I'm having issues with the avro part of this and wondering if
> anyone more familiar with it could help.
>
> I took the flume.avdl file and converted it using "java -jar
> ~/Downloads/avro-tools-1.6.3.jar idl flume.avdl flume.avpr"
>
>
> I then run it through a simple test program to see if its sending the
> data correctly and it sends from the python client fine, but the sink
> end
> OOM's because presumably the wire format is wrong:
>
> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
> 172.22.114.32:55671 => /172.28.19.112:41414] OPEN
> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
> 172.22.114.32:55671 => /172.28.19.112:41414] BOUND: /
> 172.28.19.112:41414
> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
> 172.22.114.32:55671 => /172.28.19.112:41414] CONNECTED: /
> 172.22.114.32:55671
> 2012-07-30 17:22:57,646 WARN ipc.NettyServer: Unexpected exception from
> downstream.
> java.lang.OutOfMemoryError: Java heap space
>           at java.util.ArrayList.<init>(ArrayList.java:112)
>           at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decodePackHeader(NettyTransportCodec.java:154)
>           at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decode(NettyTransportCodec.java:131)
>           at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:282)
>           at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
>           at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
>           at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:26
>
>

Re: Sending avro data from other languages

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.
That would be awesome. I was going to write a Thrift source, but Danny's
contribution of the Scribe source covers our use case really well (our
current data source feeds data in the Scribe format; I was in the
process of writing a Flume transport for it when I realized the Avro
shortcomings).

On 08/03/2012 09:49 PM, Brock Noland wrote:
> Yeah I agree. FWIW, I am hoping in few weeks I will have a little more
> spare time and I was planning on writing the Avro patches to ensure
> languages such as Python, C#, etc could write messages to Flume.
>
> On Fri, Aug 3, 2012 at 1:30 AM, Juhani Connolly <
> juhani_connolly@cyberagent.co.jp> wrote:
>
>> On paper it certainly seems like a good solution, it's just unfortunate
>> that some "supported" languages can't actually interface to it. I
>> understand that thrift can be quite a nuisance to deal with at times.
>>
>>
>> On 08/02/2012 11:01 PM, Brock Noland wrote:
>>
>>> I cannot answer what made us move to Avro. However, I prefer Avro because
>>> you don't have to build the thrift compiler and you aren't required to do
>>> code generation.
>>>
>>> On Wed, Aug 1, 2012 at 11:06 PM, Juhani Connolly <
>>> juhani_connolly@cyberagent.co.jp <ju...@cyberagent.co.jp>>
>>> wrote:
>>>
>>>   It looks to me like this was because of the transceiver I was using.
>>>> Unfortunately it seems like avro doesn't have a python implementation of
>>>> a
>>>> transceiver that fits the format expected by netty/avro(in fact it only
>>>> has
>>>> one transceiver... HTTPTransceiver).
>>>>
>>>> To address this, I'm thinking of putting together a thrift source(the
>>>> legacy source doesn't seem to be usable as it returns nothing, and lacks
>>>> batching). Does this seem like a reasonable solution to making it
>>>> possible
>>>> to send data to flume from other languages(and allowing backoff on
>>>> failure?). Historically, what made us move away from thrift to avro?
>>>>
>>>>
>>>> On 07/30/2012 05:34 PM, Juhani Connolly wrote:
>>>>
>>>>   I'm playing around with making a standalone tail client in python(so
>>>>> that
>>>>> I can access inode data) that tracks position in a file and then sends
>>>>> it
>>>>> across avro to an avro sink.
>>>>>
>>>>> However I'm having issues with the avro part of this and wondering if
>>>>> anyone more familiar with it could help.
>>>>>
>>>>> I took the flume.avdl file and converted it using "java -jar
>>>>> ~/Downloads/avro-tools-1.6.3.jar idl flume.avdl flume.avpr"
>>>>>
>>>>>
>>>>> I then run it through a simple test program to see if its sending the
>>>>> data correctly and it sends from the python client fine, but the sink
>>>>> end
>>>>> OOM's because presumably the wire format is wrong:
>>>>>
>>>>> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>>>>> 172.22.114.32:55671 => /172.28.19.112:41414] OPEN
>>>>> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>>>>> 172.22.114.32:55671 => /172.28.19.112:41414] BOUND: /
>>>>> 172.28.19.112:41414
>>>>> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>>>>> 172.22.114.32:55671 => /172.28.19.112:41414] CONNECTED: /
>>>>> 172.22.114.32:55671
>>>>> 2012-07-30 17:22:57,646 WARN ipc.NettyServer: Unexpected exception from
>>>>> downstream.
>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>           at java.util.ArrayList.<init>(ArrayList.java:112)
>>>>>           at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decodePackHeader(NettyTransportCodec.java:154)
>>>>>           at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decode(NettyTransportCodec.java:131)
>>>>>           at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:282)
>>>>>           at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
>>>>>           at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
>>>>>           at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
>>>>>           at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:351)
>>>>>           at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:282)
>>>>>           at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:202)
>>>>>           at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>           at java.lang.Thread.run(Thread.java:619)
>>>>>
>>>>> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>>>>> 172.22.114.32:55671 :> /172.28.19.112:41414] DISCONNECTED
>>>>> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>>>>> 172.22.114.32:55671 :> /172.28.19.112:41414] UNBOUND
>>>>> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>>>>> 172.22.114.32:55671 :> /172.28.19.112:41414] CLOSED
>>>>>
>>>>> I've dumped the test program and its output
>>>>>
>>>>> http://pastebin.com/1DtXZyTu
>>>>> http://pastebin.com/T9kaqKHY
>>>>>
>>>>>
>>>>>
>


Re: Sending avro data from other languages

Posted by Brock Noland <br...@cloudera.com>.
Yeah, I agree. FWIW, I am hoping that in a few weeks I will have a
little more spare time, and I was planning on writing the Avro patches
to ensure languages such as Python, C#, etc. could write messages to
Flume.

On Fri, Aug 3, 2012 at 1:30 AM, Juhani Connolly <
juhani_connolly@cyberagent.co.jp> wrote:

> On paper it certainly seems like a good solution, it's just unfortunate
> that some "supported" languages can't actually interface to it. I
> understand that thrift can be quite a nuisance to deal with at times.
>
>
> On 08/02/2012 11:01 PM, Brock Noland wrote:
>
>> I cannot answer what made us move to Avro. However, I prefer Avro because
>> you don't have to build the thrift compiler and you aren't required to do
>> code generation.
>>
>> On Wed, Aug 1, 2012 at 11:06 PM, Juhani Connolly <
>> juhani_connolly@cyberagent.co.jp <ju...@cyberagent.co.jp>>
>> wrote:
>>
>>  It looks to me like this was because of the transceiver I was using.
>>>
>>> Unfortunately it seems like avro doesn't have a python implementation of
>>> a
>>> transceiver that fits the format expected by netty/avro(in fact it only
>>> has
>>> one transceiver... HTTPTransceiver).
>>>
>>> To address this, I'm thinking of putting together a thrift source(the
>>> legacy source doesn't seem to be usable as it returns nothing, and lacks
>>> batching). Does this seem like a reasonable solution to making it
>>> possible
>>> to send data to flume from other languages(and allowing backoff on
>>> failure?). Historically, what made us move away from thrift to avro?
>>>
>>>
>>> On 07/30/2012 05:34 PM, Juhani Connolly wrote:
>>>
>>>  I'm playing around with making a standalone tail client in python(so
>>>> that
>>>> I can access inode data) that tracks position in a file and then sends
>>>> it
>>>> across avro to an avro sink.
>>>>
>>>> However I'm having issues with the avro part of this and wondering if
>>>> anyone more familiar with it could help.
>>>>
>>>> I took the flume.avdl file and converted it using "java -jar
>>>> ~/Downloads/avro-tools-1.6.3.jar idl flume.avdl flume.avpr"
>>>>
>>>>
>>>> I then run it through a simple test program to see if its sending the
>>>> data correctly and it sends from the python client fine, but the sink
>>>> end
>>>> OOM's because presumably the wire format is wrong:
>>>>
>>>> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>>>> 172.22.114.32:55671 => /172.28.19.112:41414] OPEN
>>>> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>>>> 172.22.114.32:55671 => /172.28.19.112:41414] BOUND: /
>>>> 172.28.19.112:41414
>>>> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>>>> 172.22.114.32:55671 => /172.28.19.112:41414] CONNECTED: /
>>>> 172.22.114.32:55671
>>>> 2012-07-30 17:22:57,646 WARN ipc.NettyServer: Unexpected exception from
>>>> downstream.
>>>> java.lang.OutOfMemoryError: Java heap space
>>>>          at java.util.ArrayList.<init>(ArrayList.java:112)
>>>>          at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decodePackHeader(NettyTransportCodec.java:154)
>>>>          at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decode(NettyTransportCodec.java:131)
>>>>          at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:282)
>>>>          at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
>>>>          at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
>>>>          at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
>>>>          at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:351)
>>>>          at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:282)
>>>>          at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:202)
>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>          at java.lang.Thread.run(Thread.java:619)
>>>>
>>>> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>>>> 172.22.114.32:55671 :> /172.28.19.112:41414] DISCONNECTED
>>>> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>>>> 172.22.114.32:55671 :> /172.28.19.112:41414] UNBOUND
>>>> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>>>> 172.22.114.32:55671 :> /172.28.19.112:41414] CLOSED
>>>>
>>>> I've dumped the test program and its output
>>>>
>>>> http://pastebin.com/1DtXZyTu
>>>> http://pastebin.com/T9kaqKHY
>>>>
>>>>
>>>>
>>
>


-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

Re: Sending avro data from other languages

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.
On paper it certainly seems like a good solution; it's just unfortunate
that some "supported" languages can't actually interface with it. I
understand that Thrift can be quite a nuisance to deal with at times.

On 08/02/2012 11:01 PM, Brock Noland wrote:
> I cannot answer what made us move to Avro. However, I prefer Avro because
> you don't have to build the thrift compiler and you aren't required to do
> code generation.
>
> On Wed, Aug 1, 2012 at 11:06 PM, Juhani Connolly <
> juhani_connolly@cyberagent.co.jp> wrote:
>
>> It looks to me like this was because of the transceiver I was using.
>>
>> Unfortunately it seems like avro doesn't have a python implementation of a
>> transceiver that fits the format expected by netty/avro(in fact it only has
>> one transceiver... HTTPTransceiver).
>>
>> To address this, I'm thinking of putting together a thrift source(the
>> legacy source doesn't seem to be usable as it returns nothing, and lacks
>> batching). Does this seem like a reasonable solution to making it possible
>> to send data to flume from other languages(and allowing backoff on
>> failure?). Historically, what made us move away from thrift to avro?
>>
>>
>> On 07/30/2012 05:34 PM, Juhani Connolly wrote:
>>
>>> I'm playing around with making a standalone tail client in python(so that
>>> I can access inode data) that tracks position in a file and then sends it
>>> across avro to an avro sink.
>>>
>>> However I'm having issues with the avro part of this and wondering if
>>> anyone more familiar with it could help.
>>>
>>> I took the flume.avdl file and converted it using "java -jar
>>> ~/Downloads/avro-tools-1.6.3.jar idl flume.avdl flume.avpr"
>>>
>>> I then run it through a simple test program to see if its sending the
>>> data correctly and it sends from the python client fine, but the sink end
>>> OOM's because presumably the wire format is wrong:
>>>
>>> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>>> 172.22.114.32:55671 => /172.28.19.112:41414] OPEN
>>> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>>> 172.22.114.32:55671 => /172.28.19.112:41414] BOUND: /172.28.19.112:41414
>>> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>>> 172.22.114.32:55671 => /172.28.19.112:41414] CONNECTED: /
>>> 172.22.114.32:55671
>>> 2012-07-30 17:22:57,646 WARN ipc.NettyServer: Unexpected exception from
>>> downstream.
>>> java.lang.OutOfMemoryError: Java heap space
>>>          at java.util.ArrayList.<init>(ArrayList.java:112)
>>>          at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decodePackHeader(NettyTransportCodec.java:154)
>>>          at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decode(NettyTransportCodec.java:131)
>>>          at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:282)
>>>          at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
>>>          at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
>>>          at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
>>>          at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:351)
>>>          at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:282)
>>>          at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:202)
>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>          at java.lang.Thread.run(Thread.java:619)
>>> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>>> 172.22.114.32:55671 :> /172.28.19.112:41414] DISCONNECTED
>>> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>>> 172.22.114.32:55671 :> /172.28.19.112:41414] UNBOUND
>>> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>>> 172.22.114.32:55671 :> /172.28.19.112:41414] CLOSED
>>>
>>> I've dumped the test program and its output
>>>
>>> http://pastebin.com/1DtXZyTu
>>> http://pastebin.com/T9kaqKHY
>>>
>>>
>


Re: Sending avro data from other languages

Posted by Brock Noland <br...@cloudera.com>.
I cannot answer what made us move to Avro. However, I prefer Avro because
you don't have to build the thrift compiler and you aren't required to do
code generation.

On Wed, Aug 1, 2012 at 11:06 PM, Juhani Connolly <
juhani_connolly@cyberagent.co.jp> wrote:

> It looks to me like this was because of the transceiver I was using.
>
> Unfortunately it seems like avro doesn't have a python implementation of a
> transceiver that fits the format expected by netty/avro(in fact it only has
> one transceiver... HTTPTransceiver).
>
> To address this, I'm thinking of putting together a thrift source(the
> legacy source doesn't seem to be usable as it returns nothing, and lacks
> batching). Does this seem like a reasonable solution to making it possible
> to send data to flume from other languages(and allowing backoff on
> failure?). Historically, what made us move away from thrift to avro?
>
>
> On 07/30/2012 05:34 PM, Juhani Connolly wrote:
>
>> I'm playing around with making a standalone tail client in python(so that
>> I can access inode data) that tracks position in a file and then sends it
>> across avro to an avro sink.
>>
>> However I'm having issues with the avro part of this and wondering if
>> anyone more familiar with it could help.
>>
>> I took the flume.avdl file and converted it using "java -jar
>> ~/Downloads/avro-tools-1.6.3.jar idl flume.avdl flume.avpr"
>>
>> I then run it through a simple test program to see if its sending the
>> data correctly and it sends from the python client fine, but the sink end
>> OOM's because presumably the wire format is wrong:
>>
>> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>> 172.22.114.32:55671 => /172.28.19.112:41414] OPEN
>> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>> 172.22.114.32:55671 => /172.28.19.112:41414] BOUND: /172.28.19.112:41414
>> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>> 172.22.114.32:55671 => /172.28.19.112:41414] CONNECTED: /
>> 172.22.114.32:55671
>> 2012-07-30 17:22:57,646 WARN ipc.NettyServer: Unexpected exception from
>> downstream.
>> java.lang.OutOfMemoryError: Java heap space
>>         at java.util.ArrayList.<init>(ArrayList.java:112)
>>         at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decodePackHeader(NettyTransportCodec.java:154)
>>         at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decode(NettyTransportCodec.java:131)
>>         at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:282)
>>         at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
>>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
>>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
>>         at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:351)
>>         at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:282)
>>         at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:202)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:619)
>> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>> 172.22.114.32:55671 :> /172.28.19.112:41414] DISCONNECTED
>> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>> 172.22.114.32:55671 :> /172.28.19.112:41414] UNBOUND
>> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, /
>> 172.22.114.32:55671 :> /172.28.19.112:41414] CLOSED
>>
>> I've dumped the test program and its output
>>
>> http://pastebin.com/1DtXZyTu
>> http://pastebin.com/T9kaqKHY
>>
>>
>


-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

Re: Sending avro data from other languages

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.
It looks to me like this was because of the transceiver I was using.

Unfortunately, it seems like Avro doesn't have a Python implementation
of a transceiver that fits the format expected by Netty/Avro (in fact,
it only has one transceiver... HTTPTransceiver).
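
For context, this is roughly all the stock Python avro library offers
(a minimal sketch against the avro 1.6-era Python 2 API; the 'event'
parameter name and the headers/body layout are read from flume.avdl, so
treat them as assumptions). It speaks Avro's HTTP transport, which the
Netty-based Avro source does not understand, so it only works if the
receiving end is an HTTP responder:

from avro import ipc, protocol

# flume.avpr is the output of the idl conversion mentioned earlier.
PROTOCOL = protocol.parse(open('flume.avpr').read())

client = ipc.HTTPTransceiver('localhost', 41414)
requestor = ipc.Requestor(PROTOCOL, client)

# AvroFlumeEvent: a map<string> of headers plus a bytes body.
event = {'headers': {'host': 'test-host'}, 'body': 'hello flume'}
print(requestor.request('append', {'event': event}))
client.close()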

To address this, I'm thinking of putting together a Thrift source (the
legacy source doesn't seem to be usable, as it returns nothing and
lacks batching). Does this seem like a reasonable solution for making
it possible to send data to Flume from other languages (and allowing
backoff on failure)? Historically, what made us move away from Thrift
to Avro?

On 07/30/2012 05:34 PM, Juhani Connolly wrote:
> I'm playing around with making a standalone tail client in python(so 
> that I can access inode data) that tracks position in a file and then 
> sends it across avro to an avro sink.
>
> However I'm having issues with the avro part of this and wondering if 
> anyone more familiar with it could help.
>
> I took the flume.avdl file and converted it using "java -jar 
> ~/Downloads/avro-tools-1.6.3.jar idl flume.avdl flume.avpr"
>
> I then run it through a simple test program to see if its sending the 
> data correctly and it sends from the python client fine, but the sink 
> end OOM's because presumably the wire format is wrong:
>
> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, 
> /172.22.114.32:55671 => /172.28.19.112:41414] OPEN
> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, 
> /172.22.114.32:55671 => /172.28.19.112:41414] BOUND: /172.28.19.112:41414
> 2012-07-30 17:22:57,565 INFO ipc.NettyServer: [id: 0x5fc6e818, 
> /172.22.114.32:55671 => /172.28.19.112:41414] CONNECTED: 
> /172.22.114.32:55671
> 2012-07-30 17:22:57,646 WARN ipc.NettyServer: Unexpected exception 
> from downstream.
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.ArrayList.<init>(ArrayList.java:112)
>         at 
> org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decodePackHeader(NettyTransportCodec.java:154)
>         at 
> org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decode(NettyTransportCodec.java:131)
>         at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:282)
>         at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
>         at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
>         at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
>         at 
> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:351)
>         at 
> org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:282)
>         at 
> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:202)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, 
> /172.22.114.32:55671 :> /172.28.19.112:41414] DISCONNECTED
> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, 
> /172.22.114.32:55671 :> /172.28.19.112:41414] UNBOUND
> 2012-07-30 17:22:57,647 INFO ipc.NettyServer: [id: 0x5fc6e818, 
> /172.22.114.32:55671 :> /172.28.19.112:41414] CLOSED
>
> I've dumped the test program and its output
>
> http://pastebin.com/1DtXZyTu
> http://pastebin.com/T9kaqKHY
>