Posted to user@flume.apache.org by Bart Verwilst <li...@verwilst.be> on 2012/11/08 19:45:12 UTC

Using Python and Flume to store avro data

Hi,

I've been spending quite a few hours trying to push avro data to Flume 
so I can store it on HDFS, all of this with Python.
It seems to be impossible for now, since the only way to push avro data 
to Flume is via the deprecated Thrift bindings, which look pretty 
cumbersome to get working.
I would like to know what's the best way to import avro data into Flume 
with Python? Maybe Flume isn't the right tool and I should use something 
else? My goal is to have multiple Python workers pushing data to HDFS, 
with Flume consolidating it all into one file there.

Any thoughts?

Thanks!

Bart


Re: Using Python and Flume to store avro data

Posted by Bart Verwilst <li...@verwilst.be>.
 

Hello,

You send avro to Flume, but how is it stored? I would like to have avro
files as a result in HDFS, not SequenceFiles containing JSON or
something. Not sure if that's possible? Basically and conceptually, I
want to query my MySQL database and write that data to avro files in
HDFS. I can't use Sqoop because for every row of table X, I have an
extra array of rows from table Y that are included in the same avro
record. The idea is to create a pretty continuous flow from MySQL into
HDFS.

This is how I would like to store it in HDFS (avro schema):

{
  "type": "record",
  "name": "trace",
  "namespace": "asp",
  "fields": [
    { "name": "id", "type": "long" },
    { "name": "timestamp", "type": "long" },
    { "name": "terminalid", "type": "int" },
    { "name": "mileage", "type": ["int", "null"] },
    { "name": "creationtime", "type": "long" },
    { "name": "type", "type": "int" },
    { "name": "properties", "type": {
        "type": "array",
        "items": {
          "name": "property",
          "type": "record",
          "fields": [
            { "name": "id", "type": "long" },
            { "name": "value", "type": "string" },
            { "name": "key", "type": "string" }
          ]
        }
      }
    }
  ]
}
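For what it's worth, a schema like the one above can be sanity-checked from Python with the standard library alone, since every Avro schema is itself valid JSON (note the original post's trailing comma inside the `property` fields array would break this). The sample record below is hypothetical:

```python
import json

# The "trace" schema from the post; Avro schemas are plain JSON documents.
TRACE_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "trace",
  "namespace": "asp",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "timestamp", "type": "long"},
    {"name": "terminalid", "type": "int"},
    {"name": "mileage", "type": ["int", "null"]},
    {"name": "creationtime", "type": "long"},
    {"name": "type", "type": "int"},
    {"name": "properties", "type": {
      "type": "array",
      "items": {
        "name": "property",
        "type": "record",
        "fields": [
          {"name": "id", "type": "long"},
          {"name": "value", "type": "string"},
          {"name": "key", "type": "string"}
        ]
      }
    }}
  ]
}
""")

# A hypothetical record shaped like the schema: one trace row from
# table X carrying an embedded array of property rows from table Y.
record = {
    "id": 1,
    "timestamp": 1352400000000,
    "terminalid": 42,
    "mileage": None,
    "creationtime": 1352400000000,
    "type": 3,
    "properties": [{"id": 7, "value": "on", "key": "ignition"}],
}

# Cheap structural check: the record has exactly the declared fields.
field_names = [f["name"] for f in TRACE_SCHEMA["fields"]]
assert set(record) == set(field_names)
```

Actually serializing such records would be done with the `avro` Python package (DatumWriter plus a DataFileWriter or BinaryEncoder); the check above only confirms the schema and record shapes agree.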

How do you suggest I go about this (knowing my Java foo is very
limited ;))?

Thanks!

Kind regards,

Bart

Andrew Jones schreef op 13.11.2012 10:28:

> We also use Thrift to send from multiple languages, but have written a
> custom source to accept the messages.
>
> Writing a custom source was quite easy. Start by looking at the code
> for ThriftLegacySource and AvroSource.
>
> Andrew
 

Re: Using Python and Flume to store avro data

Posted by Andrew Jones <an...@gmail.com>.
We also use Thrift to send from multiple languages, but have written a
custom source to accept the messages.

Writing a custom source was quite easy. Start by looking at the code
for ThriftLegacySource and AvroSource.

Andrew


On 12 November 2012 19:52, Camp, Roy <rc...@ebay.com> wrote:

> We use thrift to send from Python, PHP & Java.  Unfortunately with
> Flume-NG you must use the legacyThrift source which works well but does not
> handle a confirmation/ack back to the app.  We have found that failures
> usually result in connection exception thus allowing us to reconnect and
> retry so we have virtually no data loss. Everything downstream from that
> localhost Flume instance (after written to the file channel) is E2E safe.
>
> Roy
>
>
> -----Original Message-----
> From: Juhani Connolly [mailto:juhani_connolly@cyberagent.co.jp]
> Sent: Thursday, November 08, 2012 5:46 PM
> To: user@flume.apache.org
> Subject: Re: Using Python and Flume to store avro data
>
> Hi Bart,
>
> we send data  from python to the scribe source and it works fine. We had
> everything set up in scribe before which made the switchover simple. If you
> don't mind the extra overhead of http, go for that, but if you want to keep
> things to a minimum, using the scribe source can be viable.
>
> You can't send data to avro because the python support in avro is missing
> the appropriate encoder(I can't remember what it was, I'd have to check
> over the code again)
>
> On 11/09/2012 03:45 AM, Bart Verwilst wrote:
> > Hi,
> >
> > I've been spending quite a few hours trying to push avro data to Flume
> > so i can store it on HDFS, this all with Python.
> > It seems like something that is impossible for now, since the only way
> > to push avro data to Flume is by the use of deprecated thrift binding
> > that look pretty cumbersome to get working.
> > I would like to know what's the best way to import avro data into
> > Flume with Python? Maybe Flume isnt the right tool and I should use
> > something else? My goal is to have multiple python workers pushing
> > data to HDFS which ( by means of Flume in this case ) consolidates
> > this all in 1 file there.
> >
> > Any thoughts?
> >
> > Thanks!
> >
> > Bart
> >
> >
>
>

RE: Using Python and Flume to store avro data

Posted by "Camp, Roy" <rc...@ebay.com>.
We use Thrift to send from Python, PHP & Java.  Unfortunately, with Flume-NG you must use the legacyThrift source, which works well but does not handle a confirmation/ack back to the app.  We have found that failures usually result in a connection exception, allowing us to reconnect and retry, so we have virtually no data loss. Everything downstream from that localhost Flume instance (after it is written to the file channel) is E2E safe.
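The reconnect-and-retry pattern described here can be sketched in a few lines of stdlib Python; the `send` callable and the flaky demo sender are illustrative stand-ins for a real Thrift client call, not Flume API:

```python
import socket
import time

def send_with_retry(send, event, retries=3, backoff=0.5):
    """Resend on connection errors, as described above: a failed send
    typically surfaces as a connection exception, so reconnect and retry
    a few times before giving up. `send` is any callable that raises on
    failure (e.g. a Thrift client's log method)."""
    for attempt in range(retries):
        try:
            return send(event)
        except (socket.error, ConnectionError):
            if attempt == retries - 1:
                raise  # out of attempts; let the caller decide
            time.sleep(backoff * (2 ** attempt))  # simple exponential backoff

# Demo with a fake sender that fails twice, then succeeds.
calls = {"n": 0}
def flaky_send(event):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("broken pipe")
    return "ok"

assert send_with_retry(flaky_send, b"event", backoff=0) == "ok"
assert calls["n"] == 3
```

This gives at-least-once delivery at best: without an ack from the legacyThrift source, an event can be duplicated if the failure happened after the write.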

Roy


-----Original Message-----
From: Juhani Connolly [mailto:juhani_connolly@cyberagent.co.jp] 
Sent: Thursday, November 08, 2012 5:46 PM
To: user@flume.apache.org
Subject: Re: Using Python and Flume to store avro data

Hi Bart,

we send data  from python to the scribe source and it works fine. We had everything set up in scribe before which made the switchover simple. If you don't mind the extra overhead of http, go for that, but if you want to keep things to a minimum, using the scribe source can be viable.

You can't send data to avro because the python support in avro is missing the appropriate encoder(I can't remember what it was, I'd have to check over the code again)

On 11/09/2012 03:45 AM, Bart Verwilst wrote:
> Hi,
>
> I've been spending quite a few hours trying to push avro data to Flume 
> so i can store it on HDFS, this all with Python.
> It seems like something that is impossible for now, since the only way 
> to push avro data to Flume is by the use of deprecated thrift binding 
> that look pretty cumbersome to get working.
> I would like to know what's the best way to import avro data into 
> Flume with Python? Maybe Flume isnt the right tool and I should use 
> something else? My goal is to have multiple python workers pushing 
> data to HDFS which ( by means of Flume in this case ) consolidates 
> this all in 1 file there.
>
> Any thoughts?
>
> Thanks!
>
> Bart
>
>


Re: Using Python and Flume to store avro data

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.
Hi Bart,

We send data from Python to the scribe source and it works fine. We had 
everything set up in Scribe before, which made the switchover simple. If 
you don't mind the extra overhead of HTTP, go for that, but if you want 
to keep things to a minimum, using the scribe source can be viable.

You can't send data to the Avro source because the Python support in 
Avro is missing the appropriate encoder (I can't remember what it was; 
I'd have to check over the code again).

On 11/09/2012 03:45 AM, Bart Verwilst wrote:
> Hi,
>
> I've been spending quite a few hours trying to push avro data to Flume 
> so i can store it on HDFS, this all with Python.
> It seems like something that is impossible for now, since the only  
> way to push avro data to Flume is by the use of deprecated thrift 
> binding that look pretty cumbersome to get working.
> I would like to know what's the best way to import avro data into 
> Flume with Python? Maybe Flume isnt the right tool and I should use 
> something else? My goal is to have multiple python workers pushing 
> data to HDFS which ( by means of Flume in this case ) consolidates 
> this all in 1 file there.
>
> Any thoughts?
>
> Thanks!
>
> Bart
>
>


Re: Using Python and Flume to store avro data

Posted by Brock Noland <br...@cloudera.com>.
Has it been three months since I said that? Yes I would like to get that
done but haven't had time.

However, if you can use Python, the HTTPSource (which is in 1.3.0) should work.

Brock
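A minimal sketch of what sending to the 1.3.0 HTTPSource could look like from Python. The host and port are hypothetical (they come from your flume.conf); the default JSON handler expects a JSON array of events, each with string headers and a string body:

```python
import json
import urllib.request

# Hypothetical HTTPSource endpoint; depends on your agent configuration.
FLUME_URL = "http://localhost:5140"

def make_flume_payload(bodies, headers=None):
    """Build the JSON array the HTTPSource's default JSON handler
    expects: a list of events, each {"headers": {...}, "body": "..."}
    with string values throughout."""
    headers = headers or {}
    return json.dumps([{"headers": headers, "body": b} for b in bodies])

def send_to_flume(bodies, headers=None):
    """POST a batch of events to the HTTPSource (requires a running agent)."""
    req = urllib.request.Request(
        FLUME_URL,
        data=make_flume_payload(bodies, headers).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

# Payload construction alone needs no agent:
payload = make_flume_payload(["hello flume"], {"host": "worker-1"})
assert json.loads(payload)[0]["body"] == "hello flume"
```

Batching several bodies into one POST amortizes the HTTP overhead Juhani mentions elsewhere in the thread.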

On Thu, Nov 8, 2012 at 4:49 PM, Bart Verwilst <li...@verwilst.be> wrote:

>
> Brock Noland, I read this on my search for information:
>
> "On 08/03/2012 09:49 PM, Brock Noland wrote:
> > Yeah I agree. FWIW, I am hoping in few weeks I will have a little more
> > spare time and I was planning on writing the Avro patches to ensure
> > languages such as Python, C#, etc could write messages to Flume."
>
> I was wondering if any of this was realized? Since I'm not really suited
> to write my own serializer, I'm still hoping to use Python to send my avro
> to Flume...
>
> Bart
>
>
> Hari Shreedharan schreef op 08.11.2012 22:50:
>
> Yes, the sink serializer is where you would serialize it. The Http/json
> can be used to send the event. This simply converts the json event into
> flume's own Event format. You can write a serializer that either knows the
> schema or reads it from configuration to parse the Flume event.
>
>
> Hari
>
> --
> Hari Shreedharan
>
>
> On Thursday, November 8, 2012 at 1:34 PM, Bart Verwilst wrote:
>
>  Would the sink serializer from
> https://cwiki.apache.org/FLUME/flume-1x-event-serializers.html (
> avro_event ) by the right tool for the job? Probably not since i won't be
> able to send the exact avro schema over the http/json link, and it will
> need conversion first. I'm not a Java programmer though, so i think writing
> my own serializer would be stretching it a bit. :(
>
>
>
> Maybe i can use hadoop streaming to import my avro or something... :(
>
> Kind regards,
>
> Bart
>
>
> Hari Shreedharan schreef op 08.11.2012 22:12:
>
>  Writing to avro files depends on how you serialize your data on the sink
> side, using a serializer. Note that JSON supports only UTF-8/16/32
> encoding, so if you want to send binary data you will need to write your
> own handler for that (you can use the JSON handler as an example) and
> configure the source to use that handler. Once the data is in Flume, just
> plug in your own serializer (which can take the byte array from the event
> and convert it into the schema you want) and write it out.
>
>
> Thanks,
> Hari
>
> --
> Hari Shreedharan
>
>
> On Thursday, November 8, 2012 at 1:02 PM, Bart Verwilst wrote:
>
>  Hi Hari,
>
>
>
> Just to be absolutely sure, you can write to avro files by using this? If
> so, I will try out a snapshot of 1.3 tomorrow and start playing with it. ;)
>
>
>
> Kind regards,
>
>
>
> Bart
>
>
>
>
> Hari Shreedharan schreef op 08.11.2012 20:06:
>
>  No, I am talking about:
> https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=bc1928bc2e23293cb20f4bc2693a3bc262f507b3
>
> This will be in the next release which will be out soon.
>
>
> Thanks,
> Hari
>
> --
> Hari Shreedharan
>
>
> On Thursday, November 8, 2012 at 10:57 AM, Bart Verwilst wrote:
>
>  Hi Hari,
>
>
> Are you talking about ipc.HTTPTransciever (
> http://nullege.com/codes/search/avro.ipc.HTTPTransceiver )? This was the
> class I tried before i noticed it wasn't supported by Flume-1.2 :)
>
> I assume the http/json source will also allow for avro to be received?
>
>
>
> Kind regards,
>
> Bart
>
>
> Hari Shreedharan schreef op 08.11.2012 19:51:
>
>   The next release of Flume-1.3.0 adds support for an HTTP source, which
> will allow you to send data to Flume via HTTP/JSON(the representation of
> the data is pluggable - but a JSON representation is default). You could
> use this to write data to Flume from Python, which I believe has good http
> and json support.
>
>
> Thanks,
> Hari
>
> --
> Hari Shreedharan
>
>
> On Thursday, November 8, 2012 at 10:45 AM, Bart Verwilst wrote:
>
>  Hi,
>
> I've been spending quite a few hours trying to push avro data to Flume
> so i can store it on HDFS, this all with Python.
> It seems like something that is impossible for now, since the only way
> to push avro data to Flume is by the use of deprecated thrift binding
> that look pretty cumbersome to get working.
> I would like to know what's the best way to import avro data into Flume
> with Python? Maybe Flume isnt the right tool and I should use something
> else? My goal is to have multiple python workers pushing data to HDFS
> which ( by means of Flume in this case ) consolidates this all in 1 file
> there.
>
> Any thoughts?
>
> Thanks!
>
> Bart
>
>
>
>
>
>
>
>
>
>


-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

Re: Using Python and Flume to store avro data

Posted by Bart Verwilst <li...@verwilst.be>.
 

Brock Noland, I read this on my search for information:

"On 08/03/2012 09:49 PM, Brock Noland wrote:
> Yeah I agree. FWIW, I am hoping in few weeks I will have a little more
> spare time and I was planning on writing the Avro patches to ensure
> languages such as Python, C#, etc could write messages to Flume."

I was wondering if any of this was realized? Since I'm not really suited
to write my own serializer, I'm still hoping to use Python to send my
avro to Flume...

Bart


Re: Using Python and Flume to store avro data

Posted by Hari Shreedharan <hs...@cloudera.com>.
Yes, the sink serializer is where you would serialize it. The HTTP/JSON source can be used to send the event; it simply converts the JSON event into Flume's own Event format. You can write a serializer that either knows the schema or reads it from configuration to parse the Flume event.


Hari

-- 
Hari Shreedharan


On Thursday, November 8, 2012 at 1:34 PM, Bart Verwilst wrote:

> Would the sink serializer from https://cwiki.apache.org/FLUME/flume-1x-event-serializers.html ( avro_event ) by the right tool for the job? Probably not since i won't be able to send the exact avro schema over the http/json link, and it will need conversion first. I'm not a Java programmer though, so i think writing my own serializer would be stretching it a bit. :(
>  
> Maybe i can use hadoop streaming to import my avro or something... :(
> Kind regards,
> Bart
>  
> Hari Shreedharan schreef op 08.11.2012 22:12:
> > Writing to avro files depends on how you serialize your data on the sink side, using a serializer. Note that JSON supports only UTF-8/16/32 encoding, so if you want to send binary data you will need to write your own handler for that (you can use the JSON handler as an example) and configure the source to use that handler. Once the data is in Flume, just plug in your own serializer (which can take the byte array from the event and convert it into the schema you want) and write it out.
> >  
> >  
> > Thanks,
> > Hari
> >  
> > -- 
> > Hari Shreedharan
> >  
> > 
> > 
> > On Thursday, November 8, 2012 at 1:02 PM, Bart Verwilst wrote:
> > 
> > > Hi Hari,
> > >  
> > > Just to be absolutely sure, you can write to avro files by using this? If so, I will try out a snapshot of 1.3 tomorrow and start playing with it. ;)
> > >  
> > > Kind regards,
> > >  
> > > Bart
> > >  
> > >  
> > > Hari Shreedharan schreef op 08.11.2012 20:06:
> > > > No, I am talking about: https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=bc1928bc2e23293cb20f4bc2693a3bc262f507b3
> > > >  
> > > > This will be in the next release which will be out soon.
> > > >  
> > > >  
> > > > Thanks,
> > > > Hari
> > > >  
> > > > -- 
> > > > Hari Shreedharan
> > > >  
> > > > 
> > > > 
> > > > On Thursday, November 8, 2012 at 10:57 AM, Bart Verwilst wrote:
> > > > 
> > > > > Hi Hari,
> > > > > 
> > > > > Are you talking about ipc.HTTPTransciever ( http://nullege.com/codes/search/avro.ipc.HTTPTransceiver )? This was the class I tried before i noticed it wasn't supported by Flume-1.2 :) 
> > > > > I assume the http/json source will also allow for avro to be received?
> > > > >  
> > > > > Kind regards,
> > > > > Bart
> > > > >  
> > > > > Hari Shreedharan schreef op 08.11.2012 19:51:
> > > > > > The next release of Flume-1.3.0 adds support for an HTTP source, which will allow you to send data to Flume via HTTP/JSON(the representation of the data is pluggable - but a JSON representation is default). You could use this to write data to Flume from Python, which I believe has good http and json support.
> > > > > >  
> > > > > >  
> > > > > > Thanks,
> > > > > > Hari
> > > > > >  
> > > > > > -- 
> > > > > > Hari Shreedharan
> > > > > >  
> > > > > > 
> > > > > > 
> > > > > > On Thursday, November 8, 2012 at 10:45 AM, Bart Verwilst wrote:
> > > > > > 
> > > > > > > Hi,
> > > > > > >  
> > > > > > > I've been spending quite a few hours trying to push avro data to Flume
> > > > > > > so i can store it on HDFS, this all with Python.
> > > > > > > It seems like something that is impossible for now, since the only way
> > > > > > > to push avro data to Flume is by the use of deprecated thrift binding
> > > > > > > that look pretty cumbersome to get working.
> > > > > > > I would like to know what's the best way to import avro data into Flume
> > > > > > > with Python? Maybe Flume isnt the right tool and I should use something
> > > > > > > else? My goal is to have multiple python workers pushing data to HDFS
> > > > > > > which ( by means of Flume in this case ) consolidates this all in 1 file
> > > > > > > there.
> > > > > > >  
> > > > > > > Any thoughts?
> > > > > > >  
> > > > > > > Thanks!
> > > > > > >  
> > > > > > > Bart
> > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > >  
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > 
> > > >  
> > > > 
> > > > 
> > > 
> > > 
> > > 
> > 
> >  
> > 
> 
> 
> 



Re: Using Python and Flume to store avro data

Posted by Bart Verwilst <li...@verwilst.be>.
 

Would the sink serializer from
https://cwiki.apache.org/FLUME/flume-1x-event-serializers.html
( avro_event ) be the right tool for the job? Probably not, since I
won't be able to send the exact avro schema over the http/json link, and
it will need conversion first. I'm not a Java programmer though, so I
think writing my own serializer would be stretching it a bit. :(

Maybe I can use hadoop streaming to import my avro or something... :(

Kind regards,

Bart


Re: Using Python and Flume to store avro data

Posted by Hari Shreedharan <hs...@cloudera.com>.
Writing to avro files depends on how you serialize your data on the sink side, using a serializer. Note that JSON supports only UTF-8/16/32 encoding, so if you want to send binary data you will need to write your own handler for that (you can use the JSON handler as an example) and configure the source to use that handler. Once the data is in Flume, just plug in your own serializer (which can take the byte array from the event and convert it into the schema you want) and write it out. 
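One hedged way to get binary Avro through a UTF-8-only JSON transport, per the note above, is to base64-encode the Avro bytes into the event body on the producer side and decode them in the custom handler/serializer. The stdlib roundtrip is below; the byte string standing in for an Avro-encoded record is fabricated (real bytes would come from avro's DatumWriter and BinaryEncoder):

```python
import base64
import json

# Stand-in for an Avro-encoded record (in practice produced with
# avro.io.DatumWriter + BinaryEncoder; omitted to stay stdlib-only).
avro_bytes = b"\x02\x10ignition\x04on"

# Producer side: wrap the binary payload as ASCII text so it survives JSON.
event = {
    "headers": {"schema": "asp.trace"},  # hypothetical header naming the schema
    "body": base64.b64encode(avro_bytes).decode("ascii"),
}
wire = json.dumps([event])

# Sink side: a custom serializer would reverse the wrapping before
# writing the raw bytes out into an Avro container file.
recovered = base64.b64decode(json.loads(wire)[0]["body"])
assert recovered == avro_bytes
```

Base64 inflates the payload by about a third, which is part of the HTTP-overhead trade-off discussed elsewhere in this thread.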


Thanks,
Hari

-- 
Hari Shreedharan


On Thursday, November 8, 2012 at 1:02 PM, Bart Verwilst wrote:

> Hi Hari,
>  
> Just to be absolutely sure, you can write to avro files by using this? If so, I will try out a snapshot of 1.3 tomorrow and start playing with it. ;)
>  
> Kind regards,
>  
> Bart
>  
>  
> Hari Shreedharan schreef op 08.11.2012 20:06:
> > No, I am talking about: https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=bc1928bc2e23293cb20f4bc2693a3bc262f507b3
> >  
> > This will be in the next release which will be out soon.
> >  
> >  
> > Thanks,
> > Hari
> >  
> > -- 
> > Hari Shreedharan
> >  
> > 
> > 
> > On Thursday, November 8, 2012 at 10:57 AM, Bart Verwilst wrote:
> > 
> > > Hi Hari,
> > > 
> > > Are you talking about ipc.HTTPTransciever ( http://nullege.com/codes/search/avro.ipc.HTTPTransceiver )? This was the class I tried before i noticed it wasn't supported by Flume-1.2 :) 
> > > I assume the http/json source will also allow for avro to be received?
> > >  
> > > Kind regards,
> > > Bart
> > >  
> > > Hari Shreedharan schreef op 08.11.2012 19:51:
> > > > The next release of Flume-1.3.0 adds support for an HTTP source, which will allow you to send data to Flume via HTTP/JSON(the representation of the data is pluggable - but a JSON representation is default). You could use this to write data to Flume from Python, which I believe has good http and json support.
> > > >  
> > > >  
> > > > Thanks,
> > > > Hari
> > > >  
> > > > -- 
> > > > Hari Shreedharan
> > > >  
> > > > 
> > > > 
> > > > On Thursday, November 8, 2012 at 10:45 AM, Bart Verwilst wrote:
> > > > 
> > > > > Hi,
> > > > >  
> > > > > I've been spending quite a few hours trying to push avro data to Flume
> > > > > so i can store it on HDFS, this all with Python.
> > > > > It seems like something that is impossible for now, since the only way
> > > > > to push avro data to Flume is by the use of deprecated thrift binding
> > > > > that look pretty cumbersome to get working.
> > > > > I would like to know what's the best way to import avro data into Flume
> > > > > with Python? Maybe Flume isnt the right tool and I should use something
> > > > > else? My goal is to have multiple python workers pushing data to HDFS
> > > > > which ( by means of Flume in this case ) consolidates this all in 1 file
> > > > > there.
> > > > >  
> > > > > Any thoughts?
> > > > >  
> > > > > Thanks!
> > > > >  
> > > > > Bart
> > > > > 
> > > > > 
> > > > 
> > > >  
> > > > 
> > > > 
> > > 
> > > 
> > > 
> > 
> >  
> > 
> 
> 
> 



Re: Using Python and Flume to store avro data

Posted by Bart Verwilst <li...@verwilst.be>.
 

Hi Hari,

Just to be absolutely sure, you can write to avro files by using this?
If so, I will try out a snapshot of 1.3 tomorrow and start playing with
it. ;)

Kind regards,

Bart


Re: Using Python and Flume to store avro data

Posted by Hari Shreedharan <hs...@cloudera.com>.
No, I am talking about: https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=bc1928bc2e23293cb20f4bc2693a3bc262f507b3 

This will be in the next release which will be out soon.


Thanks,
Hari

-- 
Hari Shreedharan


On Thursday, November 8, 2012 at 10:57 AM, Bart Verwilst wrote:

> Hi Hari,
> 
> Are you talking about ipc.HTTPTransceiver ( http://nullege.com/codes/search/avro.ipc.HTTPTransceiver )? This was the class I tried before I noticed it wasn't supported by Flume-1.2 :) 
> I assume the http/json source will also allow for avro to be received?
>  
> Kind regards,
> Bart
>  
> Hari Shreedharan schreef op 08.11.2012 19:51:
> > The next release of Flume-1.3.0 adds support for an HTTP source, which will allow you to send data to Flume via HTTP/JSON (the representation of the data is pluggable - but a JSON representation is default). You could use this to write data to Flume from Python, which I believe has good http and json support.
> >  
> >  
> > Thanks,
> > Hari
> >  
> > -- 
> > Hari Shreedharan
> >  
> > 
> > 
> > On Thursday, November 8, 2012 at 10:45 AM, Bart Verwilst wrote:
> > 
> > > Hi,
> > >  
> > > I've been spending quite a few hours trying to push avro data to Flume
> > > so I can store it on HDFS, this all with Python.
> > > It seems like something that is impossible for now, since the only way
> > > to push avro data to Flume is by the use of deprecated thrift bindings
> > > that look pretty cumbersome to get working.
> > > I would like to know what's the best way to import avro data into Flume
> > > with Python? Maybe Flume isn't the right tool and I should use something
> > > else? My goal is to have multiple python workers pushing data to HDFS
> > > which ( by means of Flume in this case ) consolidates this all in 1 file
> > > there.
> > >  
> > > Any thoughts?
> > >  
> > > Thanks!
> > >  
> > > Bart
> > > 
> > > 
> > 
> >  
> > 
> 
> 
> 
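For anyone landing on this thread later: the HTTP source discussed above can be driven from Python with nothing but the standard library. The default JSON handler takes a JSON array of events, each with an optional "headers" map and a string "body". A minimal sketch (the host, port, and header name are illustrative assumptions, not taken from the thread):

```python
import json
import urllib.request

# The JSON handler on Flume's HTTP source expects a JSON array of events,
# each shaped like {"headers": {...}, "body": "..."} with string values.
# Host and port below are illustrative.
FLUME_URL = "http://localhost:5140"


def build_payload(records):
    """Wrap a list of dicts in the event array the JSON handler expects."""
    events = [
        {"headers": {"source": "python-worker"}, "body": json.dumps(rec)}
        for rec in records
    ]
    return json.dumps(events).encode("utf-8")


def post_events(records, url=FLUME_URL):
    """POST one batch of events to the Flume HTTP source."""
    req = urllib.request.Request(
        url,
        data=build_payload(records),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Sending a batch per request keeps the handoff to the channel atomic on the Flume side, and since urlopen raises on a non-2xx status, a worker can simply catch the exception and retry the whole batch.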



Re: Using Python and Flume to store avro data

Posted by Bart Verwilst <li...@verwilst.be>.

Hi Hari,

Are you talking about ipc.HTTPTransceiver ( http://nullege.com/codes/search/avro.ipc.HTTPTransceiver )? This was the class I tried before I noticed it wasn't supported by Flume-1.2 :)

I assume the http/json source will also allow for avro to be received?

Kind regards,

Bart

Hari Shreedharan schreef op 08.11.2012 19:51:

> The next release of Flume-1.3.0 adds support for an HTTP source, which will allow you to send data to Flume via HTTP/JSON (the representation of the data is pluggable - but a JSON representation is default). You could use this to write data to Flume from Python, which I believe has good http and json support.
> 
> Thanks,
> Hari
> 
> -- 
> Hari Shreedharan
> 
> On Thursday, November 8, 2012 at 10:45 AM, Bart Verwilst wrote:
> 
>> Hi,
>> 
>> I've been spending quite a few hours trying to push avro data to Flume
>> so I can store it on HDFS, this all with Python.
>> It seems like something that is impossible for now, since the only way
>> to push avro data to Flume is by the use of deprecated thrift bindings
>> that look pretty cumbersome to get working.
>> I would like to know what's the best way to import avro data into Flume
>> with Python? Maybe Flume isn't the right tool and I should use something
>> else? My goal is to have multiple python workers pushing data to HDFS
>> which ( by means of Flume in this case ) consolidates this all in 1 file
>> there.
>> 
>> Any thoughts?
>> 
>> Thanks!
>> 
>> Bart
 

Re: Using Python and Flume to store avro data

Posted by Hari Shreedharan <hs...@cloudera.com>.
The next release of Flume-1.3.0 adds support for an HTTP source, which will allow you to send data to Flume via HTTP/JSON (the representation of the data is pluggable - but a JSON representation is default). You could use this to write data to Flume from Python, which I believe has good http and json support. 


Thanks,
Hari

-- 
Hari Shreedharan


On Thursday, November 8, 2012 at 10:45 AM, Bart Verwilst wrote:

> Hi,
> 
> I've been spending quite a few hours trying to push avro data to Flume 
> so I can store it on HDFS, this all with Python.
> It seems like something that is impossible for now, since the only way 
> to push avro data to Flume is by the use of deprecated thrift bindings 
> that look pretty cumbersome to get working.
> I would like to know what's the best way to import avro data into Flume 
> with Python? Maybe Flume isn't the right tool and I should use something 
> else? My goal is to have multiple python workers pushing data to HDFS 
> which ( by means of Flume in this case ) consolidates this all in 1 file 
> there.
> 
> Any thoughts?
> 
> Thanks!
> 
> Bart
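
On the "consolidate this all in 1 file" part of the question above: whichever source the workers talk to, it is the HDFS sink's roll settings that decide how incoming events are batched into files. A sketch of an agent configuration (agent and component names, port, path, and roll values are illustrative assumptions; a strict single file is not really achievable, but tuning the roll triggers makes each interval yield one large file instead of one file per worker batch):

```properties
# Illustrative Flume agent: HTTP source -> memory channel -> HDFS sink.
agent.sources = http-in
agent.channels = mem
agent.sinks = hdfs-out

agent.sources.http-in.type = org.apache.flume.source.http.HTTPSource
agent.sources.http-in.port = 5140
agent.sources.http-in.channels = mem

agent.channels.mem.type = memory
agent.channels.mem.capacity = 10000

agent.sinks.hdfs-out.type = hdfs
agent.sinks.hdfs-out.channel = mem
agent.sinks.hdfs-out.hdfs.path = /flume/events
agent.sinks.hdfs-out.hdfs.fileType = DataStream
# Roll a new file every 5 minutes; disable size- and count-based rolling
# so many small worker batches land in the same file.
agent.sinks.hdfs-out.hdfs.rollInterval = 300
agent.sinks.hdfs-out.hdfs.rollSize = 0
agent.sinks.hdfs-out.hdfs.rollCount = 0
```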