Posted to user@flume.apache.org by Deepak Subhramanian <de...@gmail.com> on 2013/10/03 16:36:27 UTC

Converting text to avro in Flume

Hi,

I want to convert XML files received as text into an Avro file and store it
in HDFS. I receive the XML files as POST requests and have extended the
HTTPHandler to process them. Do I have to convert the text data to Avro in
the HTTPHandler, or can the Avro Sink or HDFS Sink convert it directly with
some configuration? I want to store the entire XML string in an Avro field.

Thanks in advance for any inputs.
Deepak Subhramanian
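
For context, here is a minimal sketch of such a handler, assuming Flume's
HTTPSourceHandler interface. The class name matches the spikes.flume.XMLHandler
referenced in the configurations later in this thread, but the implementation
is an illustration, not the poster's actual code.

package spikes.flume;

import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

import javax.servlet.http.HttpServletRequest;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.http.HTTPSourceHandler;

public class XMLHandler implements HTTPSourceHandler {

  @Override
  public List<Event> getEvents(HttpServletRequest request) throws Exception {
    // Read the entire POST body into one string.
    StringBuilder xml = new StringBuilder();
    try (BufferedReader reader = request.getReader()) {
      String line;
      while ((line = reader.readLine()) != null) {
        xml.append(line);
      }
    }
    // Wrap the whole XML document in a single event; the serializer
    // configured on the sink decides how the body is written out.
    Event event = EventBuilder.withBody(
        xml.toString().getBytes(StandardCharsets.UTF_8));
    return Collections.singletonList(event);
  }

  @Override
  public void configure(Context context) {
    // No handler-specific configuration in this sketch.
  }
}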

Re: Converting text to avro in Flume

Posted by Deepak Subhramanian <de...@gmail.com>.
Hi Hari,

Finally I got it working. I am not sure what was missing in my earlier
configuration; it might not have been picked up by the agent instance because
of the way CDH handles configuration inheritance between agents. Thanks for
all your help.

tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.path = /tmp/flumecollector
tier1.sinks.sink1.hdfs.filePrefix = access_log
tier1.sinks.sink1.hdfs.fileSuffix = .avro
tier1.sinks.sink1.serializer = avro_event
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.serializer.compressionCodec = snappy
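
To sanity-check the output, a minimal sketch using the Avro Java API that
confirms a rolled file really carries the default Event schema (the local
file name below is hypothetical):

import java.io.File;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class VerifyAvro {
  public static void main(String[] args) throws Exception {
    // A local copy of a file rolled by the HDFS sink; the name is hypothetical.
    File avroFile = new File("access_log.1380892800000.avro");
    DataFileReader<GenericRecord> reader =
        new DataFileReader<>(avroFile, new GenericDatumReader<GenericRecord>());
    // avro_event output should report the Flume Event schema here.
    System.out.println("Schema: " + reader.getSchema());
    for (GenericRecord record : reader) {
      System.out.println(record.get("body")); // the raw XML payload as bytes
    }
    reader.close();
  }
}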


Thanks, Deepak




-- 
Deepak Subhramanian

Re: Converting text to avro in Flume

Posted by Hari Shreedharan <hs...@cloudera.com>.
The Avro Sink is used for communication between Flume agents. To insert directly into HDFS, you simply use an Avro serializer with the HDFS sink.


Thanks,
Hari




Re: Converting text to avro in Flume

Posted by Deepak Subhramanian <de...@gmail.com>.
Hi Hari,
I tried using an Avro sink after the HTTPSource, followed by an Avro source
and an HDFS sink, and it seems to be working. Do we have to use an Avro sink
first, or can we convert to Avro directly with the HDFS sink?

Thanks, Deepak





-- 
Deepak Subhramanian

Re: Converting text to avro in Flume

Posted by Deepak Subhramanian <de...@gmail.com>.
There was a mistake in my configuration: I had the hdfs prefix in front of
serializer.

Changed
tier1.sinks.sink1.hdfs.serializer = avro_event
to
tier1.sinks.sink1.serializer = avro_event

But it is still generating a sequence file. This is what I get:

SEQ!org.apache.hadoop.io.LongWritableorg.apache.hadoop.io.TextK???2-%??-/??
A??,? ?<message>xmldata</message>
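
The SEQ prefix in that dump is the SequenceFile magic header. A minimal
sketch that tells the two formats apart by their first three bytes; Avro
container files begin with "Obj":

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.util.Arrays;

public class MagicCheck {
  public static void main(String[] args) throws Exception {
    byte[] magic = new byte[3];
    try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
      in.readFully(magic);
    }
    if (Arrays.equals(magic, "SEQ".getBytes("US-ASCII"))) {
      System.out.println("Hadoop SequenceFile"); // the HDFS sink's default
    } else if (Arrays.equals(magic, "Obj".getBytes("US-ASCII"))) {
      System.out.println("Avro container file"); // what avro_event should produce
    } else {
      System.out.println("Unrecognized header: " + Arrays.toString(magic));
    }
  }
}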





-- 
Deepak Subhramanian

Re: Converting text to avro in Flume

Posted by Deepak Subhramanian <de...@gmail.com>.
Thanks Hari.

I specified the fileType. This is what I have. I will try again and let
you know.

tier1.sources  = httpsrc1
tier1.channels = c1
tier1.sinks    = sink1

tier1.sources.httpsrc1.bind     = 127.0.0.1
tier1.sources.httpsrc1.type = http
tier1.sources.httpsrc1.port = 9999
tier1.sources.httpsrc1.channels = c1
tier1.sources.httpsrc1.handler = spikes.flume.XMLHandler
tier1.sources.httpsrc1.handler.nickname = HTTPTesting

tier1.channels.c1.type   = memory
tier1.channels.c1.capacity = 100
#tier1.sinks.sink1.type       = logger
tier1.sinks.sink1.channel     = c1

tier1.sinks.sink1.type = hdfs

tier1.sinks.sink1.hdfs.path = /tmp/flumecollector
tier1.sinks.sink1.hdfs.filePrefix = access_log
tier1.sinks.sink1.hdfs.fileSuffix = .avro
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.hdfs.serializer = avro_event

I also added this later.
tier1.sinks.sink1.hdfs.serializer.appendNewline = true
tier1.sinks.sink1.hdfs.serializer.compressionCodec = snappy





-- 
Deepak Subhramanian

Re: Converting text to avro in Flume

Posted by Hari Shreedharan <hs...@cloudera.com>.
The default file type for the HDFS Sink is SequenceFile. Set hdfs.fileType to DataStream. See details here: http://flume.apache.org/FlumeUserGuide.html#hdfs-sink


Thanks,
Hari




Re: Converting text to avro in Flume

Posted by Deepak Subhramanian <de...@gmail.com>.
I tried using the HDFS Sink to generate the Avro file by setting the
serializer to avro_event, but it is not generating an Avro file; it generates
a sequence file instead. Isn't it supposed to generate an Avro file with the
default schema below? Or do I have to generate the Avro data from text in my
HTTPHandler source?

 "{ \"type\":\"record\", \"name\": \"Event\", \"fields\": [" +

      " {\"name\": \"headers\", \"type\": { \"type\": \"map\", \"values\":
\"string\" } }, " +
      " {\"name\": \"body\", \"type\": \"bytes\" } ] }");





-- 
Deepak Subhramanian