Posted to user@flume.apache.org by Nikhil Shirke <Ni...@kpitcummins.com> on 2013/03/27 07:01:44 UTC

FW: Flume - HTTPSource & HDFSSink

Hello,



I have the following configuration:



agent.sources = httpSrc
agent.channels = memoryChannel
agent.sinks = hdfsSink



# For each one of the sources, the type is defined
agent.sources.httpSrc.type = org.apache.flume.source.http.HTTPSource
agent.sources.httpSrc.port = 9000
agent.sources.httpSrc.handler = org.apache.flume.source.http.JSONHandler

# The channel can be defined as follows.
agent.sources.httpSrc.channels = memoryChannel



# Each sink's type must be defined
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://10.187.142.39/flume/
agent.sinks.hdfsSink.fileType = DataStream
agent.sinks.hdfsSink.writeFormat = Text
agent.sinks.hdfsSink.serializer = Text



#Specify the channel the sink should use
agent.sinks.hdfsSink.channel = memoryChannel
agent.sinks.logSink.channel = memoryChannel



# Each channel's type is defined.
agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel (sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 1000
agent.channels.memoryChannel.transactionCapacity = 100



When I execute the following command, it generates a file in the /flume folder.

curl -X POST -d '[{ "headers" : { "timestamp" : "434324343", "host" : "random_host.example.com" }, "body" : "random_body" }, { "headers" : { "namenode" : "namenode.example.com", "datanode" : "random_datanode.example.com" }, "body" : "really_random_body" }]' 10.187.142.125:9000



However, the file contents are as follows, and they are in binary format.

SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritableÃç£idYvQS¸/\Á=ãCw
                                                                                         random_bod=«9really_random_body
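
The SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable prefix looks like the header of a Hadoop SequenceFile. For inspection, a file like this can be dumped as text with something along these lines, where the file name is only a placeholder for whatever the sink actually wrote:

hadoop fs -text hdfs://10.187.142.39/flume/<file-written-by-sink>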



Thanks,

Nikhil Shirke


Re: Flume - HTTPSource & HDFSSink

Posted by Hari Shreedharan <hs...@cloudera.com>.
Nikhil,  

Flume's HDFS Sink writes to HDFS as SequenceFiles by default. If you want it to write as text or Avro, you must set the file type to DataStream. Please see the Flume User Guide.
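
In the posted configuration the fileType and writeFormat keys are missing the hdfs. prefix, so the sink ignores them and falls back to its default hdfs.fileType of SequenceFile, which would explain the SEQ header in the output. A minimal sketch of the corrected lines, assuming the standard key names from the HDFS Sink section of the user guide (the rest of the configuration stays as posted):

agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.hdfs.writeFormat = Text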

Thanks,
Hari

--  
Hari Shreedharan


On Tuesday, March 26, 2013 at 11:01 PM, Nikhil Shirke wrote:

> Hello,
>   
> I have the following configuration:
>   
> agent.sources = httpSrc
> agent.channels = memoryChannel
> agent.sinks = hdfsSink  
>   
> # For each one of the sources, the type is defined
> agent.sources.httpSrc.type = org.apache.flume.source.http.HTTPSource
> agent.sources.httpSrc.port = 9000
> agent.sources.httpSrc.handler = org.apache.flume.source.http.JSONHandler  
>  
> # The channel can be defined as follows.
> agent.sources.httpSrc.channels = memoryChannel  
>   
> # Each sink's type must be defined
> agent.sinks.hdfsSink.type = hdfs
> agent.sinks.hdfsSink.hdfs.path = hdfs://10.187.142.39/flume/
> agent.sinks.hdfsSink.fileType = DataStream
> agent.sinks.hdfsSink.writeFormat = Text
> agent.sinks.hdfsSink.serializer = Text  
>   
> #Specify the channel the sink should use
> agent.sinks.hdfsSink.channel = memoryChannel
> agent.sinks.logSink.channel = memoryChannel  
>   
> # Each channel's type is defined.
> agent.channels.memoryChannel.type = memory  
> # Other config values specific to each type of channel (sink or source)
> # can be defined as well
> # In this case, it specifies the capacity of the memory channel
> agent.channels.memoryChannel.capacity = 1000
> agent.channels.memoryChannel.transactionCapacity = 100  
>   
> When I execute the following command, it generates a file in the /flume folder.  
> curl -X POST -d '[{ "headers" : { "timestamp" : "434324343", "host" : "random_host.example.com" }, "body" : "random_body" }, { "headers" : { "namenode" : "namenode.example.com", "datanode" : "random_datanode.example.com" }, "body" : "really_random_body" }]' 10.187.142.125:9000
>   
> However, the file contents are as follows, and they are in binary format.
> SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritableÃç£idYvQS¸/\Á=ãCw
>                                                                                          random_bod=«9really_random_body  
>   
> Thanks,
> Nikhil Shirke