Posted to user@flume.apache.org by Jonathan <jo...@gmail.com> on 2011/10/01 20:50:42 UTC

Changing Flume output format

Hi,

I asked this question on IRC last night, but I think it was too late, so I
figured I would ask it again here. I am trying to get Flume to output the
raw text of the message instead of its JSON-esque syslog output. I am
currently using the sink
collectorSink("hdfs://107.20.248.101/user/flume/%Y/%m/%d/%H","syslog", 120000, "raw")
and getting the output:
  {"body":"68.40.84.98 - - [29/Sep/2011:17:38:16 -0400] \"GET /beacon?action=imp&pos=2&PlacementId=&AdType=1&hmGUID=e9a51cd2-14ec-48d5-8652-b7607af26962&advertiserPostcode=WA12+0HE&lat=53.466&lon=-2.6327&searchRadius=&DistanceModified=&sort=priceasc&vMake=&vModel=&vMaximumPrice=&vMaximumPriceModified=&vMinimumPrice=&vMinimumPriceModified=&vFuelType=&vMileage=&vAge=&vBodyType=&vTransmission=&vEngineCc=&vColour=&sellertype=&vNoOfDoors=&channel=cars&pgN=1&vMinAge=&vMaxAge=&vMinMileage=&vMaxMileage=&vMinEngineCc=&vMaxEngineCc=&Platform=&PlatformVersion=&cUserID=&cid=70&pid=7fc9e5b6-6dfc-4c45-8cd9-8709b7c8e0fb&advertisementId=201113383357957&advertiserId=0 HTTP/1.1\" 304 148","timestamp":1317332296533,"pri":"INFO","nanos":10721896540315,"host":"domU-12-31-39-0F-1D-C3.compute-1.internal","fields":{"rolltag":"20110929-173823752-0400.7672766572025.00000058"}}


I am trying to get Flume to output only this instead:
68.40.84.98 - - [29/Sep/2011:17:38:16 -0400] "GET /beacon?action=imp&pos=2&PlacementId=&AdType=1&hmGUID=e9a51cd2-14ec-48d5-8652-b7607af26962&advertiserPostcode=WA12+0HE&lat=53.466&lon=-2.6327&searchRadius=&DistanceModified=&sort=priceasc&vMake=&vModel=&vMaximumPrice=&vMaximumPriceModified=&vMinimumPrice=&vMinimumPriceModified=&vFuelType=&vMileage=&vAge=&vBodyType=&vTransmission=&vEngineCc=&vColour=&sellertype=&vNoOfDoors=&channel=cars&pgN=1&vMinAge=&vMaxAge=&vMinMileage=&vMaxMileage=&vMinEngineCc=&vMaxEngineCc=&Platform=&PlatformVersion=&cUserID=&cid=70&pid=7fc9e5b6-6dfc-4c45-8cd9-8709b7c8e0fb&advertisementId=201113383357957&advertiserId=0 HTTP/1.1" 304 148
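
For records that have already been written in the JSON-style format, the
desired line is just the "body" field of each record. A minimal sketch of
pulling it out after the fact, assuming the collected files hold one JSON
record per line; extract_body, raw_lines, and the shortened sample record
are illustrative names and data, not part of Flume:

```python
import json


def extract_body(record_line: str) -> str:
    """Return the 'body' field of one JSON-style Flume record."""
    event = json.loads(record_line)
    return event["body"]


def raw_lines(lines):
    """Yield the body of every non-blank record line."""
    for line in lines:
        line = line.strip()
        if line:
            yield extract_body(line)


# Shortened sample record (query string abbreviated for readability).
SAMPLE = (
    r'{"body":"68.40.84.98 - - [29/Sep/2011:17:38:16 -0400] '
    r'\"GET /beacon?action=imp HTTP/1.1\" 304 148",'
    r'"timestamp":1317332296533,"pri":"INFO"}'
)

if __name__ == "__main__":
    for body in raw_lines([SAMPLE]):
        print(body)
```

This only post-processes existing output; it does not change what the
collector writes.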


As always, any help would be greatly appreciated.

Jonathan

Re: Changing Flume output format

Posted by Matthew Rathbone <ma...@foursquare.com>.
Hey,

Simply set this property in your flume-site.xml config file:

<property>
  <name>flume.collector.output.format</name>
  <value>raw</value>
</property>


Here are the possible values:
syslog - outputs events in a syslog-like format
log4j - outputs events in a pattern similar to Hadoop's log4j pattern
avrojson - outputs data as JSON encoded by Avro
avrodata - outputs data as Avro binary-encoded data
debug - used only for debugging
raw - outputs only the event body, no metadata
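
A quick way to check which value a config file actually carries is to parse
it and compare against the names listed above. This is a minimal sketch,
assuming flume-site.xml wraps its <property> entries in a Hadoop-style
<configuration> root element; collector_output_format and SAMPLE are
hypothetical names used only for illustration:

```python
import xml.etree.ElementTree as ET

# Format names as listed in this thread.
KNOWN_FORMATS = {"syslog", "log4j", "avrojson", "avrodata", "debug", "raw"}
PROPERTY_NAME = "flume.collector.output.format"


def collector_output_format(site_xml: str):
    """Return the configured output format, or None if the property is absent."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == PROPERTY_NAME:
            return prop.findtext("value", default="").strip()
    return None


# Assumed file layout; only the <property> block comes from this thread.
SAMPLE = """<configuration>
  <property>
    <name>flume.collector.output.format</name>
    <value>raw</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    fmt = collector_output_format(SAMPLE)
    print(fmt, fmt in KNOWN_FORMATS)
```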

(raw is the one you want)

-- 
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
matthew@foursquare.com | @rathboma (http://twitter.com/rathboma) | 4sq (http://foursquare.com/rathboma)


