Posted to user@flume.apache.org by Kartik Vashishta <ka...@gmail.com> on 2016/03/20 16:12:53 UTC

Flume/hadoop question

Team,

I have been following this web page:
http://cuddletech.com/?p=795

I have been using the most recent version of the software.

I have been able to install the agent and the collector; however, I cannot
get the logs to be written to the HDFS path.

Flume was installed on the webserver and on the Hadoop slave; however,
while neither Flume instance shows any obvious errors, and logs are
written to /var/log/flume, they are not present on the HDFS path.

flume.conf on the collector (running on hadoop slave):
collector.sources = AvroIn
collector.sources.AvroIn.type = avro
collector.sources.AvroIn.bind = 0.0.0.0
collector.sources.AvroIn.port = 4545
collector.sources.AvroIn.channels = mc1 mc2

## Channels ########################################################
## Source writes to 2 channels, one for each sink (Fan Out)
collector.channels = mc1 mc2

# http://flume.apache.org/FlumeUserGuide.html#memory-channel
collector.channels.mc1.type = memory
collector.channels.mc1.capacity = 100

collector.channels.mc2.type = memory
collector.channels.mc2.capacity = 100

## Sinks ###########################################################
collector.sinks = LocalOut HadoopOut

## Write copy to Local Filesystem (Debugging)
# http://flume.apache.org/FlumeUserGuide.html#file-roll-sink
collector.sinks.LocalOut.type = file_roll
collector.sinks.LocalOut.sink.directory = /var/log/flume
collector.sinks.LocalOut.sink.rollInterval = 0
collector.sinks.LocalOut.channel = mc1

## Write to HDFS
# http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
collector.sinks.HadoopOut.type = hdfs
collector.sinks.HadoopOut.channel = mc2
collector.sinks.HadoopOut.hdfs.path = /flume1/events/%{log_type}/%{host}/%y-%m-%d
#collector.sinks.HadoopOut.hdfs.path = /opt/hadoop/hadoop/dfs/name/data/%{log_type}/%{host}/%y-%m-%d
collector.sinks.HadoopOut.hdfs.fileType = DataStream
collector.sinks.HadoopOut.hdfs.writeFormat = Text
collector.sinks.HadoopOut.hdfs.rollSize = 0
collector.sinks.HadoopOut.hdfs.rollCount = 10000
collector.sinks.HadoopOut.hdfs.rollInterval = 600
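
One thing I am unsure about (my own guess, not something from the cuddletech page): the hdfs.path above uses the %y-%m-%d, %{log_type} and %{host} escapes, and per the Flume user guide those need a timestamp header and matching log_type/host headers on every event, or the sink cannot resolve the path. A minimal sketch of what I think the extra settings would look like, assuming the agent is not already setting those headers (the interceptor names and the log_type value are placeholders of mine):

collector.sinks.HadoopOut.hdfs.useLocalTimeStamp = true

# or, on the agent feeding this collector, attach interceptors so each event
# carries the headers that the path escapes expect:
# agent.sources.<source>.interceptors = ts host logtype
# agent.sources.<source>.interceptors.ts.type = timestamp
# agent.sources.<source>.interceptors.host.type = host
# agent.sources.<source>.interceptors.logtype.type = static
# agent.sources.<source>.interceptors.logtype.key = log_type
# agent.sources.<source>.interceptors.logtype.value = <your log type>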

Command to start Flume on the collector (Hadoop slave):
 bin/flume-ng agent -c conf -f conf/flume.conf -Dflume.root.logger=INFO,console -n collector
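
Something else I am not certain about: as I understand it, the flume-ng script only adds the Hadoop jars and config to the classpath if it can locate a Hadoop install (hadoop on the PATH, or HADOOP_HOME set), and since hdfs.path above has no hdfs:// scheme the sink relies on fs.defaultFS from core-site.xml to find the namenode. A sketch of what that might look like on the collector box (paths assumed from the /opt/hadoop/hadoop commands below; the namenode host/port is a placeholder):

# conf/flume-env.sh
export HADOOP_HOME=/opt/hadoop/hadoop
export FLUME_CLASSPATH=/opt/hadoop/hadoop/etc/hadoop   # directory holding core-site.xml

# alternatively, a fully qualified path on the sink instead of relying on fs.defaultFS:
# collector.sinks.HadoopOut.hdfs.path = hdfs://<namenode-host>:<port>/flume1/events/%{log_type}/%{host}/%y-%m-%d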


NO ERRORS reported, only informational (INFO) messages.

I did try this on the Hadoop master:
[hadoop@test49 hadoop]$ /opt/hadoop/hadoop/bin/hadoop fs -mkdir /flume1
/opt/hadoop/hadoop/bin/hadoop fs -mkdir /flume1/events

BUT
/opt/hadoop/hadoop/bin/hadoop fs -ls /flume1/events
shows nothing

On the hadoop slave:
[hadoop@test51 conf]$ /opt/hadoop/hadoop/bin/hadoop fs -ls /flume1/events
Exception in thread "main" java.lang.RuntimeException: core-site.xml not found
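
My guess (could well be wrong) is that the hadoop CLI on the slave cannot find its configuration directory, e.g. HADOOP_CONF_DIR is unset or points at a directory without core-site.xml. What I plan to check, assuming the standard layout under the install path used above:

export HADOOP_CONF_DIR=/opt/hadoop/hadoop/etc/hadoop
ls $HADOOP_CONF_DIR/core-site.xml
/opt/hadoop/hadoop/bin/hadoop fs -ls /flume1/events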


Not sure if this is expected behaviour.

Please advise. Thanks in advance.

Re: Flume/hadoop question

Posted by Chris Horrocks <ch...@hor.rocks>.
Does the hadoop slave have the HDFS client config & jars?

How are you deploying the flume agent? Are you using a hadoop distribution manager like Cloudera Manager/Ambari/etc or is it a standalone instance?
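
By "client config & jars" I mean roughly the following on the slave; the exact paths depend on your install, I'm assuming the /opt/hadoop/hadoop layout from your mail:

ls /opt/hadoop/hadoop/etc/hadoop/core-site.xml
grep -A1 fs.defaultFS /opt/hadoop/hadoop/etc/hadoop/core-site.xml   # should name the namenode

If Flume can't see that file (and the Hadoop/HDFS jars) on its classpath, the HDFS sink has no way of knowing which namenode to write to.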

> On 20 Mar 2016, at 15:12, Kartik Vashishta <ka...@gmail.com> wrote:
> 
> hadoop slave