Posted to user@flume.apache.org by "Cochran, David M (Contractor)" <Da...@bsee.gov> on 2012/09/12 17:06:21 UTC

splitting functions

Okay folks, after spending the better part of a week reading the docs and experimenting, I'm lost.  I have Flume 1.3.x working pretty much as expected on a single host.  It tails a log file and writes it to another rolling log file via Flume.  No problem there; it seems to work flawlessly.  My trouble is in splitting the functions across multiple hosts... a single host listening for the others to send their logs to.  All of my efforts have resulted in little more than headaches.

I can't even see the specified port open on what should be the logging host.  I've tried the basic examples posted on different docs but can't seem to get things working across multiple hosts.  

Would someone post a working example of the confs needed to get me started?  Something simple that works, so I can then pick it apart to gain more understanding.  Apparently I just don't have a firm enough grasp on all the pieces yet, but I want to learn!

Thanks in advance!
Dave 
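
For reference, each agent config in the replies below would be started with the standard flume-ng launcher, naming the agent whose properties to load; the conf file name here is just illustrative:

bin/flume-ng agent --conf conf --conf-file flume.conf --name node105 -Dflume.root.logger=INFO,console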



Re: splitting functions

Posted by Brock Noland <br...@cloudera.com>.
Nevermind, it doesn't look like FILE_ROLL supports batching....

On Wed, Sep 12, 2012 at 4:56 PM, Brock Noland <br...@cloudera.com> wrote:
> It looks like you have a batch size of 1000, which could mean the sink is
> waiting for 1000 entries...
>
> node102.sinks.filesink1.batchSize = 1000
>
>
>
> On Wed, Sep 12, 2012 at 3:12 PM, Cochran, David M (Contractor)
> <Da...@bsee.gov> wrote:
>> Putting a copy of hadoop-core.jar in the lib directory did the trick... at least it made the errors go away.
>>
>> Just trying to sort out why nothing is getting written to the sink's files now... when I add entries to the file being tailed, nothing makes it to the sink log file(s). I guess I need to run tcpdump on that port and see if anything is being sent, or if the problem is on the receive side now.
>>
>> Thanks for the help!
>> Dave
>>
>>
>>
>> -----Original Message-----
>> From: Brock Noland [mailto:brock@cloudera.com]
>> Sent: Wed 9/12/2012 12:41 PM
>> To: user@flume.apache.org
>> Subject: Re: splitting functions
>>
>> Yeah that is my fault. FileChannel uses a few hadoop classes for
>> serialization. I want to get rid of that but it's just not a priority
>> item. You either need the hadoop command in your path or the
>> hadoop-core.jar in your lib directory.
>>
>> On Wed, Sep 12, 2012 at 1:38 PM, Cochran, David M (Contractor)
>> <Da...@bsee.gov> wrote:
>>> Brock,
>>>
>>> Thanks for the sample!  Starting to see a bit more light and making a little more sense now...
>>>
>>> If you wouldn't mind and have a couple minutes to spare... I'm getting this error and am not sure how to make it go away. I cannot use hadoop for storage, just FILE_ROLL instead (ultimately the logs will need to be processed further in plain text).  I'm just not sure why....
>>>
>>> The error follows and my conf further down.
>>>
>>> 12 Sep 2012 13:18:54,120 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.start:211)  - Starting FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] }...
>>> 12 Sep 2012 13:18:54,124 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.start:234)  - Failed to start the file channel [channel=fileChannel]
>>> java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
>>>         at java.lang.ClassLoader.defineClass1(Native Method)
>>>         at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
>>>         at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
>>>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>>>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>>>         at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>         at org.apache.flume.channel.file.Log$Builder.build(Log.java:144)
>>>         at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:223)
>>>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>         at java.lang.Thread.run(Thread.java:662)
>>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Writable
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>         ... 24 more
>>> 12 Sep 2012 13:18:54,126 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:238)  - Unable to start FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] } - Exception follows.
>>> java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
>>>         at java.lang.ClassLoader.defineClass1(Native Method)
>>>         at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
>>>         at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
>>>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>>>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>>>         at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>         at org.apache.flume.channel.file.Log$Builder.build(Log.java:144)
>>>         at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:223)
>>>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>         at java.lang.Thread.run(Thread.java:662)
>>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Writable
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>         ... 24 more
>>> 12 Sep 2012 13:18:54,127 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.stop:249)  - Stopping FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] }...
>>> 12 Sep 2012 13:18:54,127 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:249)  - Unsuccessful attempt to shutdown component: {} due to missing dependencies. Please shutdown the agentor disable this component, or the agent will bein an undefined state.
>>> java.lang.IllegalStateException: Channel closed[channel=fileChannel]
>>>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>>>         at org.apache.flume.channel.file.FileChannel.getDepth(FileChannel.java:282)
>>>         at org.apache.flume.channel.file.FileChannel.stop(FileChannel.java:250)
>>>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:244)
>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>         at java.lang.Thread.run(Thread.java:662)
>>> 12 Sep 2012 13:18:54,622 INFO  [conf-file-poller-0] (org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents:141)  - Starting Sink filesink1
>>> 12 Sep 2012 13:18:54,624 INFO  [conf-file-poller-0] (org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents:152)  - Starting Source avroSource
>>> 12 Sep 2012 13:18:54,626 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.source.AvroSource.start:138)  - Starting Avro source avroSource: { bindAddress: 0.0.0.0, port: 9432 }...
>>> 12 Sep 2012 13:18:54,641 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver event. Exception follows.
>>> java.lang.IllegalStateException: Channel closed [channel=fileChannel]
>>>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>>>         at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:267)
>>>         at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:118)
>>>         at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:172)
>>>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>         at java.lang.Thread.run(Thread.java:662)
>>>
>>>
>>>
>>>
>>> Using your config this is my starting point... (trying to get it functioning on a single host first)
>>>
>>> node105.sources = tailsource
>>> node105.channels = fileChannel
>>> node105.sinks = avroSink
>>>
>>> node105.sources.tailsource.type = exec
>>> node105.sources.tailsource.command = tail -F /root/Desktop/apache-flume-1.3.0-SNAPSHOT/test.log
>>> #node105.sources.stressSource.batchSize = 1000
>>> node105.sources.tailsource.channels = fileChannel
>>>
>>> ## Sink sends avro messages to node103.bashkew.com port 9432
>>> node105.sinks.avroSink.type = avro
>>> node105.sinks.avroSink.batch-size = 1000
>>> node105.sinks.avroSink.channel = fileChannel
>>> node105.sinks.avroSink.hostname = localhost
>>> node105.sinks.avroSink.port = 9432
>>>
>>> node105.channels.fileChannel.type = file
>>> node105.channels.fileChannel.checkpointDir = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/tmp/flume/checkpoint
>>> node105.channels.fileChannel.dataDirs = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/tmp/flume/tmp/flume/data
>>> node105.channels.fileChannel.capacity = 10000
>>> node105.channels.fileChannel.checkpointInterval = 3000
>>> node105.channels.fileChannel.maxFileSize = 5242880
>>>
>>> node102.sources = avroSource
>>> node102.channels = fileChannel
>>> node102.sinks = filesink1
>>>
>>> ## Source listens for avro messages on port 9432 on all ips
>>> node102.sources.avroSource.type = avro
>>> node102.sources.avroSource.channels = fileChannel
>>> node102.sources.avroSource.bind = 0.0.0.0
>>> node102.sources.avroSource.port = 9432
>>>
>>> node102.sinks.filesink1.type = FILE_ROLL
>>> node102.sinks.filesink1.batchSize = 1000
>>> node102.sinks.filesink1.channel = fileChannel
>>> node102.sinks.filesink1.sink.directory = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/logs/rhel5/
>>> node102.channels.fileChannel.type = file
>>> node102.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
>>> node102.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
>>> node102.channels.fileChannel.capacity = 5000
>>> node102.channels.fileChannel.checkpointInterval = 45000
>>> node102.channels.fileChannel.maxFileSize = 5242880
>>>
>>>
>>>
>>> Thanks!
>>> Dave
>>>
>>>
>>> -----Original Message-----
>>> From: Brock Noland [mailto:brock@cloudera.com]
>>> Sent: Wed 9/12/2012 9:11 AM
>>> To: user@flume.apache.org
>>> Subject: Re: splitting functions
>>>
>>> Hi,
>>>
>>> Below is a config I use to test out the FileChannel. See the comments
>>> "##" for how messages are sent from one host to another.
>>>
>>> node105.sources = stressSource
>>> node105.channels = fileChannel
>>> node105.sinks = avroSink
>>>
>>> node105.sources.stressSource.type = org.apache.flume.source.StressSource
>>> node105.sources.stressSource.batchSize = 1000
>>> node105.sources.stressSource.channels = fileChannel
>>>
>>> ## Sink sends avro messages to node103.bashkew.com port 9432
>>> node105.sinks.avroSink.type = avro
>>> node105.sinks.avroSink.batch-size = 1000
>>> node105.sinks.avroSink.channel = fileChannel
>>> node105.sinks.avroSink.hostname = node102.bashkew.com
>>> node105.sinks.avroSink.port = 9432
>>>
>>> node105.channels.fileChannel.type = file
>>> node105.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
>>> node105.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
>>> node105.channels.fileChannel.capacity = 10000
>>> node105.channels.fileChannel.checkpointInterval = 3000
>>> node105.channels.fileChannel.maxFileSize = 5242880
>>>
>>> node102.sources = avroSource
>>> node102.channels = fileChannel
>>> node102.sinks = nullSink
>>>
>>>
>>> ## Source listens for avro messages on port 9432 on all ips
>>> node102.sources.avroSource.type = avro
>>> node102.sources.avroSource.channels = fileChannel
>>> node102.sources.avroSource.bind = 0.0.0.0
>>> node102.sources.avroSource.port = 9432
>>>
>>> node102.sinks.nullSink.type = null
>>> node102.sinks.nullSink.batchSize = 1000
>>> node102.sinks.nullSink.channel = fileChannel
>>>
>>> node102.channels.fileChannel.type = file
>>> node102.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
>>> node102.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
>>> node102.channels.fileChannel.capacity = 5000
>>> node102.channels.fileChannel.checkpointInterval = 45000
>>> node102.channels.fileChannel.maxFileSize = 5242880
>>>
>>>
>>>
>>> On Wed, Sep 12, 2012 at 10:06 AM, Cochran, David M (Contractor)
>>> <Da...@bsee.gov> wrote:
>>>> Okay folks, after spending the better part of a week reading the docs and
>>>> experimenting, I'm lost.  I have Flume 1.3.x working pretty much as expected
>>>> on a single host.  It tails a log file and writes it to another rolling log
>>>> file via Flume.  No problem there; it seems to work flawlessly.  My trouble
>>>> is in splitting the functions across multiple hosts... a single host
>>>> listening for the others to send their logs to.  All of my efforts have
>>>> resulted in little more than headaches.
>>>>
>>>> I can't even see the specified port open on what should be the logging host.
>>>> I've tried the basic examples posted on different docs but can't seem to get
>>>> things working across multiple hosts.
>>>>
>>>> Would someone post a working example of the confs needed to get me started?
>>>> Something simple that works, so I can then pick it apart to gain more
>>>> understanding.  Apparently I just don't have a firm enough grasp on all the
>>>> pieces yet, but I want to learn!
>>>>
>>>> Thanks in advance!
>>>> Dave
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
>>>
>>
>>
>>
>> --
>> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
>>
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

Re: splitting functions

Posted by Brock Noland <br...@cloudera.com>.
It looks like you have a batch size of 1000, which could mean the sink is
waiting for 1000 entries...

node102.sinks.filesink1.batchSize = 1000
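
If that is the cause, lowering the batch size should make events flush sooner. A minimal sketch of that tweak (a guess to test, not a confirmed fix; the "Nevermind" follow-up elsewhere in the thread suggests FILE_ROLL may not honor batching at all in this build):

# Hypothetical change: deliver after every event instead of every 1000
node102.sinks.filesink1.batchSize = 1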



On Wed, Sep 12, 2012 at 3:12 PM, Cochran, David M (Contractor)
<Da...@bsee.gov> wrote:
> Putting a copy of hadoop-core.jar in the lib directory did the trick... at least it made the errors go away.
>
> Just trying to sort out why nothing is getting written to the sink's files now... when I add entries to the file being tailed, nothing makes it to the sink log file(s). I guess I need to run tcpdump on that port and see if anything is being sent, or if the problem is on the receive side now.
>
> Thanks for the help!
> Dave
>
>
>
> -----Original Message-----
> From: Brock Noland [mailto:brock@cloudera.com]
> Sent: Wed 9/12/2012 12:41 PM
> To: user@flume.apache.org
> Subject: Re: splitting functions
>
> Yeah that is my fault. FileChannel uses a few hadoop classes for
> serialization. I want to get rid of that but it's just not a priority
> item. You either need the hadoop command in your path or the
> hadoop-core.jar in your lib directory.
>
> On Wed, Sep 12, 2012 at 1:38 PM, Cochran, David M (Contractor)
> <Da...@bsee.gov> wrote:
>> Brock,
>>
>> Thanks for the sample!  Starting to see a bit more light and making a little more sense now...
>>
>> If you wouldn't mind and have a couple minutes to spare... I'm getting this error and am not sure how to make it go away. I cannot use hadoop for storage, just FILE_ROLL instead (ultimately the logs will need to be processed further in plain text).  I'm just not sure why....
>>
>> The error follows and my conf further down.
>>
>> 12 Sep 2012 13:18:54,120 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.start:211)  - Starting FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] }...
>> 12 Sep 2012 13:18:54,124 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.start:234)  - Failed to start the file channel [channel=fileChannel]
>> java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
>>         at java.lang.ClassLoader.defineClass1(Native Method)
>>         at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
>>         at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
>>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>>         at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>         at org.apache.flume.channel.file.Log$Builder.build(Log.java:144)
>>         at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:223)
>>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Writable
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>         ... 24 more
>> 12 Sep 2012 13:18:54,126 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:238)  - Unable to start FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] } - Exception follows.
>> java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
>>         at java.lang.ClassLoader.defineClass1(Native Method)
>>         at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
>>         at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
>>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>>         at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>         at org.apache.flume.channel.file.Log$Builder.build(Log.java:144)
>>         at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:223)
>>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Writable
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>         ... 24 more
>> 12 Sep 2012 13:18:54,127 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.stop:249)  - Stopping FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] }...
>> 12 Sep 2012 13:18:54,127 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:249)  - Unsuccessful attempt to shutdown component: {} due to missing dependencies. Please shutdown the agentor disable this component, or the agent will bein an undefined state.
>> java.lang.IllegalStateException: Channel closed[channel=fileChannel]
>>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>>         at org.apache.flume.channel.file.FileChannel.getDepth(FileChannel.java:282)
>>         at org.apache.flume.channel.file.FileChannel.stop(FileChannel.java:250)
>>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:244)
>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> 12 Sep 2012 13:18:54,622 INFO  [conf-file-poller-0] (org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents:141)  - Starting Sink filesink1
>> 12 Sep 2012 13:18:54,624 INFO  [conf-file-poller-0] (org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents:152)  - Starting Source avroSource
>> 12 Sep 2012 13:18:54,626 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.source.AvroSource.start:138)  - Starting Avro source avroSource: { bindAddress: 0.0.0.0, port: 9432 }...
>> 12 Sep 2012 13:18:54,641 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver event. Exception follows.
>> java.lang.IllegalStateException: Channel closed [channel=fileChannel]
>>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>>         at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:267)
>>         at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:118)
>>         at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:172)
>>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>         at java.lang.Thread.run(Thread.java:662)
>>
>>
>>
>>
>> Using your config this is my starting point... (trying to get it functioning on a single host first)
>>
>> node105.sources = tailsource
>> node105.channels = fileChannel
>> node105.sinks = avroSink
>>
>> node105.sources.tailsource.type = exec
>> node105.sources.tailsource.command = tail -F /root/Desktop/apache-flume-1.3.0-SNAPSHOT/test.log
>> #node105.sources.stressSource.batchSize = 1000
>> node105.sources.tailsource.channels = fileChannel
>>
>> ## Sink sends avro messages to node103.bashkew.com port 9432
>> node105.sinks.avroSink.type = avro
>> node105.sinks.avroSink.batch-size = 1000
>> node105.sinks.avroSink.channel = fileChannel
>> node105.sinks.avroSink.hostname = localhost
>> node105.sinks.avroSink.port = 9432
>>
>> node105.channels.fileChannel.type = file
>> node105.channels.fileChannel.checkpointDir = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/tmp/flume/checkpoint
>> node105.channels.fileChannel.dataDirs = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/tmp/flume/tmp/flume/data
>> node105.channels.fileChannel.capacity = 10000
>> node105.channels.fileChannel.checkpointInterval = 3000
>> node105.channels.fileChannel.maxFileSize = 5242880
>>
>> node102.sources = avroSource
>> node102.channels = fileChannel
>> node102.sinks = filesink1
>>
>> ## Source listens for avro messages on port 9432 on all ips
>> node102.sources.avroSource.type = avro
>> node102.sources.avroSource.channels = fileChannel
>> node102.sources.avroSource.bind = 0.0.0.0
>> node102.sources.avroSource.port = 9432
>>
>> node102.sinks.filesink1.type = FILE_ROLL
>> node102.sinks.filesink1.batchSize = 1000
>> node102.sinks.filesink1.channel = fileChannel
>> node102.sinks.filesink1.sink.directory = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/logs/rhel5/
>> node102.channels.fileChannel.type = file
>> node102.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
>> node102.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
>> node102.channels.fileChannel.capacity = 5000
>> node102.channels.fileChannel.checkpointInterval = 45000
>> node102.channels.fileChannel.maxFileSize = 5242880
>>
>>
>>
>> Thanks!
>> Dave
>>
>>
>> -----Original Message-----
>> From: Brock Noland [mailto:brock@cloudera.com]
>> Sent: Wed 9/12/2012 9:11 AM
>> To: user@flume.apache.org
>> Subject: Re: splitting functions
>>
>> Hi,
>>
>> Below is a config I use to test out the FileChannel. See the comments
>> "##" for how messages are sent from one host to another.
>>
>> node105.sources = stressSource
>> node105.channels = fileChannel
>> node105.sinks = avroSink
>>
>> node105.sources.stressSource.type = org.apache.flume.source.StressSource
>> node105.sources.stressSource.batchSize = 1000
>> node105.sources.stressSource.channels = fileChannel
>>
>> ## Sink sends avro messages to node103.bashkew.com port 9432
>> node105.sinks.avroSink.type = avro
>> node105.sinks.avroSink.batch-size = 1000
>> node105.sinks.avroSink.channel = fileChannel
>> node105.sinks.avroSink.hostname = node102.bashkew.com
>> node105.sinks.avroSink.port = 9432
>>
>> node105.channels.fileChannel.type = file
>> node105.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
>> node105.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
>> node105.channels.fileChannel.capacity = 10000
>> node105.channels.fileChannel.checkpointInterval = 3000
>> node105.channels.fileChannel.maxFileSize = 5242880
>>
>> node102.sources = avroSource
>> node102.channels = fileChannel
>> node102.sinks = nullSink
>>
>>
>> ## Source listens for avro messages on port 9432 on all ips
>> node102.sources.avroSource.type = avro
>> node102.sources.avroSource.channels = fileChannel
>> node102.sources.avroSource.bind = 0.0.0.0
>> node102.sources.avroSource.port = 9432
>>
>> node102.sinks.nullSink.type = null
>> node102.sinks.nullSink.batchSize = 1000
>> node102.sinks.nullSink.channel = fileChannel
>>
>> node102.channels.fileChannel.type = file
>> node102.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
>> node102.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
>> node102.channels.fileChannel.capacity = 5000
>> node102.channels.fileChannel.checkpointInterval = 45000
>> node102.channels.fileChannel.maxFileSize = 5242880
>>
>>
>>
>> On Wed, Sep 12, 2012 at 10:06 AM, Cochran, David M (Contractor)
>> <Da...@bsee.gov> wrote:
>>> Okay folks, after spending the better part of a week reading the docs and
>>> experimenting, I'm lost.  I have Flume 1.3.x working pretty much as expected
>>> on a single host.  It tails a log file and writes it to another rolling log
>>> file via Flume.  No problem there; it seems to work flawlessly.  My trouble
>>> is in splitting the functions across multiple hosts... a single host
>>> listening for the others to send their logs to.  All of my efforts have
>>> resulted in little more than headaches.
>>>
>>> I can't even see the specified port open on what should be the logging host.
>>> I've tried the basic examples posted on different docs but can't seem to get
>>> things working across multiple hosts.
>>>
>>> Would someone post a working example of the confs needed to get me started?
>>> Something simple that works, so I can then pick it apart to gain more
>>> understanding.  Apparently I just don't have a firm enough grasp on all the
>>> pieces yet, but I want to learn!
>>>
>>> Thanks in advance!
>>> Dave
>>>
>>>
>>
>>
>>
>> --
>> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
>>
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
>



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

RE: splitting functions

Posted by "Cochran, David M (Contractor)" <Da...@bsee.gov>.
Putting a copy of hadoop-core.jar in the lib directory did the trick... at least it made the errors go away.

Just trying to sort out why nothing is getting written to the sink's files now... when I add entries to the file being tailed, nothing makes it to the sink log file(s). I guess I need to run tcpdump on that port and see if anything is being sent, or if the problem is on the receive side now.
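
A capture along those lines might look like the following sketch (port 9432 matches the avro source in the configs below; the "any" pseudo-interface assumes Linux):

# Watch for avro traffic between the tailing agent and the collector
tcpdump -i any -nn port 9432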

Thanks for the help!
Dave



-----Original Message-----
From: Brock Noland [mailto:brock@cloudera.com]
Sent: Wed 9/12/2012 12:41 PM
To: user@flume.apache.org
Subject: Re: splitting functions
 
Yeah that is my fault. FileChannel uses a few hadoop classes for
serialization. I want to get rid of that but it's just not a priority
item. You either need the hadoop command in your path or the
hadoop-core.jar in your lib directory.

On Wed, Sep 12, 2012 at 1:38 PM, Cochran, David M (Contractor)
<Da...@bsee.gov> wrote:
> Brock,
>
> Thanks for the sample!  Starting to see a bit more light and making a little more sense now...
>
> If you wouldn't mind and have a couple minutes to spare... I'm getting this error and am not sure how to make it go away. I cannot use hadoop for storage, just FILE_ROLL instead (ultimately the logs will need to be processed further in plain text).  I'm just not sure why....
>
> The error follows and my conf further down.
>
> 12 Sep 2012 13:18:54,120 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.start:211)  - Starting FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] }...
> 12 Sep 2012 13:18:54,124 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.start:234)  - Failed to start the file channel [channel=fileChannel]
> java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
>         at java.lang.ClassLoader.defineClass1(Native Method)
>         at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
>         at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>         at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>         at org.apache.flume.channel.file.Log$Builder.build(Log.java:144)
>         at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:223)
>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Writable
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>         ... 24 more
> 12 Sep 2012 13:18:54,126 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:238)  - Unable to start FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] } - Exception follows.
> java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
>         at java.lang.ClassLoader.defineClass1(Native Method)
>         at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
>         at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>         at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>         at org.apache.flume.channel.file.Log$Builder.build(Log.java:144)
>         at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:223)
>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Writable
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>         ... 24 more
> 12 Sep 2012 13:18:54,127 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.stop:249)  - Stopping FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] }...
> 12 Sep 2012 13:18:54,127 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:249)  - Unsuccessful attempt to shutdown component: {} due to missing dependencies. Please shutdown the agentor disable this component, or the agent will bein an undefined state.
> java.lang.IllegalStateException: Channel closed[channel=fileChannel]
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>         at org.apache.flume.channel.file.FileChannel.getDepth(FileChannel.java:282)
>         at org.apache.flume.channel.file.FileChannel.stop(FileChannel.java:250)
>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:244)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 12 Sep 2012 13:18:54,622 INFO  [conf-file-poller-0] (org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents:141)  - Starting Sink filesink1
> 12 Sep 2012 13:18:54,624 INFO  [conf-file-poller-0] (org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents:152)  - Starting Source avroSource
> 12 Sep 2012 13:18:54,626 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.source.AvroSource.start:138)  - Starting Avro source avroSource: { bindAddress: 0.0.0.0, port: 9432 }...
> 12 Sep 2012 13:18:54,641 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver event. Exception follows.
> java.lang.IllegalStateException: Channel closed [channel=fileChannel]
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>         at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:267)
>         at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:118)
>         at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:172)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:662)
>
>
>
>
> Using your config this is my starting point... (trying to get it functioning on a single host first)
>
> node105.sources = tailsource
> node105.channels = fileChannel
> node105.sinks = avroSink
>
> node105.sources.tailsource.type = exec
> node105.sources.tailsource.command = tail -F /root/Desktop/apache-flume-1.3.0-SNAPSHOT/test.log
> #node105.sources.stressSource.batchSize = 1000
> node105.sources.tailsource.channels = fileChannel
>
> ## Sink sends avro messages to node103.bashkew.com port 9432
> node105.sinks.avroSink.type = avro
> node105.sinks.avroSink.batch-size = 1000
> node105.sinks.avroSink.channel = fileChannel
> node105.sinks.avroSink.hostname = localhost
> node105.sinks.avroSink.port = 9432
>
> node105.channels.fileChannel.type = file
> node105.channels.fileChannel.checkpointDir = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/tmp/flume/checkpoint
> node105.channels.fileChannel.dataDirs = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/tmp/flume/tmp/flume/data
> node105.channels.fileChannel.capacity = 10000
> node105.channels.fileChannel.checkpointInterval = 3000
> node105.channels.fileChannel.maxFileSize = 5242880
>
> node102.sources = avroSource
> node102.channels = fileChannel
> node102.sinks = filesink1
>
> ## Source listens for avro messages on port 9432 on all ips
> node102.sources.avroSource.type = avro
> node102.sources.avroSource.channels = fileChannel
> node102.sources.avroSource.bind = 0.0.0.0
> node102.sources.avroSource.port = 9432
>
> node102.sinks.filesink1.type = FILE_ROLL
> node102.sinks.filesink1.batchSize = 1000
> node102.sinks.filesink1.channel = fileChannel
> node102.sinks.filesink1.sink.directory = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/logs/rhel5/
> node102.channels.fileChannel.type = file
> node102.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
> node102.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
> node102.channels.fileChannel.capacity = 5000
> node102.channels.fileChannel.checkpointInterval = 45000
> node102.channels.fileChannel.maxFileSize = 5242880
>
>
>
> Thanks!
> Dave
>
>
> -----Original Message-----
> From: Brock Noland [mailto:brock@cloudera.com]
> Sent: Wed 9/12/2012 9:11 AM
> To: user@flume.apache.org
> Subject: Re: splitting functions
>
> Hi,
>
> Below is a config I use to test out the FileChannel. See the comments
> "##" for how messages are sent from one host to another.
>
> node105.sources = stressSource
> node105.channels = fileChannel
> node105.sinks = avroSink
>
> node105.sources.stressSource.type = org.apache.flume.source.StressSource
> node105.sources.stressSource.batchSize = 1000
> node105.sources.stressSource.channels = fileChannel
>
> ## Sink sends avro messages to node103.bashkew.com port 9432
> node105.sinks.avroSink.type = avro
> node105.sinks.avroSink.batch-size = 1000
> node105.sinks.avroSink.channel = fileChannel
> node105.sinks.avroSink.hostname = node102.bashkew.com
> node105.sinks.avroSink.port = 9432
>
> node105.channels.fileChannel.type = file
> node105.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
> node105.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
> node105.channels.fileChannel.capacity = 10000
> node105.channels.fileChannel.checkpointInterval = 3000
> node105.channels.fileChannel.maxFileSize = 5242880
>
> node102.sources = avroSource
> node102.channels = fileChannel
> node102.sinks = nullSink
>
>
> ## Source listens for avro messages on port 9432 on all ips
> node102.sources.avroSource.type = avro
> node102.sources.avroSource.channels = fileChannel
> node102.sources.avroSource.bind = 0.0.0.0
> node102.sources.avroSource.port = 9432
>
> node102.sinks.nullSink.type = null
> node102.sinks.nullSink.batchSize = 1000
> node102.sinks.nullSink.channel = fileChannel
>
> node102.channels.fileChannel.type = file
> node102.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
> node102.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
> node102.channels.fileChannel.capacity = 5000
> node102.channels.fileChannel.checkpointInterval = 45000
> node102.channels.fileChannel.maxFileSize = 5242880
>
>
>
> On Wed, Sep 12, 2012 at 10:06 AM, Cochran, David M (Contractor)
> <Da...@bsee.gov> wrote:
>> Okay folks, after spending the better part of a week reading the docs and
>> experimenting, I'm lost.  I have Flume 1.3.x working pretty much as expected
>> on a single host.  It tails a log file and writes it to another rolling log
>> file via Flume.  No problem there; it seems to work flawlessly.  My trouble
>> is in splitting the functions across multiple hosts... a single host
>> listening for the others to send their logs to.  All of my efforts have
>> resulted in little more than headaches.
>>
>> I can't even see the specified port open on what should be the logging host.
>> I've tried the basic examples posted on different docs but can't seem to get
>> things working across multiple hosts.
>>
>> Would someone post a working example of the confs needed to get me started?
>> Something simple that works, so I can then pick it apart to gain more
>> understanding.  Apparently I just don't have a firm enough grasp on all the
>> pieces yet, but I want to learn!
>>
>> Thanks in advance!
>> Dave
>>
>>
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
>



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/


Re: splitting functions

Posted by Brock Noland <br...@cloudera.com>.
Yeah that is my fault. FileChannel uses a few hadoop classes for
serialization. I want to get rid of that but it's just not a priority
item. You either need the hadoop command in your path or the
hadoop-core.jar in your lib directory.
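
Either option might look roughly like this (a sketch: the jar's source location is illustrative, and the Flume directory is the one from the configs in this thread):

# Option 1: confirm the hadoop command resolves on PATH
which hadoop
# Option 2: copy hadoop-core.jar into Flume's lib directory
cp /path/to/hadoop-core-*.jar /root/Desktop/apache-flume-1.3.0-SNAPSHOT/lib/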

On Wed, Sep 12, 2012 at 1:38 PM, Cochran, David M (Contractor)
<Da...@bsee.gov> wrote:
> Brock,
>
> Thanks for the sample!  Starting to see a bit more light and making a little more sense now...
>
> If you wouldn't mind and have a couple minutes to spare... I'm getting this error and am not sure how to make it go away. I cannot use hadoop for storage, just FILE_ROLL instead (ultimately the logs will need to be processed further in plain text).  I'm just not sure why....
>
> The error follows and my conf further down.
>
> 12 Sep 2012 13:18:54,120 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.start:211)  - Starting FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] }...
> 12 Sep 2012 13:18:54,124 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.start:234)  - Failed to start the file channel [channel=fileChannel]
> java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
>         at java.lang.ClassLoader.defineClass1(Native Method)
>         at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
>         at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>         at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>         at org.apache.flume.channel.file.Log$Builder.build(Log.java:144)
>         at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:223)
>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Writable
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>         ... 24 more
> 12 Sep 2012 13:18:54,126 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:238)  - Unable to start FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] } - Exception follows.
> java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
>         at java.lang.ClassLoader.defineClass1(Native Method)
>         at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
>         at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>         at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>         at org.apache.flume.channel.file.Log$Builder.build(Log.java:144)
>         at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:223)
>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Writable
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>         ... 24 more
> 12 Sep 2012 13:18:54,127 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.stop:249)  - Stopping FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] }...
> 12 Sep 2012 13:18:54,127 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:249)  - Unsuccessful attempt to shutdown component: {} due to missing dependencies. Please shutdown the agentor disable this component, or the agent will bein an undefined state.
> java.lang.IllegalStateException: Channel closed[channel=fileChannel]
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>         at org.apache.flume.channel.file.FileChannel.getDepth(FileChannel.java:282)
>         at org.apache.flume.channel.file.FileChannel.stop(FileChannel.java:250)
>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:244)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 12 Sep 2012 13:18:54,622 INFO  [conf-file-poller-0] (org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents:141)  - Starting Sink filesink1
> 12 Sep 2012 13:18:54,624 INFO  [conf-file-poller-0] (org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents:152)  - Starting Source avroSource
> 12 Sep 2012 13:18:54,626 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.source.AvroSource.start:138)  - Starting Avro source avroSource: { bindAddress: 0.0.0.0, port: 9432 }...
> 12 Sep 2012 13:18:54,641 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver event. Exception follows.
> java.lang.IllegalStateException: Channel closed [channel=fileChannel]
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>         at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:267)
>         at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:118)
>         at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:172)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:662)
>
>
>
>
> Using your config as a starting point, this is what I have... (trying to get it functioning on a single host first)
>
> node105.sources = tailsource
> node105.channels = fileChannel
> node105.sinks = avroSink
>
> node105.sources.tailsource.type = exec
> node105.sources.tailsource.command = tail -F /root/Desktop/apache-flume-1.3.0-SNAPSHOT/test.log
> #node105.sources.stressSource.batchSize = 1000
> node105.sources.tailsource.channels = fileChannel
>
> ## Sink sends avro messages to node103.bashkew.com port 9432
> node105.sinks.avroSink.type = avro
> node105.sinks.avroSink.batch-size = 1000
> node105.sinks.avroSink.channel = fileChannel
> node105.sinks.avroSink.hostname = localhost
> node105.sinks.avroSink.port = 9432
>
> node105.channels.fileChannel.type = file
> node105.channels.fileChannel.checkpointDir = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/tmp/flume/checkpoint
> node105.channels.fileChannel.dataDirs = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/tmp/flume/tmp/flume/data
> node105.channels.fileChannel.capacity = 10000
> node105.channels.fileChannel.checkpointInterval = 3000
> node105.channels.fileChannel.maxFileSize = 5242880
>
> node102.sources = avroSource
> node102.channels = fileChannel
> node102.sinks = filesink1
>
> ## Source listens for avro messages on port 9432 on all ips
> node102.sources.avroSource.type = avro
> node102.sources.avroSource.channels = fileChannel
> node102.sources.avroSource.bind = 0.0.0.0
> node102.sources.avroSource.port = 9432
>
> node102.sinks.filesink1.type = FILE_ROLL
> node102.sinks.filesink1.batchSize = 1000
> node102.sinks.filesink1.channel = fileChannel
> node102.sinks.filesink1.sink.directory = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/logs/rhel5/
> node102.channels.fileChannel.type = file
> node102.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
> node102.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
> node102.channels.fileChannel.capacity = 5000
> node102.channels.fileChannel.checkpointInterval = 45000
> node102.channels.fileChannel.maxFileSize = 5242880
>
>
>
> Thanks!
> Dave
>
>
> -----Original Message-----
> From: Brock Noland [mailto:brock@cloudera.com]
> Sent: Wed 9/12/2012 9:11 AM
> To: user@flume.apache.org
> Subject: Re: splitting functions
>
> Hi,
>
> Below is a config I use to test out the FileChannel. See the comments
> "##" for how messages are sent from one host to another.
>
> node105.sources = stressSource
> node105.channels = fileChannel
> node105.sinks = avroSink
>
> node105.sources.stressSource.type = org.apache.flume.source.StressSource
> node105.sources.stressSource.batchSize = 1000
> node105.sources.stressSource.channels = fileChannel
>
> ## Sink sends avro messages to node103.bashkew.com port 9432
> node105.sinks.avroSink.type = avro
> node105.sinks.avroSink.batch-size = 1000
> node105.sinks.avroSink.channel = fileChannel
> node105.sinks.avroSink.hostname = node102.bashkew.com
> node105.sinks.avroSink.port = 9432
>
> node105.channels.fileChannel.type = file
> node105.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
> node105.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
> node105.channels.fileChannel.capacity = 10000
> node105.channels.fileChannel.checkpointInterval = 3000
> node105.channels.fileChannel.maxFileSize = 5242880
>
> node102.sources = avroSource
> node102.channels = fileChannel
> node102.sinks = nullSink
>
>
> ## Source listens for avro messages on port 9432 on all ips
> node102.sources.avroSource.type = avro
> node102.sources.avroSource.channels = fileChannel
> node102.sources.avroSource.bind = 0.0.0.0
> node102.sources.avroSource.port = 9432
>
> node102.sinks.nullSink.type = null
> node102.sinks.nullSink.batchSize = 1000
> node102.sinks.nullSink.channel = fileChannel
>
> node102.channels.fileChannel.type = file
> node102.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
> node102.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
> node102.channels.fileChannel.capacity = 5000
> node102.channels.fileChannel.checkpointInterval = 45000
> node102.channels.fileChannel.maxFileSize = 5242880
>
>
>
> On Wed, Sep 12, 2012 at 10:06 AM, Cochran, David M (Contractor)
> <Da...@bsee.gov> wrote:
>> Okay folks, after spending the better part of a week reading the docs and
>> experimenting I'm lost.  I have flume 1.3.x working pretty much as expected
>> on a single host.  It tails a log file and writes it to another rolling log
>> file via flume.  No problem there, seems to work flawlessly.  Where my issue
>> is trying to break apart the functions across multiple hosts... a single
>> host listening for others to send their logs to.  All of my efforts have
>> resulted in little more than headaches.
>>
>> I can't even see the specified port open on what should be the logging host.
>> I've tried the basic examples posted on different docs but can't seem to get
>> things working across multiple hosts.
>>
>> Would someone post a working example of the conf's needed to get me started?
>> Something simple that works, so I can then pick it apart to gain more
>> understanding.  Apparently, I just don't have a firm enough grasp on all
>> the pieces yet, but I want to learn!
>>
>> Thanks in advance!
>> Dave
>>
>>
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
>



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

RE: splitting functions

Posted by "Cochran, David M (Contractor)" <Da...@bsee.gov>.
Brock,

Thanks for the sample!  It's starting to shed a bit more light, and things are making a little more sense now...

If you wouldn't mind and have a couple of minutes to spare... I'm getting this error and I'm not sure how to make it go away. I can't use Hadoop for storage, only FILE_ROLL (ultimately the logs will need to be processed further as plain text), so I'm not sure why anything Hadoop-related comes into play....

The error follows, and my conf is further down.

12 Sep 2012 13:18:54,120 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.start:211)  - Starting FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] }...
12 Sep 2012 13:18:54,124 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.start:234)  - Failed to start the file channel [channel=fileChannel]
java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
        at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at org.apache.flume.channel.file.Log$Builder.build(Log.java:144)
        at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:223)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Writable
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        ... 24 more
12 Sep 2012 13:18:54,126 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:238)  - Unable to start FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] } - Exception follows.
java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
        at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at org.apache.flume.channel.file.Log$Builder.build(Log.java:144)
        at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:223)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Writable
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        ... 24 more
12 Sep 2012 13:18:54,127 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.stop:249)  - Stopping FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] }...
12 Sep 2012 13:18:54,127 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:249)  - Unsuccessful attempt to shutdown component: {} due to missing dependencies. Please shutdown the agentor disable this component, or the agent will bein an undefined state.
java.lang.IllegalStateException: Channel closed[channel=fileChannel]
        at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
        at org.apache.flume.channel.file.FileChannel.getDepth(FileChannel.java:282)
        at org.apache.flume.channel.file.FileChannel.stop(FileChannel.java:250)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:244)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
12 Sep 2012 13:18:54,622 INFO  [conf-file-poller-0] (org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents:141)  - Starting Sink filesink1
12 Sep 2012 13:18:54,624 INFO  [conf-file-poller-0] (org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents:152)  - Starting Source avroSource
12 Sep 2012 13:18:54,626 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.source.AvroSource.start:138)  - Starting Avro source avroSource: { bindAddress: 0.0.0.0, port: 9432 }...
12 Sep 2012 13:18:54,641 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver event. Exception follows.
java.lang.IllegalStateException: Channel closed [channel=fileChannel]
        at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
        at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:267)
        at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:118)
        at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:172)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:662)
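
For anyone who lands on this trace later: the NoClassDefFoundError shows the
FileChannel trying to load org.apache.hadoop.io.Writable at startup, so the
channel needs a Hadoop jar on Flume's classpath even though the sink is plain
FILE_ROLL. A minimal sketch of the workaround, assuming a Hadoop tarball is
unpacked under /opt/hadoop (adjust both paths for your install):

cp /opt/hadoop/hadoop-core-*.jar /root/Desktop/apache-flume-1.3.0-SNAPSHOT/lib/
# restart the agent afterwards; alternatively, having the hadoop command on
# the PATH should let the flume-ng startup script find the Hadoop jars itself.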




Using your config as a starting point, this is what I have... (trying to get it functioning on a single host first)

node105.sources = tailsource
node105.channels = fileChannel
node105.sinks = avroSink

node105.sources.tailsource.type = exec
node105.sources.tailsource.command = tail -F /root/Desktop/apache-flume-1.3.0-SNAPSHOT/test.log
#node105.sources.stressSource.batchSize = 1000
node105.sources.tailsource.channels = fileChannel

## Sink sends avro messages to node103.bashkew.com port 9432
node105.sinks.avroSink.type = avro
node105.sinks.avroSink.batch-size = 1000
node105.sinks.avroSink.channel = fileChannel
node105.sinks.avroSink.hostname = localhost 
node105.sinks.avroSink.port = 9432

node105.channels.fileChannel.type = file
node105.channels.fileChannel.checkpointDir = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/tmp/flume/checkpoint
node105.channels.fileChannel.dataDirs = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/tmp/flume/tmp/flume/data
node105.channels.fileChannel.capacity = 10000
node105.channels.fileChannel.checkpointInterval = 3000
node105.channels.fileChannel.maxFileSize = 5242880

node102.sources = avroSource
node102.channels = fileChannel
node102.sinks = filesink1

## Source listens for avro messages on port 9432 on all ips
node102.sources.avroSource.type = avro
node102.sources.avroSource.channels = fileChannel
node102.sources.avroSource.bind = 0.0.0.0
node102.sources.avroSource.port = 9432

node102.sinks.filesink1.type = FILE_ROLL
node102.sinks.filesink1.batchSize = 1000
node102.sinks.filesink1.channel = fileChannel
node102.sinks.filesink1.sink.directory = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/logs/rhel5/
node102.channels.fileChannel.type = file
node102.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
node102.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
node102.channels.fileChannel.capacity = 5000
node102.channels.fileChannel.checkpointInterval = 45000
node102.channels.fileChannel.maxFileSize = 5242880
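
Once the channel starts cleanly, a quick smoke test of this single-host
wiring (just a sketch, reusing the paths from the config above) would be:

echo "test event" >> /root/Desktop/apache-flume-1.3.0-SNAPSHOT/test.log
# the FILE_ROLL sink writes rolled files into its sink.directory:
ls -l /root/Desktop/apache-flume-1.3.0-SNAPSHOT/logs/rhel5/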



Thanks!
Dave


-----Original Message-----
From: Brock Noland [mailto:brock@cloudera.com]
Sent: Wed 9/12/2012 9:11 AM
To: user@flume.apache.org
Subject: Re: splitting functions
 
Hi,

Below is a config I use to test out the FileChannel. See the comments
"##" for how messages are sent from one host to another.

node105.sources = stressSource
node105.channels = fileChannel
node105.sinks = avroSink

node105.sources.stressSource.type = org.apache.flume.source.StressSource
node105.sources.stressSource.batchSize = 1000
node105.sources.stressSource.channels = fileChannel

## Sink sends avro messages to node103.bashkew.com port 9432
node105.sinks.avroSink.type = avro
node105.sinks.avroSink.batch-size = 1000
node105.sinks.avroSink.channel = fileChannel
node105.sinks.avroSink.hostname = node102.bashkew.com
node105.sinks.avroSink.port = 9432

node105.channels.fileChannel.type = file
node105.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
node105.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
node105.channels.fileChannel.capacity = 10000
node105.channels.fileChannel.checkpointInterval = 3000
node105.channels.fileChannel.maxFileSize = 5242880

node102.sources = avroSource
node102.channels = fileChannel
node102.sinks = nullSink


## Source listens for avro messages on port 9432 on all ips
node102.sources.avroSource.type = avro
node102.sources.avroSource.channels = fileChannel
node102.sources.avroSource.bind = 0.0.0.0
node102.sources.avroSource.port = 9432

node102.sinks.nullSink.type = null
node102.sinks.nullSink.batchSize = 1000
node102.sinks.nullSink.channel = fileChannel

node102.channels.fileChannel.type = file
node102.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
node102.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
node102.channels.fileChannel.capacity = 5000
node102.channels.fileChannel.checkpointInterval = 45000
node102.channels.fileChannel.maxFileSize = 5242880



On Wed, Sep 12, 2012 at 10:06 AM, Cochran, David M (Contractor)
<Da...@bsee.gov> wrote:
> Okay folks, after spending the better part of a week reading the docs and
> experimenting I'm lost.  I have flume 1.3.x working pretty much as expected
> on a single host.  It tails a log file and writes it to another rolling log
> file via flume.  No problem there, seems to work flawlessly.  Where my issue
> is trying to break apart the functions across multiple hosts... a single
> host listening for others to send their logs to.  All of my efforts have
> resulted in little more than headaches.
>
> I can't even see the specified port open on what should be the logging host.
> I've tried the basic examples posted on different docs but can't seem to get
> things working across multiple hosts.
>
> Would someone post a working example of the conf's needed to get me started?
> Something simple that works, so I can then pick it apart to gain more
> understanding.  Apparently, I just don't have a firm enough grasp on all
> the pieces yet, but I want to learn!
>
> Thanks in advance!
> Dave
>
>



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/


Re: splitting functions

Posted by Brock Noland <br...@cloudera.com>.
Hi,

Below is a config I use to test out the FileChannel. See the comments
"##" for how messages are sent from one host to another.

node105.sources = stressSource
node105.channels = fileChannel
node105.sinks = avroSink

node105.sources.stressSource.type = org.apache.flume.source.StressSource
node105.sources.stressSource.batchSize = 1000
node105.sources.stressSource.channels = fileChannel

## Sink sends avro messages to node103.bashkew.com port 9432
node105.sinks.avroSink.type = avro
node105.sinks.avroSink.batch-size = 1000
node105.sinks.avroSink.channel = fileChannel
node105.sinks.avroSink.hostname = node102.bashkew.com
node105.sinks.avroSink.port = 9432

node105.channels.fileChannel.type = file
node105.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
node105.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
node105.channels.fileChannel.capacity = 10000
node105.channels.fileChannel.checkpointInterval = 3000
node105.channels.fileChannel.maxFileSize = 5242880

node102.sources = avroSource
node102.channels = fileChannel
node102.sinks = nullSink


## Source listens for avro messages on port 9432 on all ips
node102.sources.avroSource.type = avro
node102.sources.avroSource.channels = fileChannel
node102.sources.avroSource.bind = 0.0.0.0
node102.sources.avroSource.port = 9432

node102.sinks.nullSink.type = null
node102.sinks.nullSink.batchSize = 1000
node102.sinks.nullSink.channel = fileChannel

node102.channels.fileChannel.type = file
node102.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
node102.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
node102.channels.fileChannel.capacity = 5000
node102.channels.fileChannel.checkpointInterval = 45000
node102.channels.fileChannel.maxFileSize = 5242880
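
For reference, a sketch of how the two halves get launched -- one agent per
host, where the -n name must match the property prefix in the file, and the
.conf file names here are hypothetical:

bin/flume-ng agent -n node105 -c conf -f conf/node105.conf -Dflume.root.logger=INFO,console
bin/flume-ng agent -n node102 -c conf -f conf/node102.conf -Dflume.root.logger=INFO,console

# on the receiving host, the avro source should then be listening:
netstat -an | grep 9432

That last check also explains why the port may not appear to be open: it only
starts listening once the avro source itself has started successfully.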



On Wed, Sep 12, 2012 at 10:06 AM, Cochran, David M (Contractor)
<Da...@bsee.gov> wrote:
> Okay folks, after spending the better part of a week reading the docs and
> experimenting I'm lost.  I have flume 1.3.x working pretty much as expected
> on a single host.  It tails a log file and writes it to another rolling log
> file via flume.  No problem there, seems to work flawlessly.  Where my issue
> is trying to break apart the functions across multiple hosts... a single
> host listening for others to send their logs to.  All of my efforts have
> resulted in little more than headaches.
>
> I can't even see the specified port open on what should be the logging host.
> I've tried the basic examples posted on different docs but can't seem to get
> things working across multiple hosts.
>
> Would someone post a working example of the conf's needed to get me started?
> Something simple that works, so I can then pick it apart to gain more
> understanding.  Apparently, I just don't have a firm enough grasp on all
> the pieces yet, but I want to learn!
>
> Thanks in advance!
> Dave
>
>



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/