Posted to user@flume.apache.org by Siddharth Tiwari <si...@live.com> on 2013/11/01 03:05:01 UTC

RE: Flume not moving data to HDFS or local

Can you describe the process to set up a spooling directory source? I am sorry, I do not know how to do that. If you can give me a step-by-step description of how to configure it and the configuration changes I need to make in my conf to get it done, I will be really thankful... Appreciate your help :)

*------------------------*

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.” 

"Maybe other people will try to limit me but I don't limit myself"


From: pchavez@verticalsearchworks.com
To: user@flume.apache.org
Date: Thu, 31 Oct 2013 14:38:54 -0700
Subject: RE: Flume not moving data to HDFS or local

It should commit when one of the various file roll configuration values is hit. There’s a list of them and their defaults in the flume user guide.

For managing new files on your app servers, the best option right now seems to be a spooling directory source along with some kind of cron jobs that run locally on the app servers to drop files in the spool directory when ready. In my case I run a job that executes a custom script to checkpoint a file that is appended to all day long, creating incremental files every minute to drop in the spool directory.

From: Siddharth Tiwari [mailto:siddharth.tiwari@live.com]
Sent: Thursday, October 31, 2013 12:47 PM
To: user@flume.apache.org
Subject: RE: Flume not moving data to HDFS or local 
It got resolved; it was due to the wrong version of the guava jar file in the flume lib, but I can still see a .tmp extension on the file in HDFS. When does it actually get committed? :) ... Another question though: what should I change in my configuration file to capture new files being generated in a directory on a remote machine? Say for example there is one new file generated every hour in my webserver hostlog directory. What do I change in my configuration so that I get the new file directly into HDFS, compressed?

*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.” 
"Maybe other people will try to limit me but I don't limit myself"

From: siddharth.tiwari@live.com
To: user@flume.apache.org
Subject: RE: Flume not moving data to HDFS or local
Date: Thu, 31 Oct 2013 19:29:36 +0000

Hi Paul

I see the following error:

13/10/31 12:27:01 ERROR hdfs.HDFSEventSink: process failed
java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build()Lcom/google/common/cache/Cache;
          at org.apache.hadoop.hdfs.DomainSocketFactory.<init>(DomainSocketFactory.java:45)
          at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:490)
          at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:445)
          at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
          at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2429)
          at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
          at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2463)
          at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2445)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:363)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:165)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:347)
          at org.apache.hadoop.fs.Path.getFileSystem(Path.java:275)
          at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:186)
          at org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:48)
          at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:155)
          at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:152)
          at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:125)
          at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:152)
          at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:307)
          at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:717)
          at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:714)
          at java.util.concurrent.FutureTask.run(FutureTask.java:262)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          at java.lang.Thread.run(Thread.java:724)
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build()Lcom/google/common/cache/Cache;
          at org.apache.hadoop.hdfs.DomainSocketFactory.<init>(DomainSocketFactory.java:45)
          at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:490)
          at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:445)
          at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
          at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2429)
          at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
          at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2463)
          at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2445)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:363)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:165)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:347)
          at org.apache.hadoop.fs.Path.getFileSystem(Path.java:275)
          at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:186)
          at org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:48)
          at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:155)
          at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:152)
          at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:125)
          at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:152)
          at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:307)
          at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:717)
          at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:714)
          at java.util.concurrent.FutureTask.run(FutureTask.java:262)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          at java.lang.Thread.run(Thread.java:724)

*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.” 
"Maybe other people will try to limit me but I don't limit myself"

From: pchavez@verticalsearchworks.com
To: user@flume.apache.org
Date: Thu, 31 Oct 2013 12:19:42 -0700
Subject: RE: Flume not moving data to HDFS or local

Try bumping your memory channel capacities up, they are the same as the batch size. I would go to at least 1000 on each mem channel.

Also, what do the logs and metrics show?

From: Siddharth Tiwari [mailto:siddharth.tiwari@live.com]
Sent: Thursday, October 31, 2013 11:53 AM
To: user@flume.apache.org
Subject: Flume not moving data to HDFS or local

Hi team, I created a flume source and sink as follows on hadoop yarn and I am not getting data transferred from source to sink: in HDFS it doesn't create any file, and on local every time I start the agent it creates one empty file. Below are my configs for source and sink.

Source :-

agent.sources = logger1
agent.sources.logger1.type = exec
agent.sources.logger1.command = tail -f /var/log/messages
agent.sources.logger1.batchsSize = 0
agent.sources.logger1.channels = memoryChannel
agent.channels = memoryChannel
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 100
agent.sinks = AvroSink
agent.sinks.AvroSink.type = avro
agent.sinks.AvroSink.channel = memoryChannel
agent.sinks.AvroSink.hostname = 192.168.147.101
agent.sinks.AvroSink.port = 4545
agent.sources.logger1.interceptors = itime ihost
agent.sources.logger1.interceptors.itime.type = TimestampInterceptor
agent.sources.logger1.interceptors.ihost.type = host
agent.sources.logger1.interceptors.ihost.useIP = false
agent.sources.logger1.interceptors.ihost.hostHeader = host

Sink at one of the slave ( datanodes on my Yarn cluster ) :

collector.sources = AvroIn
collector.sources.AvroIn.type = avro
collector.sources.AvroIn.bind = 0.0.0.0
collector.sources.AvroIn.port = 4545
collector.sources.AvroIn.channels = mc1 mc2
collector.channels = mc1 mc2
collector.channels.mc1.type = memory
collector.channels.mc1.capacity = 100

collector.channels.mc2.type = memory
collector.channels.mc2.capacity = 100

collector.sinks = LocalOut HadoopOut
collector.sinks.LocalOut.type = file_roll
collector.sinks.LocalOut.sink.directory = /home/hadoop/flume
collector.sinks.LocalOut.sink.rollInterval = 0
collector.sinks.LocalOut.channel = mc1
collector.sinks.HadoopOut.type = hdfs
collector.sinks.HadoopOut.channel = mc2
collector.sinks.HadoopOut.hdfs.path = /flume
collector.sinks.HadoopOut.hdfs.fileType = DataStream
collector.sinks.HadoopOut.hdfs.writeFormat = Text
collector.sinks.HadoopOut.hdfs.rollSize = 0
collector.sinks.HadoopOut.hdfs.rollCount = 10000
collector.sinks.HadoopOut.hdfs.rollInterval = 600

Can somebody point me to what I am doing wrong?

This is what I get in my local directory:

[hadoop@node1 flume]$ ls -lrt
total 0
-rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:25 1383243942803-1
-rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:28 1383244097923-1
-rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:31 1383244302225-1
-rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:33 1383244404929-1

when I restart the collector it creates one 0 bytes file.

Please help

*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.” 
"Maybe other people will try to limit me but I don't limit myself" 		 	   		  

Re: Flume not moving data to HDFS or local

Posted by Siddharth Tiwari <si...@live.com>.
Thank you so much Paul. You are life saver.  :)

Sent from my iPhone

> On Oct 31, 2013, at 8:11 PM, "Paul Chavez" <pc...@verticalsearchworks.com> wrote:
> 
> Here’s a piece of my app server configuration. It’s for IIS logs and has an interceptor to pull a timestamp out of the event data. It’s backed by a fileChannel and I drop files into the spool directory once a minute.
>  
> # SpoolDir source for Weblogs
> appserver.sources.spool_WebLogs.type = spooldir
> appserver.sources.spool_WebLogs.spoolDir = c:\\flume_data\\spool\\web
> appserver.sources.spool_WebLogs.channels = fc_WebLogs
> appserver.sources.spool_WebLogs.batchSize = 1000
> appserver.sources.spool_WebLogs.bufferMaxLines = 1200
> appserver.sources.spool_WebLogs.bufferMaxLineLength = 5000
>  
> appserver.sources.spool_WebLogs.interceptors = add_time
> appserver.sources.spool_WebLogs.interceptors.add_time.type = regex_extractor
> appserver.sources.spool_WebLogs.interceptors.add_time.regex = \\t(\\d{4}-\\d{2}-\\d{2}.\\d{2}:\\d{2})
> appserver.sources.spool_WebLogs.interceptors.add_time.serializers = millis
> appserver.sources.spool_WebLogs.interceptors.add_time.serializers.millis.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
> appserver.sources.spool_WebLogs.interceptors.add_time.serializers.millis.name = timestamp
> appserver.sources.spool_WebLogs.interceptors.add_time.serializers.millis.pattern = yyyy-MM-dd HH:mm
>  
> Hope that helps,
> Paul Chavez
>  
>  
> From: Siddharth Tiwari [mailto:siddharth.tiwari@live.com] 
> Sent: Thursday, October 31, 2013 7:05 PM
> To: user@flume.apache.org
> Subject: RE: Flume not moving data to HDFS or local
>  
> Can you describe the process to set up a spooling directory source? I am sorry, I do not know how to do that. If you can give me a step-by-step description of how to configure it and the configuration changes I need to make in my conf to get it done, I will be really thankful... Appreciate your help :)
> 
> 
> *------------------------*
> Cheers !!!
> Siddharth Tiwari
> Have a refreshing day !!!
> "Every duty is holy, and devotion to duty is the highest form of worship of God.” 
> "Maybe other people will try to limit me but I don't limit myself"
> 
> 
> From: pchavez@verticalsearchworks.com
> To: user@flume.apache.org
> Date: Thu, 31 Oct 2013 14:38:54 -0700
> Subject: RE: Flume not moving data to HDFS or local
> 
> It should commit when one of the various file roll configuration values is hit. There’s a list of them and their defaults in the flume user guide.
>  
> For managing new files on your app servers, the best option right now seems to be a spooling directory source along with some kind of cron jobs that run locally on the app servers to drop files in the spool directory when ready. In my case I run a job that executes a custom script to checkpoint a file that is appended to all day long, creating incremental files every minute to drop in the spool directory.
>  
>  
> From: Siddharth Tiwari [mailto:siddharth.tiwari@live.com] 
> Sent: Thursday, October 31, 2013 12:47 PM
> To: user@flume.apache.org
> Subject: RE: Flume not moving data to HDFS or local
>  
> 
> It got resolved; it was due to the wrong version of the guava jar file in the flume lib, but I can still see a .tmp extension on the file in HDFS. When does it actually get committed? :) ... Another question though: what should I change in my configuration file to capture new files being generated in a directory on a remote machine?
> Say for example there is one new file generated every hour in my webserver hostlog directory. What do I change in my configuration so that I get the new file directly into HDFS, compressed?
> 
> *------------------------*
> Cheers !!!
> Siddharth Tiwari
> Have a refreshing day !!!
> "Every duty is holy, and devotion to duty is the highest form of worship of God.” 
> "Maybe other people will try to limit me but I don't limit myself"
> 
> From: siddharth.tiwari@live.com
> To: user@flume.apache.org
> Subject: RE: Flume not moving data to HDFS or local
> Date: Thu, 31 Oct 2013 19:29:36 +0000
> Hi Paul
>  
> I see the following error:
>  
> 13/10/31 12:27:01 ERROR hdfs.HDFSEventSink: process failed
> java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build()Lcom/google/common/cache/Cache;
>           at org.apache.hadoop.hdfs.DomainSocketFactory.<init>(DomainSocketFactory.java:45)
>           at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:490)
>           at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:445)
>           at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
>           at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2429)
>           at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
>           at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2463)
>           at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2445)
>           at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:363)
>           at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:165)
>           at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:347)
>           at org.apache.hadoop.fs.Path.getFileSystem(Path.java:275)
>           at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:186)
>           at org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:48)
>           at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:155)
>           at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:152)
>           at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:125)
>           at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:152)
>           at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:307)
>           at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:717)
>           at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:714)
>           at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>           at java.lang.Thread.run(Thread.java:724)
> Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build()Lcom/google/common/cache/Cache;
>           at org.apache.hadoop.hdfs.DomainSocketFactory.<init>(DomainSocketFactory.java:45)
>           at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:490)
>           at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:445)
>           at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
>           at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2429)
>           at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
>           at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2463)
>           at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2445)
>           at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:363)
>           at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:165)
>           at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:347)
>           at org.apache.hadoop.fs.Path.getFileSystem(Path.java:275)
>           at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:186)
>           at org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:48)
>           at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:155)
>           at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:152)
>           at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:125)
>           at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:152)
>           at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:307)
>           at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:717)
>           at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:714)
>           at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>           at java.lang.Thread.run(Thread.java:724)
>  
> 
> 
> *------------------------*
> Cheers !!!
> Siddharth Tiwari
> Have a refreshing day !!!
> "Every duty is holy, and devotion to duty is the highest form of worship of God.” 
> "Maybe other people will try to limit me but I don't limit myself"
> 
> From: pchavez@verticalsearchworks.com
> To: user@flume.apache.org
> Date: Thu, 31 Oct 2013 12:19:42 -0700
> Subject: RE: Flume not moving data to HDFS or local
> Try bumping your memory channel capacities up, they are the same as the batch size. I would go to at least 1000 on each mem channel.
>  
> Also, what do the logs and metrics show?
>  
> From: Siddharth Tiwari [mailto:siddharth.tiwari@live.com] 
> Sent: Thursday, October 31, 2013 11:53 AM
> To: user@flume.apache.org
> Subject: Flume not moving data to HDFS or local
>  
> Hi team, I created a flume source and sink as follows on hadoop yarn and I am not getting data transferred from source to sink: in HDFS it doesn't create any file, and on local every time I start the agent it creates one empty file. Below are my configs for source and sink.
>  
>  
> Source :-
>  
>  
> agent.sources = logger1
> agent.sources.logger1.type = exec
> agent.sources.logger1.command = tail -f /var/log/messages
> agent.sources.logger1.batchsSize = 0
> agent.sources.logger1.channels = memoryChannel
> agent.channels = memoryChannel
> agent.channels.memoryChannel.type = memory
> agent.channels.memoryChannel.capacity = 100
> agent.sinks = AvroSink
> agent.sinks.AvroSink.type = avro
> agent.sinks.AvroSink.channel = memoryChannel
> agent.sinks.AvroSink.hostname = 192.168.147.101
> agent.sinks.AvroSink.port = 4545
> agent.sources.logger1.interceptors = itime ihost
> agent.sources.logger1.interceptors.itime.type = TimestampInterceptor
> agent.sources.logger1.interceptors.ihost.type = host
> agent.sources.logger1.interceptors.ihost.useIP = false
> agent.sources.logger1.interceptors.ihost.hostHeader = host
>  
>  
> Sink at one of the slave ( datanodes on my Yarn cluster ) :
>  
> collector.sources = AvroIn
> collector.sources.AvroIn.type = avro
> collector.sources.AvroIn.bind = 0.0.0.0
> collector.sources.AvroIn.port = 4545
> collector.sources.AvroIn.channels = mc1 mc2
> collector.channels = mc1 mc2
> collector.channels.mc1.type = memory
> collector.channels.mc1.capacity = 100
>  
> collector.channels.mc2.type = memory
> collector.channels.mc2.capacity = 100
>  
> collector.sinks = LocalOut HadoopOut
> collector.sinks.LocalOut.type = file_roll
> collector.sinks.LocalOut.sink.directory = /home/hadoop/flume
> collector.sinks.LocalOut.sink.rollInterval = 0
> collector.sinks.LocalOut.channel = mc1
> collector.sinks.HadoopOut.type = hdfs
> collector.sinks.HadoopOut.channel = mc2
> collector.sinks.HadoopOut.hdfs.path = /flume
> collector.sinks.HadoopOut.hdfs.fileType = DataStream
> collector.sinks.HadoopOut.hdfs.writeFormat = Text
> collector.sinks.HadoopOut.hdfs.rollSize = 0
> collector.sinks.HadoopOut.hdfs.rollCount = 10000
> collector.sinks.HadoopOut.hdfs.rollInterval = 600
>  
>  
> Can somebody point me to what I am doing wrong?
>  
> This is what I get in my local directory
>  
> [hadoop@node1 flume]$ ls -lrt
> total 0
> -rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:25 1383243942803-1
> -rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:28 1383244097923-1
> -rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:31 1383244302225-1
> -rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:33 1383244404929-1
>  
>  
> when I restart the collector it creates one 0 bytes file.
>  
> Please help 
> 
> 
> *------------------------*
> Cheers !!!
> Siddharth Tiwari
> Have a refreshing day !!!
> "Every duty is holy, and devotion to duty is the highest form of worship of God.” 
> "Maybe other people will try to limit me but I don't limit myself"

RE: Flume not moving data to HDFS or local

Posted by Paul Chavez <pc...@verticalsearchworks.com>.
Here's a piece of my app server configuration. It's for IIS logs and has an interceptor to pull a timestamp out of the event data. It's backed by a fileChannel and I drop files into the spool directory once a minute.

# SpoolDir source for Weblogs
appserver.sources.spool_WebLogs.type = spooldir
appserver.sources.spool_WebLogs.spoolDir = c:\\flume_data\\spool\\web
appserver.sources.spool_WebLogs.channels = fc_WebLogs
appserver.sources.spool_WebLogs.batchSize = 1000
appserver.sources.spool_WebLogs.bufferMaxLines = 1200
appserver.sources.spool_WebLogs.bufferMaxLineLength = 5000

appserver.sources.spool_WebLogs.interceptors = add_time
appserver.sources.spool_WebLogs.interceptors.add_time.type = regex_extractor
appserver.sources.spool_WebLogs.interceptors.add_time.regex = \\t(\\d{4}-\\d{2}-\\d{2}.\\d{2}:\\d{2})
appserver.sources.spool_WebLogs.interceptors.add_time.serializers = millis
appserver.sources.spool_WebLogs.interceptors.add_time.serializers.millis.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
appserver.sources.spool_WebLogs.interceptors.add_time.serializers.millis.name = timestamp
appserver.sources.spool_WebLogs.interceptors.add_time.serializers.millis.pattern = yyyy-MM-dd HH:mm

Hope that helps,
Paul Chavez
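
The spooldir source above hands events to a channel named fc_WebLogs that is not shown in the snippet. As a rough sketch only, assuming a file channel on the same Windows host and an Avro sink forwarding to a collector (the directory paths, sink name, and collector address below are hypothetical placeholders, not taken from this thread):

appserver.channels = fc_WebLogs
appserver.channels.fc_WebLogs.type = file
# hypothetical local directories for the file channel's checkpoint and data files
appserver.channels.fc_WebLogs.checkpointDir = c:\\flume_data\\channel\\web\\checkpoint
appserver.channels.fc_WebLogs.dataDirs = c:\\flume_data\\channel\\web\\data

appserver.sinks = avro_WebLogs
appserver.sinks.avro_WebLogs.type = avro
appserver.sinks.avro_WebLogs.channel = fc_WebLogs
# hypothetical collector address; point this at whatever Avro source receives the events
appserver.sinks.avro_WebLogs.hostname = collector.example.com
appserver.sinks.avro_WebLogs.port = 4545

The regex_extractor interceptor in the config above writes the extracted time into a timestamp header in epoch milliseconds, which downstream HDFS sink path escapes such as %Y-%m-%d can then use for time-based bucketing.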


From: Siddharth Tiwari [mailto:siddharth.tiwari@live.com]
Sent: Thursday, October 31, 2013 7:05 PM
To: user@flume.apache.org
Subject: RE: Flume not moving data to HDFS or local

Can you describe the process to set up a spooling directory source? I am sorry, I do not know how to do that. If you can give me a step-by-step description of how to configure it and the configuration changes I need to make in my conf to get it done, I will be really thankful... Appreciate your help :)


*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God."
"Maybe other people will try to limit me but I don't limit myself"

________________________________
From: pchavez@verticalsearchworks.com<ma...@verticalsearchworks.com>
To: user@flume.apache.org<ma...@flume.apache.org>
Date: Thu, 31 Oct 2013 14:38:54 -0700
Subject: RE: Flume not moving data to HDFS or local
It should commit when one of the various file roll configuration values is hit. There's a list of them and their defaults in the flume user guide.

For managing new files on your app servers, the best option right now seems to be a spooling directory source along with some kind of cron jobs that run locally on the app servers to drop files in the spool directory when ready. In my case I run a job that executes a custom script to checkpoint a file that is appended to all day long, creating incremental files every minute to drop in the spool directory.
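
As an illustration against the HadoopOut sink quoted later in this thread, a sketch of the roll settings that trigger the .tmp-to-final rename, plus two compression lines (the compression lines are an added assumption about one way to get gzip output, not part of the original config):

collector.sinks.HadoopOut.hdfs.rollInterval = 600
collector.sinks.HadoopOut.hdfs.rollSize = 0
collector.sinks.HadoopOut.hdfs.rollCount = 10000
# assumed addition: write gzip-compressed files instead of a plain DataStream
collector.sinks.HadoopOut.hdfs.fileType = CompressedStream
collector.sinks.HadoopOut.hdfs.codeC = gzip

With those values the sink keeps a .tmp file open until 10000 events have been written or 600 seconds have passed (a roll value of 0 disables that particular trigger), then renames it; with CompressedStream the codec's extension is typically appended to the finished file.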


From: Siddharth Tiwari [mailto:siddharth.tiwari@live.com]
Sent: Thursday, October 31, 2013 12:47 PM
To: user@flume.apache.org<ma...@flume.apache.org>
Subject: RE: Flume not moving data to HDFS or local


It got resolved; it was due to the wrong version of the guava jar file in the flume lib, but I can still see a .tmp extension on the file in HDFS. When does it actually get committed? :) ... Another question though: what should I change in my configuration file to capture new files being generated in a directory on a remote machine?
Say for example there is one new file generated every hour in my webserver hostlog directory. What do I change in my configuration so that I get the new file directly into HDFS, compressed?

*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God."
"Maybe other people will try to limit me but I don't limit myself"
________________________________
From: siddharth.tiwari@live.com<ma...@live.com>
To: user@flume.apache.org<ma...@flume.apache.org>
Subject: RE: Flume not moving data to HDFS or local
Date: Thu, 31 Oct 2013 19:29:36 +0000
Hi Paul

I see the following error:

13/10/31 12:27:01 ERROR hdfs.HDFSEventSink: process failed
java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build()Lcom/google/common/cache/Cache;
          at org.apache.hadoop.hdfs.DomainSocketFactory.<init>(DomainSocketFactory.java:45)
          at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:490)
          at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:445)
          at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
          at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2429)
          at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
          at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2463)
          at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2445)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:363)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:165)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:347)
          at org.apache.hadoop.fs.Path.getFileSystem(Path.java:275)
          at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:186)
          at org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:48)
          at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:155)
          at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:152)
          at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:125)
          at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:152)
          at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:307)
          at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:717)
          at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:714)
          at java.util.concurrent.FutureTask.run(FutureTask.java:262)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          at java.lang.Thread.run(Thread.java:724)
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build()Lcom/google/common/cache/Cache;
          at org.apache.hadoop.hdfs.DomainSocketFactory.<init>(DomainSocketFactory.java:45)
          at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:490)
          at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:445)
          at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
          at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2429)
          at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
          at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2463)
          at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2445)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:363)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:165)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:347)
          at org.apache.hadoop.fs.Path.getFileSystem(Path.java:275)
          at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:186)
          at org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:48)
          at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:155)
          at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:152)
          at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:125)
          at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:152)
          at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:307)
          at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:717)
          at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:714)
          at java.util.concurrent.FutureTask.run(FutureTask.java:262)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          at java.lang.Thread.run(Thread.java:724)



*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God."
"Maybe other people will try to limit me but I don't limit myself"
________________________________
From: pchavez@verticalsearchworks.com<ma...@verticalsearchworks.com>
To: user@flume.apache.org<ma...@flume.apache.org>
Date: Thu, 31 Oct 2013 12:19:42 -0700
Subject: RE: Flume not moving data to HDFS or local
Try bumping your memory channel capacities up, they are the same as the batch size. I would go to at least 1000 on each mem channel.

Also, what do the logs and metrics show?
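
Applied to the configs quoted below, that advice amounts to raising the capacity of each memory channel, for example:

agent.channels.memoryChannel.capacity = 1000
collector.channels.mc1.capacity = 1000
collector.channels.mc2.capacity = 1000

The memory channel's transactionCapacity defaults to 100 and must stay at or below capacity, so with these values it does not need to change.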

From: Siddharth Tiwari [mailto:siddharth.tiwari@live.com]
Sent: Thursday, October 31, 2013 11:53 AM
To: user@flume.apache.org<ma...@flume.apache.org>
Subject: Flume not moving data to HDFS or local

Hi team, I created a flume source and sink as follows on hadoop yarn and I am not getting data transferred from source to sink: in HDFS it doesn't create any file, and on local every time I start the agent it creates one empty file. Below are my configs for source and sink.


Source :-


agent.sources = logger1
agent.sources.logger1.type = exec
agent.sources.logger1.command = tail -f /var/log/messages
agent.sources.logger1.batchsSize = 0
agent.sources.logger1.channels = memoryChannel
agent.channels = memoryChannel
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 100
agent.sinks = AvroSink
agent.sinks.AvroSink.type = avro
agent.sinks.AvroSink.channel = memoryChannel
agent.sinks.AvroSink.hostname = 192.168.147.101
agent.sinks.AvroSink.port = 4545
agent.sources.logger1.interceptors = itime ihost
agent.sources.logger1.interceptors.itime.type = TimestampInterceptor
agent.sources.logger1.interceptors.ihost.type = host
agent.sources.logger1.interceptors.ihost.useIP = false
agent.sources.logger1.interceptors.ihost.hostHeader = host


Sink at one of the slave ( datanodes on my Yarn cluster ) :

collector.sources = AvroIn
collector.sources.AvroIn.type = avro
collector.sources.AvroIn.bind = 0.0.0.0
collector.sources.AvroIn.port = 4545
collector.sources.AvroIn.channels = mc1 mc2
collector.channels = mc1 mc2
collector.channels.mc1.type = memory
collector.channels.mc1.capacity = 100

collector.channels.mc2.type = memory
collector.channels.mc2.capacity = 100

collector.sinks = LocalOut HadoopOut
collector.sinks.LocalOut.type = file_roll
collector.sinks.LocalOut.sink.directory = /home/hadoop/flume
collector.sinks.LocalOut.sink.rollInterval = 0
collector.sinks.LocalOut.channel = mc1
collector.sinks.HadoopOut.type = hdfs
collector.sinks.HadoopOut.channel = mc2
collector.sinks.HadoopOut.hdfs.path = /flume
collector.sinks.HadoopOut.hdfs.fileType = DataStream
collector.sinks.HadoopOut.hdfs.writeFormat = Text
collector.sinks.HadoopOut.hdfs.rollSize = 0
collector.sinks.HadoopOut.hdfs.rollCount = 10000
collector.sinks.HadoopOut.hdfs.rollInterval = 600


Can somebody point me to what I am doing wrong?

This is what I get in my local directory

[hadoop@node1 flume]$ ls -lrt
total 0
-rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:25 1383243942803-1
-rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:28 1383244097923-1
-rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:31 1383244302225-1
-rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:33 1383244404929-1


when I restart the collector it creates one 0 bytes file.

Please help


*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God."
"Maybe other people will try to limit me but I don't limit myself"