Posted to user@flume.apache.org by Saurabh Sharma <sa...@nviz.com> on 2016/04/20 08:48:04 UTC

Flume not marking log files as completed and not processing files further

Hi,

I have a scenario where we are ingesting log files (around 80 MB each) into Flume. Flume processes these files and marks them as completed, but after processing a few files it stops picking up any further files, and it no longer marks the file it is on as completed.

We are using spooling directory source.

I looked into the Flume logs and found that when this happens, the following line appears continuously:

DEBUG [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:126)  - Checking file:../conf/commerce-sense.conf for changes

We have the following configuration:
agent.sources = spoolDir
agent.channels = memoryChannel
agent.sinks = sink
agent.sources.spoolDir.interceptors = i1


#Channel Configuration
agent.channels.memoryChannel.type = memory

#Source configuration
agent.sources.spoolDir.type = spooldir
agent.sources.spoolDir.spoolDir = /opt/flume/spoolDir
agent.sources.spoolDir.fileHeader = true
agent.sources.spoolDir.basenameHeader = true
agent.sources.spoolDir.deserializer = LINE
agent.sources.spoolDir.inputCharset = ISO8859-1
agent.sources.spoolDir.deserializer.maxLineLength = 10000
agent.sources.spoolDir.interceptors.i1.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
agent.sources.spoolDir.interceptors.i1.preserveExisting = true
agent.sources.spoolDir.interceptors.i1.prefix = test
agent.sources.spoolDir.channels = memoryChannel


#Sink Configuration
agent.sinks.sink.type = com. flume.sink.ExtendedKafkaSink
agent.sinks.sink.topic = cdnLogsTopic
agent.sinks.sink.brokerList = localhost:9092
agent.sinks.sink.batchSize = 100
agent.sinks.sink.sink.serializer = com. flume.serializer.ExtendedSerializer$Builder
agent.sinks.sink.channel = memoryChannel

Thanks,
Saurabh
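For reference: the memory channel in the configuration above runs entirely on defaults, and in Flume 1.x the default memory channel capacity is only 100 events, so a sink that falls behind quickly fills the channel. A sketch of making the limits explicit (the numbers are illustrative, not a recommendation):

agent.channels.memoryChannel.capacity = 10000
agent.channels.memoryChannel.transactionCapacity = 1000

transactionCapacity must be at least as large as the sink's batchSize (100 here).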

Re: Flume not marking log files as completed and not processing files further

Posted by Ronald Van de Kuil <ro...@nl.ibm.com>.
If you move it within the same file system, then I cannot imagine that this creates the issue.

You can turn off the reloading of the configuration file using --no-reload-conf.
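For example, assuming the standard flume-ng launcher and the agent name "agent" used in the configuration above (paths are illustrative):

flume-ng agent --conf conf --conf-file conf/flume.conf --name agent --no-reload-conf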

> On 20 Apr 2016, at 19:21, Saurabh Sharma <sa...@nviz.com> wrote:
>
> I increased it, but no luck.
> I am using the cp command instead of mv to place log files into the spooling directory. Could that be an issue?
>
> Also, when Flume stops processing files, I see the following line continuously in the logs:
>
> DEBUG [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:126)  - Checking file:../conf/flume.conf for changes

RE: Flume not marking log files as completed and not processing files further

Posted by Saurabh Sharma <sa...@nviz.com>.
I increased it, but no luck.
I am using the cp command instead of mv to place log files into the spooling directory. Could that be an issue?

Also, when Flume stops processing files, I see the following line continuously in the logs:

DEBUG [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:126)  - Checking file:../conf/flume.conf for changes
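On the cp question: the spooling directory source expects a file to be complete and immutable once it appears in the spool directory. cp creates the destination file first and then fills it, so Flume can start reading a half-copied file, whereas mv on the same file system is an atomic rename. A sketch of a safe hand-off (the source and staging paths are hypothetical; the staging directory must be on the same file system as the spool directory):

# stage the copy outside the spool directory first
cp /var/log/app/app.log /opt/flume/staging/app.log
# then rename into place; a same-filesystem mv is atomic
mv /opt/flume/staging/app.log /opt/flume/spoolDir/app.log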

From: Ronald van de Kuil [mailto:ronald.van.de.kuil@gmail.com]
Sent: 20 April 2016 14:12
To: user@flume.apache.org
Subject: Re: Flume not marking log files as completed and not processing files further

Could you set the value for the open files limit to 8000, reboot the OS, check that it is 8000, and see whether this resolves your problem?



Re: Flume not marking log files as completed and not processing files further

Posted by Ronald van de Kuil <ro...@gmail.com>.
Could you set the value for the open files limit to 8000, reboot the OS, check that it is 8000, and see whether this resolves your problem?
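On a typical Linux system that would look something like the following sketch, assuming the agent runs as a user named flume (adjust the user name) and pam_limits is active:

# /etc/security/limits.conf
flume  soft  nofile  8000
flume  hard  nofile  8000

# verify in a fresh session as the flume user:
ulimit -n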

2016-04-20 10:20 GMT+02:00 Chris Horrocks <ch...@hor.rocks>:

> Are the permissions on the files the same? Does the user running the Flume
> agent have read permissions?
> Are the files still being written to or locked open by another process?
> Are there any logs being generated by the Flume agent?

RE: Flume not marking log files as completed and not processing files further

Posted by Chris Horrocks <ch...@hor.rocks>.
Are the permissions on the files the same? Does the user running the Flume agent have read permissions?
Are the files still being written to or locked open by another process?
Are there any logs being generated by the Flume agent?

-- 
Chris Horrocks 
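Some quick ways to check these on the box (a sketch assuming standard Linux tooling; lsof may need to be installed):

# permissions on the spool directory and its files
ls -l /opt/flume/spoolDir

# any process still writing to, or holding open, files in the spool directory?
lsof +D /opt/flume/spoolDir

# the agent's own log file (the path depends on your log4j configuration)
tail -f logs/flume.log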

On 20 April 2016 at 08:00:14, Saurabh Sharma (saurabh.sharma@nviz.com) wrote:

> Hi Ron,
>
> The maximum number of open files in our OS is 1024.
>
> Thanks

RE: Flume not marking log files as completed and not processing files further

Posted by Saurabh Sharma <sa...@nviz.com>.
Hi Ron,

The maximum number of open files in our OS is 1024.
Thanks

From: Ronald Van De Kuil [mailto:ronald.van.de.kuil@gmail.com]
Sent: 20 April 2016 12:24
To: user@flume.apache.org
Subject: Re: Flume not marking log files as completed and not processing files further

Not sure that this helps, but have you checked your operating system settings for the maximum number of open files?


Re: Flume not marking log files as completed and not processing files further

Posted by Ronald Van De Kuil <ro...@gmail.com>.
Not sure that this helps, but have you checked your operating system settings for the maximum number of open files?

Kind regards,
Ronald van de Kuil
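On Linux that can be checked like this (a sketch; substitute the agent's actual PID in the second command):

# limit for the current shell/user
ulimit -n

# limit actually in effect for the running agent process
grep 'open files' /proc/<flume-pid>/limits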
