Posted to user@flume.apache.org by Chalcy Raja <Ch...@careerbuilder.com> on 2012/03/12 16:38:57 UTC

flume agent Error

Hi,

I am using one Flume agent to tailDir a few directories.  One agent thread went into an ERROR state, and I see a file stuck in the error state.  I know that starting the agent again can resolve the issue, but I would like to know the cause of the error state.

1. How would I find the cause of the error state?  I checked the log on the agent and could not find any error entries.
2. How can I restart only that thread without having to restart the agent?

Any answer is appreciated.

Thanks,
Chalcy


RE: flume agent Error

Posted by Chalcy Raja <Ch...@careerbuilder.com>.
Thank you, Alex!

I have set up tailDir to start from the end of the file (true).  Does this mean that when the agent starts again we lose data, or will it catch up from where it left off?

How do I roll the tailed files after they are written to hdfs?

My config:
agent1: tailDir( "/mylogdir/", ".*\\.log", true, 0) | agentE2EChain("collector1:35853", "collector2:35853");
collector1: collectorSource( 35853 ) | collectorSink( "hdfs://mycluster/%Y-%m-%d-%H/", "logfragment-" ,12000); 

P.S. Looks like we may have to change to syslog.


Thanks,
Chalcy

-----Original Message-----
From: alo alt [mailto:wget.null@googlemail.com] 
Sent: Wednesday, March 14, 2012 3:57 AM
To: flume-user@incubator.apache.org
Subject: Re: flume agent Error

Hi,

You could reload the agent's configuration over the CLI.

A few notes:
I know that tail and tailDir are the coolest features in Flume. But note that Flume uses a tail, which means it acts like a tail -f in a console. As the file gets larger, memory usage gets higher. When the tail process is restarted, the marker gets lost and it starts again from the beginning of the file (or from the end, if you use that switch). So losing threads in a larger setup is a consequence of that.

You can prevent it if you roll the tailed files after they are written into HDFS. Or do not use tail; use syslog or avro instead. Or, when you use tailDir, split the directories so that each one catches only a small number of files.

best,
 Alex 

--
Alexander Lorenz
http://mapredit.blogspot.com




Re: flume agent Error

Posted by alo alt <wg...@googlemail.com>.
Hi,

You could reload the agent's configuration over the CLI.

A few notes:
I know that tail and tailDir are the coolest features in Flume. But note that Flume uses a tail, which means it acts like a tail -f in a console. As the file gets larger, memory usage gets higher. When the tail process is restarted, the marker gets lost and it starts again from the beginning of the file (or from the end, if you use that switch). So losing threads in a larger setup is a consequence of that.

You can prevent it if you roll the tailed files after they are written into HDFS. Or do not use tail; use syslog or avro instead. Or, when you use tailDir, split the directories so that each one catches only a small number of files.
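
To make the marker behaviour above concrete, here is a minimal Python sketch of a tail that tracks a byte offset. It is not Flume's implementation; the function names read_new_lines and initial_offset are hypothetical and exist only for this illustration. It shows why a restart without a saved marker either re-reads the whole file (start from beginning) or skips whatever was written while the tail was down (start from end).

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (lines, new_offset) for complete lines appended since offset."""
    size = os.path.getsize(path)
    if offset > size:
        # The file shrank (truncated or rolled), so start over from the top.
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a partial last line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def initial_offset(path, start_from_end):
    """Offset used by a freshly started tail that has no saved marker."""
    return os.path.getsize(path) if start_from_end else 0


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "app.log")  # hypothetical log file
        with open(path, "w") as f:
            f.write("a\nb\n")
        lines, marker = read_new_lines(path, initial_offset(path, False))
        print("first read:", lines)
        with open(path, "a") as f:
            f.write("c\n")  # written while the tail is "down"
        # Marker kept: only the new line is read.
        print("marker kept:", read_new_lines(path, marker)[0])
        # Marker lost, start from beginning: old lines are read again.
        print("from beginning:", read_new_lines(path, initial_offset(path, False))[0])
        # Marker lost, start from end: the line written meanwhile is skipped.
        print("from end:", read_new_lines(path, initial_offset(path, True))[0])


if __name__ == "__main__":
    demo()
```

Rolling the tailed files keeps each file small, so a re-read from the beginning after a restart repeats less data.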

best,
 Alex 

--
Alexander Lorenz
http://mapredit.blogspot.com



RE: flume agent Error

Posted by Chalcy Raja <Ch...@careerbuilder.com>.
Thanks, Julian.  We may have to do that.  I would rather not go down that route yet, since we are still experimenting.

--Chalcy
________________________________________
From: Julian Henry Alcala [zenloop@gmail.com]
Sent: Tuesday, March 13, 2012 11:08 AM
To: flume-user@incubator.apache.org
Cc: flume-user@incubator.apache.org
Subject: Re: flume agent Error

I have issues with agents dying as well.  I have talked to several people who administer large systems and just live with the failures by using a "supervisor" daemon that watches flume and restarts the agent when it dies.  I have opened a bug but have yet to see a response.  Let us know if you find a resolution.

Sent from my iPhone



Re: flume agent Error

Posted by Julian Henry Alcala <ze...@gmail.com>.
I have issues with agents dying as well.  I have talked to several people who administer large systems and just live with the failures by using a "supervisor" daemon that watches flume and restarts the agent when it dies.  I have opened a bug but have yet to see a response.  Let us know if you find a resolution.
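
As a rough illustration of the "supervisor" approach mentioned above, the following Python sketch restarts a command whenever it exits. It is only a sketch under assumptions: the function name supervise and its parameters are hypothetical, and the command passed in stands for however you normally start your agent. A production setup would more likely rely on an existing process supervisor than on a hand-written loop.

```python
import subprocess
import time


def supervise(command, max_restarts=None, backoff_seconds=5.0):
    """Run command and restart it each time it exits.

    max_restarts=None restarts forever; otherwise the function returns
    the list of exit codes after that many restarts have been used up.
    """
    exit_codes = []
    while True:
        # Blocks until the child process exits, for whatever reason.
        code = subprocess.call(command)
        exit_codes.append(code)
        restarts_done = len(exit_codes) - 1
        if max_restarts is not None and restarts_done >= max_restarts:
            return exit_codes
        # Pause before restarting so a crash loop does not spin the CPU.
        time.sleep(backoff_seconds)
```

Calling supervise with the argument list that launches the agent (a placeholder here, since it depends on your installation) keeps the agent running, but it does not explain why the agent died, so the exit codes and the agent log are still worth checking.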

Sent from my iPhone
