Posted to user@flume.apache.org by Björn Edström <bj...@spotify.com> on 2011/11/01 10:43:44 UTC

Events silently dropped in DFO on upstream collector ERROR

Hello List,

I have a setup like this:

agent: source | agentDFOChain("collector1:35853", "collector2:35853")
collector1: collectorSource(35853) | collectorSink(
    "hdfs://namenode.company.net:54310/flume/", "%Y_%m_%d_%H-")
collector2: collectorSource(35853) | collectorSink(
    "hdfs://namenode.company.net:54310/flume/", "%Y_%m_%d_%H-")

If both collectors are running and collector1 then gets into an ERROR
state (for example, because of the DirectDriver issues discussed on this
list), events are silently dropped. No fail-over to the other node in
the chain takes place, and no events are written to disk.

Otherwise, as long as no collector is in ERROR, everything works as
expected. If I cleanly shut down collector1, the other collector will
start receiving traffic (as expected). If I then shut down collector2,
the agent will start writing data to /var/lib/flume as it can't send
the data upstream (as expected).
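
To make the expected behaviour concrete, the fail-over order described
above can be sketched in Python. This is a toy model and not Flume code:
the Collector class, the send_with_failover function and the state names
"ACTIVE", "DOWN" and "ERROR" are hypothetical, and the disk buffer is just
a list standing in for /var/lib/flume.

```python
# Toy model of the fail-over order expected from agentDFOChain.
# NOT Flume code: Collector, send_with_failover and the state names are
# hypothetical and only illustrate the behaviour described in this mail.


class Collector:
    def __init__(self, name, state="ACTIVE"):
        self.name = name
        self.state = state  # "ACTIVE", "DOWN" (clean shutdown) or "ERROR"
        self.received = []

    def append(self, event):
        # Any state other than ACTIVE should surface as a failure to the
        # caller. Accepting the event and losing it is the reported bug.
        if self.state != "ACTIVE":
            raise ConnectionError(f"{self.name} is {self.state}")
        self.received.append(event)


def send_with_failover(event, collectors, disk_buffer):
    """Try each collector in order; spill to disk if all of them fail."""
    for collector in collectors:
        try:
            collector.append(event)
            return collector.name
        except ConnectionError:
            continue  # move on to the next collector in the chain
    disk_buffer.append(event)  # stands in for /var/lib/flume
    return "disk"


c1 = Collector("collector1", state="ERROR")
c2 = Collector("collector2")
disk = []

print(send_with_failover("e1", [c1, c2], disk))  # collector2
c2.state = "DOWN"
print(send_with_failover("e2", [c1, c2], disk))  # disk
```

In this model an ERROR collector is treated exactly like one that was
shut down cleanly, which is the behaviour I expected. What I actually
observe corresponds to append() returning normally on collector1 while
the event never reaches HDFS.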

Is this a known issue?

Best regards
Björn Edström

Re: Events silently dropped in DFO on upstream collector ERROR

Posted by Björn Edström <bj...@spotify.com>.
Thanks. I have filed 2) as FLUME-839 for further discussion.

For 1), I am following developments closely, and I am thankful for your
and others' efforts to resolve the issue.

On Thu, Nov 3, 2011 at 9:45 PM, Mingjie Lai <mj...@gmail.com> wrote:
>
> Hi.
>
> I think you have two issues here:
>
> 1) As described in FLUME-798, you saw an interruption exception at
> collectors that have rollsink + dfs sink.
>
> 2) The agent chain doesn't switch to a backup collector if the primary
> one gets into the ERROR state.
>
> There is an early patch for FLUME-798, but it needs work before it can be
> pushed to trunk. It's quite an important issue. I may have time to work on
> it this week or next to push it in.
>
> I'm not aware of 2). Do you want to file a JIRA?
>
> Thanks,
> Mingjie

Re: Events silently dropped in DFO on upstream collector ERROR

Posted by Mingjie Lai <mj...@gmail.com>.
Hi.

I think you have two issues here:

1) As described in FLUME-798, you saw an interruption exception at
collectors that have rollsink + dfs sink.

2) The agent chain doesn't switch to a backup collector if the primary
one gets into the ERROR state.

There is an early patch for FLUME-798, but it needs work before it can be
pushed to trunk. It's quite an important issue. I may have time to work on
it this week or next to push it in.

I'm not aware of 2). Do you want to file a JIRA?

Thanks,
Mingjie
