You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flume.apache.org by jm...@apache.org on 2011/08/18 17:48:33 UTC

Review Request: FLUME-746: Correct the behavior and logging messages about states transition of wal chunks on retry

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1584/
-----------------------------------------------------------

Review request for Flume, Arvind Prabhakar and Eric Sammer.


Summary
-------

Flume logs often have scary looking log messages that look like this:

2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110801-235819417-0400.6477874242784836.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-005128611-0400.6740261462532911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-031622274-0400.6748955125414911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states

This because previously we only expected deal with three states:

LOGGED, SENDING, SENT.

We actually need to deal with all possible states, and importantly, the SENDING state is a valid state to transition from.(not a race as reported).  Here's the high-level idea:

Current state, state to transition to.
IMPORT -> IMPORT // *new* warn that this is an odd case.
WRITING -> WRITING // *new*  warn that this is an odd case.
LOGGED -> LOGGED // *This is a change, used to be considered race* -- This is legal -- f it is log, it is slated for retry so stay put.
SENDING -> SENDING // *This is the change, used to be considered race* -- This is legal -- if we are sending the chunk already, keep sending it, no need to retry
SENT -> LOGGED // this was sent already but acks didn't work out. move to LOGGED state to retry.
E2EACKED -> E2EACKED // *new* acked already means it is good.  No need to retry.
others -> others // other states are unexpected and remain in their state.


This addresses bug flume-746.
    https://issues.apache.org/jira/browse/flume-746


Diffs
-----

  flume-core/src/main/java/com/cloudera/flume/agent/durability/NaiveFileWALManager.java e7d5c8b 
  flume-core/src/test/java/com/cloudera/flume/agent/TestFlumeNodeWALNotifier.java PRE-CREATION 

Diff: https://reviews.apache.org/r/1584/diff


Testing
-------

Tests pass but was run on top of other patches (FLUME-706, revert FLUME-656).  This should be orthogonal to those changes.  Full suite currently running with just this patch.


Thanks,

jmhsieh


Re: Review Request: FLUME-746: Correct the behavior and logging messages about states transition of wal chunks on retry

Posted by Eric Sammer <es...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1584/#review1574
-----------------------------------------------------------

Ship it!


lgtm.

- Eric


On 2011-08-19 18:43:32, jmhsieh wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1584/
> -----------------------------------------------------------
> 
> (Updated 2011-08-19 18:43:32)
> 
> 
> Review request for Flume, Arvind Prabhakar and Eric Sammer.
> 
> 
> Summary
> -------
> 
> Flume logs often have scary looking log messages that look like this:
> 
> 2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110801-235819417-0400.6477874242784836.seq after being stale for 60802ms
> 2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
> 2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-005128611-0400.6740261462532911.seq after being stale for 60802ms
> 2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
> 2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-031622274-0400.6748955125414911.seq after being stale for 60802ms
> 2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
> 
> This because previously we only expected deal with three states:
> 
> LOGGED, SENDING, SENT.
> 
> We actually need to deal with all possible states, and importantly, the SENDING state is a valid state to transition from.(not a race as reported).  Here's the high-level idea:
> 
> Current state, state to transition to.
> IMPORT -> IMPORT // *new* warn that this is an odd case.
> WRITING -> WRITING // *new*  warn that this is an odd case.
> LOGGED -> LOGGED // *This is a change, used to be considered race* -- This is legal -- f it is log, it is slated for retry so stay put.
> SENDING -> SENDING // *This is the change, used to be considered race* -- This is legal -- if we are sending the chunk already, keep sending it, no need to retry
> SENT -> LOGGED // this was sent already but acks didn't work out. move to LOGGED state to retry.
> E2EACKED -> E2EACKED // *new* acked already means it is good.  No need to retry.
> others -> others // other states are unexpected and remain in their state.
> 
> 
> This addresses bug flume-746.
>     https://issues.apache.org/jira/browse/flume-746
> 
> 
> Diffs
> -----
> 
>   flume-core/src/main/java/com/cloudera/flume/agent/durability/NaiveFileWALManager.java e7d5c8b 
>   flume-core/src/test/java/com/cloudera/flume/agent/TestFlumeNodeWALNotifier.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/1584/diff
> 
> 
> Testing
> -------
> 
> Tests pass but was run on top of other patches (FLUME-706, revert FLUME-656).  This should be orthogonal to those changes.  Full suite currently running with just this patch.
> 
> 
> Thanks,
> 
> jmhsieh
> 
>


Re: Review Request: FLUME-746: Correct the behavior and logging messages about states transition of wal chunks on retry

Posted by jm...@apache.org.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1584/
-----------------------------------------------------------

(Updated 2011-08-19 18:43:32.589078)


Review request for Flume, Arvind Prabhakar and Eric Sammer.


Changes
-------

This patch changes behavior so that it throws IllegalStateException when it makes an impossible state transition.  I've also updated a test to reflect this.  Finally, if something gets into ERROR state, I change the behavior so that it stays in error state. (we've already caught this error, we should not throw a new error again!)


Summary
-------

Flume logs often have scary looking log messages that look like this:

2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110801-235819417-0400.6477874242784836.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-005128611-0400.6740261462532911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-031622274-0400.6748955125414911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states

This because previously we only expected deal with three states:

LOGGED, SENDING, SENT.

We actually need to deal with all possible states, and importantly, the SENDING state is a valid state to transition from.(not a race as reported).  Here's the high-level idea:

Current state, state to transition to.
IMPORT -> IMPORT // *new* warn that this is an odd case.
WRITING -> WRITING // *new*  warn that this is an odd case.
LOGGED -> LOGGED // *This is a change, used to be considered race* -- This is legal -- f it is log, it is slated for retry so stay put.
SENDING -> SENDING // *This is the change, used to be considered race* -- This is legal -- if we are sending the chunk already, keep sending it, no need to retry
SENT -> LOGGED // this was sent already but acks didn't work out. move to LOGGED state to retry.
E2EACKED -> E2EACKED // *new* acked already means it is good.  No need to retry.
others -> others // other states are unexpected and remain in their state.


This addresses bug flume-746.
    https://issues.apache.org/jira/browse/flume-746


Diffs (updated)
-----

  flume-core/src/main/java/com/cloudera/flume/agent/durability/NaiveFileWALManager.java e7d5c8b 
  flume-core/src/test/java/com/cloudera/flume/agent/TestFlumeNodeWALNotifier.java PRE-CREATION 

Diff: https://reviews.apache.org/r/1584/diff


Testing
-------

Tests pass but was run on top of other patches (FLUME-706, revert FLUME-656).  This should be orthogonal to those changes.  Full suite currently running with just this patch.


Thanks,

jmhsieh


Re: Review Request: FLUME-746: Correct the behavior and logging messages about states transition of wal chunks on retry

Posted by jm...@apache.org.

> On 2011-08-18 21:05:10, Eric Sammer wrote:
> > flume-core/src/main/java/com/cloudera/flume/agent/durability/NaiveFileWALManager.java, lines 818-821
> > <https://reviews.apache.org/r/1584/diff/1/?file=33425#file33425line818>
> >
> >     If we're doing odd state transitions, it seems like a bug and we should complain loudly (i.e. fast fast, fail big). What do you think about making these IllegalStateExceptions?

I think that is a great idea.  I'll change that and update tests if necessary.  


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1584/#review1530
-----------------------------------------------------------


On 2011-08-18 15:48:33, jmhsieh wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1584/
> -----------------------------------------------------------
> 
> (Updated 2011-08-18 15:48:33)
> 
> 
> Review request for Flume, Arvind Prabhakar and Eric Sammer.
> 
> 
> Summary
> -------
> 
> Flume logs often have scary looking log messages that look like this:
> 
> 2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110801-235819417-0400.6477874242784836.seq after being stale for 60802ms
> 2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
> 2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-005128611-0400.6740261462532911.seq after being stale for 60802ms
> 2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
> 2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-031622274-0400.6748955125414911.seq after being stale for 60802ms
> 2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
> 
> This because previously we only expected deal with three states:
> 
> LOGGED, SENDING, SENT.
> 
> We actually need to deal with all possible states, and importantly, the SENDING state is a valid state to transition from.(not a race as reported).  Here's the high-level idea:
> 
> Current state, state to transition to.
> IMPORT -> IMPORT // *new* warn that this is an odd case.
> WRITING -> WRITING // *new*  warn that this is an odd case.
> LOGGED -> LOGGED // *This is a change, used to be considered race* -- This is legal -- f it is log, it is slated for retry so stay put.
> SENDING -> SENDING // *This is the change, used to be considered race* -- This is legal -- if we are sending the chunk already, keep sending it, no need to retry
> SENT -> LOGGED // this was sent already but acks didn't work out. move to LOGGED state to retry.
> E2EACKED -> E2EACKED // *new* acked already means it is good.  No need to retry.
> others -> others // other states are unexpected and remain in their state.
> 
> 
> This addresses bug flume-746.
>     https://issues.apache.org/jira/browse/flume-746
> 
> 
> Diffs
> -----
> 
>   flume-core/src/main/java/com/cloudera/flume/agent/durability/NaiveFileWALManager.java e7d5c8b 
>   flume-core/src/test/java/com/cloudera/flume/agent/TestFlumeNodeWALNotifier.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/1584/diff
> 
> 
> Testing
> -------
> 
> Tests pass but was run on top of other patches (FLUME-706, revert FLUME-656).  This should be orthogonal to those changes.  Full suite currently running with just this patch.
> 
> 
> Thanks,
> 
> jmhsieh
> 
>


Re: Review Request: FLUME-746: Correct the behavior and logging messages about states transition of wal chunks on retry

Posted by Eric Sammer <es...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1584/#review1530
-----------------------------------------------------------


I have a question about how tolerant of illegal state transitions we should be. I'm in favor of failing here. Thoughts?


flume-core/src/main/java/com/cloudera/flume/agent/durability/NaiveFileWALManager.java
<https://reviews.apache.org/r/1584/#comment3482>

    If we're doing odd state transitions, it seems like a bug and we should complain loudly (i.e. fast fast, fail big). What do you think about making these IllegalStateExceptions?


- Eric


On 2011-08-18 15:48:33, jmhsieh wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1584/
> -----------------------------------------------------------
> 
> (Updated 2011-08-18 15:48:33)
> 
> 
> Review request for Flume, Arvind Prabhakar and Eric Sammer.
> 
> 
> Summary
> -------
> 
> Flume logs often have scary looking log messages that look like this:
> 
> 2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110801-235819417-0400.6477874242784836.seq after being stale for 60802ms
> 2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
> 2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-005128611-0400.6740261462532911.seq after being stale for 60802ms
> 2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
> 2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-031622274-0400.6748955125414911.seq after being stale for 60802ms
> 2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
> 
> This because previously we only expected deal with three states:
> 
> LOGGED, SENDING, SENT.
> 
> We actually need to deal with all possible states, and importantly, the SENDING state is a valid state to transition from.(not a race as reported).  Here's the high-level idea:
> 
> Current state, state to transition to.
> IMPORT -> IMPORT // *new* warn that this is an odd case.
> WRITING -> WRITING // *new*  warn that this is an odd case.
> LOGGED -> LOGGED // *This is a change, used to be considered race* -- This is legal -- f it is log, it is slated for retry so stay put.
> SENDING -> SENDING // *This is the change, used to be considered race* -- This is legal -- if we are sending the chunk already, keep sending it, no need to retry
> SENT -> LOGGED // this was sent already but acks didn't work out. move to LOGGED state to retry.
> E2EACKED -> E2EACKED // *new* acked already means it is good.  No need to retry.
> others -> others // other states are unexpected and remain in their state.
> 
> 
> This addresses bug flume-746.
>     https://issues.apache.org/jira/browse/flume-746
> 
> 
> Diffs
> -----
> 
>   flume-core/src/main/java/com/cloudera/flume/agent/durability/NaiveFileWALManager.java e7d5c8b 
>   flume-core/src/test/java/com/cloudera/flume/agent/TestFlumeNodeWALNotifier.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/1584/diff
> 
> 
> Testing
> -------
> 
> Tests pass but was run on top of other patches (FLUME-706, revert FLUME-656).  This should be orthogonal to those changes.  Full suite currently running with just this patch.
> 
> 
> Thanks,
> 
> jmhsieh
> 
>