
Posted to dev@flume.apache.org by "Jonathan Hsieh (JIRA)" <ji...@apache.org> on 2011/08/18 17:30:27 UTC

[jira] [Created] (FLUME-746) Correct the behavior and logging messages about states transition of wal chunks on retry

Correct the behavior and logging messages about states transition of wal chunks on retry
----------------------------------------------------------------------------------------

                 Key: FLUME-746
                 URL: https://issues.apache.org/jira/browse/FLUME-746
             Project: Flume
          Issue Type: Bug
          Components: Node
    Affects Versions: v0.9.4
            Reporter: Jonathan Hsieh
             Fix For: v0.9.5


Flume logs often have scary looking log messages that look like this:

2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110801-235819417-0400.6477874242784836.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-005128611-0400.6740261462532911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-031622274-0400.6748955125414911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states

This is because previously we only expected to deal with three states:

LOGGED, SENT. 

We actually need to deal with all possible states and, importantly, the SENDING state is a valid state to transition from (not a race as reported). Here's the high-level idea:

Current state, state to transition to.
IMPORT -> IMPORT // *new* warn that this is an odd case.
WRITING -> WRITING // *new*  warn that this is an odd case
LOGGED -> LOGGED // if it is logged, it is slated for retry so stay put
SENDING -> SENDING // *This is the change* -- This is legal -- if we are sending the chunk already, keep sending it, no need to retry
SENT -> LOGGED // this was sent already but acks didn't work out. move to LOGGED state to retry.
E2EACKED -> E2EACKED // *new* acked already means it is good.  No need to retry.
others -> others // other states are unexpected and remain in their state.
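
The table above can be read as a small lookup: only SENT changes state on retry, and everything else stays put. A minimal sketch of that reading follows. The class and method names (RetryTransitions, afterRetry, isOddOnRetry) are hypothetical and are not taken from NaiveFileWALManager; only the state names come from the description.

```java
import java.util.EnumMap;
import java.util.Map;

// Sketch only: RetryTransitions, afterRetry and isOddOnRetry are hypothetical
// names used for illustration, not the actual NaiveFileWALManager code.
final class RetryTransitions {
  // State names as listed in the issue description.
  enum State { IMPORT, WRITING, LOGGED, SENDING, SENT, E2EACKED }

  // Only states that actually move on retry need an entry here.
  private static final Map<State, State> ON_RETRY = new EnumMap<>(State.class);
  static {
    // Sent already but the acks didn't work out: go back to LOGGED to retry.
    ON_RETRY.put(State.SENT, State.LOGGED);
  }

  /** Returns the state a chunk ends up in after a retry request. */
  static State afterRetry(State current) {
    // Every state without an entry (including SENDING) stays where it is.
    return ON_RETRY.getOrDefault(current, current);
  }

  /** True for the states the description marks as an odd case worth a warning. */
  static boolean isOddOnRetry(State current) {
    return current == State.IMPORT || current == State.WRITING;
  }
}
```

With this shape, a retry on a SENDING chunk is a no-op rather than a reported race, which is the behavior change the issue asks for.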






--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (FLUME-746) Correct the behavior and logging messages about states transition of wal chunks on retry

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087879#comment-13087879 ] 

jiraposter@reviews.apache.org commented on FLUME-746:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1584/
-----------------------------------------------------------

(Updated 2011-08-19 18:43:32.589078)


Review request for Flume, Arvind Prabhakar and Eric Sammer.


Changes
-------

This patch changes the behavior so that an impossible state transition throws IllegalStateException. I've also updated a test to reflect this. Finally, if a chunk gets into the ERROR state, it now stays in the ERROR state (we've already caught this error, so we should not throw a new one).
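
One possible reading of this change, sketched below, is a check that rejects any retry transition other than the ones listed in the issue description, while letting ERROR stay in ERROR. This is an assumption about what "impossible" means here, not the patch itself. The names StrictRetry, retryTarget and checkRetryTransition are hypothetical; only the state names and the use of IllegalStateException come from this thread.

```java
// Sketch only: StrictRetry, retryTarget and checkRetryTransition are
// hypothetical names; this is not the code from the attached patch.
final class StrictRetry {
  // State names as they appear in this thread.
  enum State { IMPORT, WRITING, LOGGED, SENDING, SENT, E2EACKED, ERROR }

  /** The only state a retry is assumed to be allowed to move a chunk to. */
  static State retryTarget(State from) {
    switch (from) {
      case SENT:
        return State.LOGGED; // acks didn't work out, so queue for resend
      case ERROR:
        return State.ERROR;  // already flagged as an error, so stay put
      default:
        return from;         // all other states keep their state on retry
    }
  }

  /** Returns the new state, or throws if the requested transition is not allowed. */
  static State checkRetryTransition(State from, State to) {
    State allowed = retryTarget(from);
    if (to != allowed) {
      throw new IllegalStateException(
          "Illegal retry transition " + from + " -> " + to
              + " (expected " + allowed + ")");
    }
    return to;
  }
}
```

Under these assumptions, checkRetryTransition(State.SENT, State.LOGGED) returns LOGGED, while checkRetryTransition(State.E2EACKED, State.LOGGED) throws, since an acked chunk has no reason to be retried.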


Summary
-------

Flume logs often have scary looking log messages that look like this:

2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110801-235819417-0400.6477874242784836.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-005128611-0400.6740261462532911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-031622274-0400.6748955125414911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states

This is because previously we only expected to deal with three states:

LOGGED, SENDING, SENT.

We actually need to deal with all possible states and, importantly, the SENDING state is a valid state to transition from (not a race as reported). Here's the high-level idea:

Current state, state to transition to.
IMPORT -> IMPORT // *new* warn that this is an odd case.
WRITING -> WRITING // *new*  warn that this is an odd case.
LOGGED -> LOGGED // *This is a change, used to be considered race* -- This is legal -- if it is logged, it is slated for retry so stay put.
SENDING -> SENDING // *This is the change, used to be considered race* -- This is legal -- if we are sending the chunk already, keep sending it, no need to retry
SENT -> LOGGED // this was sent already but acks didn't work out. move to LOGGED state to retry.
E2EACKED -> E2EACKED // *new* acked already means it is good.  No need to retry.
others -> others // other states are unexpected and remain in their state.


This addresses bug flume-746.
    https://issues.apache.org/jira/browse/flume-746


Diffs (updated)
-----

  flume-core/src/main/java/com/cloudera/flume/agent/durability/NaiveFileWALManager.java e7d5c8b 
  flume-core/src/test/java/com/cloudera/flume/agent/TestFlumeNodeWALNotifier.java PRE-CREATION 

Diff: https://reviews.apache.org/r/1584/diff


Testing
-------

Tests pass, but they were run on top of other patches (FLUME-706, revert FLUME-656). This should be orthogonal to those changes. The full suite is currently running with just this patch.


Thanks,

jmhsieh



> Correct the behavior and logging messages about states transition of wal chunks on retry
> ----------------------------------------------------------------------------------------
>
>                 Key: FLUME-746
>                 URL: https://issues.apache.org/jira/browse/FLUME-746
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v0.9.4
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>              Labels: wal
>             Fix For: v0.9.5
>
>         Attachments: 0001-FLUME-746-Correct-the-behavior-and-logging-messages-.patch, 0001-FLUME-746-Correct-the-behavior-and-logging-messages-.patch


[jira] [Updated] (FLUME-746) Correct the behavior and logging messages about states transition of wal chunks on retry

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated FLUME-746:
---------------------------------

    Attachment: 0001-FLUME-746-Correct-the-behavior-and-logging-messages-.patch


[jira] [Commented] (FLUME-746) Correct the behavior and logging messages about states transition of wal chunks on retry

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088040#comment-13088040 ] 

jiraposter@reviews.apache.org commented on FLUME-746:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1584/#review1574
-----------------------------------------------------------

Ship it!

lgtm.

- Eric



[jira] [Commented] (FLUME-746) Correct the behavior and logging messages about states transition of wal chunks on retry

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087069#comment-13087069 ] 

jiraposter@reviews.apache.org commented on FLUME-746:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1584/
-----------------------------------------------------------

Review request for Flume, Arvind Prabhakar and Eric Sammer.


Summary
-------

Flume logs often have scary looking log messages that look like this:

2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110801-235819417-0400.6477874242784836.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-005128611-0400.6740261462532911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-031622274-0400.6748955125414911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states

This is because previously we only expected to deal with three states:

LOGGED, SENDING, SENT.

We actually need to deal with all possible states and, importantly, the SENDING state is a valid state to transition from (not a race as reported). Here's the high-level idea:

Current state, state to transition to.
IMPORT -> IMPORT // *new* warn that this is an odd case.
WRITING -> WRITING // *new*  warn that this is an odd case.
LOGGED -> LOGGED // *This is a change, used to be considered race* -- This is legal -- if it is logged, it is slated for retry so stay put.
SENDING -> SENDING // *This is the change, used to be considered race* -- This is legal -- if we are sending the chunk already, keep sending it, no need to retry
SENT -> LOGGED // this was sent already but acks didn't work out. move to LOGGED state to retry.
E2EACKED -> E2EACKED // *new* acked already means it is good.  No need to retry.
others -> others // other states are unexpected and remain in their state.


This addresses bug flume-746.
    https://issues.apache.org/jira/browse/flume-746


Diffs
-----

  flume-core/src/main/java/com/cloudera/flume/agent/durability/NaiveFileWALManager.java e7d5c8b 
  flume-core/src/test/java/com/cloudera/flume/agent/TestFlumeNodeWALNotifier.java PRE-CREATION 

Diff: https://reviews.apache.org/r/1584/diff


Testing
-------

Tests pass, but they were run on top of other patches (FLUME-706, revert FLUME-656). This should be orthogonal to those changes. The full suite is currently running with just this patch.


Thanks,

jmhsieh




[jira] [Updated] (FLUME-746) Correct the behavior and logging messages about states transition of wal chunks on retry

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated FLUME-746:
---------------------------------

    Attachment: 0001-FLUME-746-Correct-the-behavior-and-logging-messages-.patch

Updated code/test to make bad state throw IllegalStateException.


[jira] [Updated] (FLUME-746) Correct the behavior and logging messages about states transition of wal chunks on retry

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated FLUME-746:
---------------------------------

    Description: 
Flume logs often have scary looking log messages that look like this:

2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110801-235819417-0400.6477874242784836.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-005128611-0400.6740261462532911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-031622274-0400.6748955125414911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states

This is because previously we only expected to deal with three states:

LOGGED, SENDING, SENT.

We actually need to deal with all possible states and, importantly, the SENDING state is a valid state to transition from (not a race as reported). Here's the high-level idea:

Current state, state to transition to.
IMPORT -> IMPORT // *new* warn that this is an odd case.
WRITING -> WRITING // *new*  warn that this is an odd case.
LOGGED -> LOGGED // *This is a change, used to be considered race* -- This is legal -- if it is logged, it is slated for retry so stay put.
SENDING -> SENDING // *This is the change, used to be considered race* -- This is legal -- if we are sending the chunk already, keep sending it, no need to retry
SENT -> LOGGED // this was sent already but acks didn't work out. move to LOGGED state to retry.
E2EACKED -> E2EACKED // *new* acked already means it is good.  No need to retry.
others -> others // other states are unexpected and remain in their state.






  was:
Flume logs often have scary looking log messages that look like this:

2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110801-235819417-0400.6477874242784836.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-005128611-0400.6740261462532911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-031622274-0400.6748955125414911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states

This because previously we only expected deal with three states:

LOGGED, SENT. 

We actually need to deal with all possible states, and importantly, the SENDING state is a valid state to transition from.(not a race as reported).  Here's the high-level idea:

Current state, state to transition to.
IMPORT -> IMPORT // *new* warn that this is an odd case.
WRITING -> WRITING // *new*  warn that this is an odd case
LOGGED -> LOGGED // if it is log, it is slated for retry so stay put
SENDING -> SENDING // *This is the change* -- This is legal -- if we are sending the chunk already, keep sending it, no need to retry
SENT -> LOGGED // this was sent already but acks didn't work out. move to LOGGED state to retry.
E2EACKED -> E2EACKED // *new* acked already means it is good.  No need to retry.
others -> others // other states are unexpected and remain in their state.







> Correct the behavior and logging messages about states transition of wal chunks on retry
> ----------------------------------------------------------------------------------------
>
>                 Key: FLUME-746
>                 URL: https://issues.apache.org/jira/browse/FLUME-746
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v0.9.4
>            Reporter: Jonathan Hsieh
>              Labels: wal
>             Fix For: v0.9.5
>
>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (FLUME-746) Correct the behavior and logging messages about states transition of wal chunks on retry

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087294#comment-13087294 ] 

jiraposter@reviews.apache.org commented on FLUME-746:
-----------------------------------------------------

bq.  On 2011-08-18 21:05:10, Eric Sammer wrote:
bq.  > flume-core/src/main/java/com/cloudera/flume/agent/durability/NaiveFileWALManager.java, lines 818-821
bq.  > <https://reviews.apache.org/r/1584/diff/1/?file=33425#file33425line818>
bq.  >
bq.  >     If we're doing odd state transitions, it seems like a bug and we should complain loudly (i.e. fail fast, fail big). What do you think about making these IllegalStateExceptions?

I think that is a great idea.  I'll change that and update tests if necessary.  

- jmhsieh
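
A minimal sketch of what the fail-fast variant discussed above might look like. It assumes the "odd" transitions are the IMPORT and WRITING cases from the issue description; the review comment does not say which cases lines 818-821 cover. The class name WalRetryFailFastSketch, the method onRetry, and the enum are hypothetical and are not taken from the patch.

```java
/** Hypothetical fail-fast variant of the retry transition; not the actual patch. */
final class WalRetryFailFastSketch {
  /** Only the chunk states named in the issue description (assumed). */
  enum State { IMPORT, WRITING, LOGGED, SENDING, SENT, E2EACKED }

  /** Returns the state after a retry request, or throws on an odd transition. */
  static State onRetry(State current) {
    switch (current) {
      case SENT:
        // Sent already but not acked: move back to LOGGED to retry.
        return State.LOGGED;
      case LOGGED:
      case SENDING:
      case E2EACKED:
        // Legal cases: stay in the current state.
        return current;
      default:
        // Assumed odd cases (IMPORT, WRITING): complain loudly instead of warning.
        throw new IllegalStateException(
            "Unexpected retry request for chunk in state " + current);
    }
  }

  public static void main(String[] args) {
    for (State s : State.values()) {
      try {
        System.out.println(s + " -> " + onRetry(s));
      } catch (IllegalStateException e) {
        System.out.println(s + " -> rejected: " + e.getMessage());
      }
    }
  }
}
```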

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1584/#review1530
-----------------------------------------------------------

On 2011-08-18 15:48:33, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1584/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-08-18 15:48:33)
bq.  
bq.  
bq.  Review request for Flume, Arvind Prabhakar and Eric Sammer.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Flume logs often have scary looking log messages that look like this:
bq.  
bq.  2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110801-235819417-0400.6477874242784836.seq after being stale for 60802ms
bq.  2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
bq.  2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-005128611-0400.6740261462532911.seq after being stale for 60802ms
bq.  2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
bq.  2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-031622274-0400.6748955125414911.seq after being stale for 60802ms
bq.  2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
bq.  
bq.  This is because previously we only expected to deal with three states:
bq.  
bq.  LOGGED, SENDING, SENT.
bq.  
bq.  We actually need to deal with all possible states and, importantly, the SENDING state is a valid state to transition from (not a race as reported).  Here's the high-level idea:
bq.  
bq.  Current state, state to transition to.
bq.  IMPORT -> IMPORT // *new* warn that this is an odd case.
bq.  WRITING -> WRITING // *new*  warn that this is an odd case.
bq.  LOGGED -> LOGGED // *This is a change, used to be considered race* -- This is legal -- if it is logged, it is slated for retry so stay put.
bq.  SENDING -> SENDING // *This is the change, used to be considered race* -- This is legal -- if we are sending the chunk already, keep sending it, no need to retry
bq.  SENT -> LOGGED // this was sent already but acks didn't work out. move to LOGGED state to retry.
bq.  E2EACKED -> E2EACKED // *new* acked already means it is good.  No need to retry.
bq.  others -> others // other states are unexpected and remain in their state.
bq.  
bq.  
bq.  This addresses bug flume-746.
bq.      https://issues.apache.org/jira/browse/flume-746
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    flume-core/src/main/java/com/cloudera/flume/agent/durability/NaiveFileWALManager.java e7d5c8b 
bq.    flume-core/src/test/java/com/cloudera/flume/agent/TestFlumeNodeWALNotifier.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/1584/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Tests pass, but they were run on top of other patches (FLUME-706, revert FLUME-656).  This should be orthogonal to those changes.  Full suite currently running with just this patch.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.

> Correct the behavior and logging messages about states transition of wal chunks on retry
> ----------------------------------------------------------------------------------------
>
>                 Key: FLUME-746
>                 URL: https://issues.apache.org/jira/browse/FLUME-746
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v0.9.4
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>              Labels: wal
>             Fix For: v0.9.5
>
>         Attachments: 0001-FLUME-746-Correct-the-behavior-and-logging-messages-.patch
>
>


[jira] [Commented] (FLUME-746) Correct the behavior and logging messages about states transition of wal chunks on retry

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087285#comment-13087285 ] 

jiraposter@reviews.apache.org commented on FLUME-746:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1584/#review1530
-----------------------------------------------------------

I have a question about how tolerant of illegal state transitions we should be. I'm in favor of failing here. Thoughts?

flume-core/src/main/java/com/cloudera/flume/agent/durability/NaiveFileWALManager.java
<https://reviews.apache.org/r/1584/#comment3482>

    If we're doing odd state transitions, it seems like a bug and we should complain loudly (i.e. fail fast, fail big). What do you think about making these IllegalStateExceptions?

- Eric


> Correct the behavior and logging messages about states transition of wal chunks on retry
> ----------------------------------------------------------------------------------------
>
>                 Key: FLUME-746
>                 URL: https://issues.apache.org/jira/browse/FLUME-746
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v0.9.4
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>              Labels: wal
>             Fix For: v0.9.5
>
>         Attachments: 0001-FLUME-746-Correct-the-behavior-and-logging-messages-.patch
>
>


[jira] [Updated] (FLUME-746) Correct the behavior and logging messages about states transition of wal chunks on retry

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated FLUME-746:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Correct the behavior and logging messages about states transition of wal chunks on retry
> ----------------------------------------------------------------------------------------
>
>                 Key: FLUME-746
>                 URL: https://issues.apache.org/jira/browse/FLUME-746
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v0.9.4
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>              Labels: wal
>             Fix For: v0.9.5
>
>         Attachments: 0001-FLUME-746-Correct-the-behavior-and-logging-messages-.patch, 0001-FLUME-746-Correct-the-behavior-and-logging-messages-.patch
>
>


[jira] [Updated] (FLUME-746) Correct the behavior and logging messages about states transition of wal chunks on retry

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated FLUME-746:
---------------------------------

    Assignee: Jonathan Hsieh
      Status: Patch Available  (was: Open)

review here https://reviews.apache.org/r/1584/

> Correct the behavior and logging messages about states transition of wal chunks on retry
> ----------------------------------------------------------------------------------------
>
>                 Key: FLUME-746
>                 URL: https://issues.apache.org/jira/browse/FLUME-746
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v0.9.4
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>              Labels: wal
>             Fix For: v0.9.5
>
>         Attachments: 0001-FLUME-746-Correct-the-behavior-and-logging-messages-.patch
>
>
