Posted to dev@flume.apache.org by "Jonathan Hsieh (JIRA)" <ji...@apache.org> on 2011/08/18 17:30:27 UTC
[jira] [Created] (FLUME-746) Correct the behavior and logging
messages about states transition of wal chunks on retry
Correct the behavior and logging messages about states transition of wal chunks on retry
----------------------------------------------------------------------------------------
Key: FLUME-746
URL: https://issues.apache.org/jira/browse/FLUME-746
Project: Flume
Issue Type: Bug
Components: Node
Affects Versions: v0.9.4
Reporter: Jonathan Hsieh
Fix For: v0.9.5
Flume logs often have scary looking log messages that look like this:
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110801-235819417-0400.6477874242784836.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-005128611-0400.6740261462532911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-031622274-0400.6748955125414911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
This is because we previously expected to deal with only three states:
LOGGED, SENDING, SENT.
We actually need to deal with all possible states and, importantly, SENDING is a valid state to transition from (not a race, as reported). Here's the high-level idea:
Current state -> state to transition to:
IMPORT -> IMPORT // *new* warn that this is an odd case.
WRITING -> WRITING // *new* warn that this is an odd case.
LOGGED -> LOGGED // if it is logged, it is already slated for retry, so stay put.
SENDING -> SENDING // *This is the change* -- this is legal; if we are already sending the chunk, keep sending it, no need to retry.
SENT -> LOGGED // this was sent already but the acks didn't work out; move to LOGGED state to retry.
E2EACKED -> E2EACKED // *new* already acked means it is good; no need to retry.
others -> others // other states are unexpected and remain in their state.
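The table above could be sketched in Java roughly as follows. This is only a minimal illustration of the retry mapping, not the actual NaiveFileWALManager code: the class WalRetrySketch, the method onRetry, and the enum layout are hypothetical names, and ERROR is included only to stand in for the "others" row.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch of the retry table; not taken from the Flume source. */
final class WalRetrySketch {
    /** Chunk states named in this thread; the real state set may differ. */
    enum State { IMPORT, WRITING, LOGGED, SENDING, SENT, E2EACKED, ERROR }

    /** Current state -> state after a retry request. */
    private static final Map<State, State> ON_RETRY = new EnumMap<>(State.class);
    static {
        ON_RETRY.put(State.IMPORT, State.IMPORT);     // odd case, stay put
        ON_RETRY.put(State.WRITING, State.WRITING);   // odd case, stay put
        ON_RETRY.put(State.LOGGED, State.LOGGED);     // already slated for retry
        ON_RETRY.put(State.SENDING, State.SENDING);   // legal: keep sending
        ON_RETRY.put(State.SENT, State.LOGGED);       // acks failed: retry
        ON_RETRY.put(State.E2EACKED, State.E2EACKED); // acked: nothing to do
    }

    /** States missing from the table remain unchanged ("others -> others"). */
    static State onRetry(State current) {
        return ON_RETRY.getOrDefault(current, current);
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + " -> " + onRetry(s));
        }
    }
}
```

Only SENT actually changes state here; every other row is a no-op, which is why the old "race" warning for SENDING was misleading.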
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (FLUME-746) Correct the behavior and logging
messages about states transition of wal chunks on retry
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087879#comment-13087879 ]
jiraposter@reviews.apache.org commented on FLUME-746:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1584/
-----------------------------------------------------------
(Updated 2011-08-19 18:43:32.589078)
Review request for Flume, Arvind Prabhakar and Eric Sammer.
Changes
-------
This patch changes the behavior so that an IllegalStateException is thrown on an impossible state transition. I've also updated a test to reflect this. Finally, if something gets into the ERROR state, it now stays in the ERROR state (we've already caught this error, so we should not throw a new one).
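A rough sketch of that stricter behavior, under stated assumptions: this message does not say which transitions count as impossible, so the hypothetical UNKNOWN value below merely stands in for any state the retry path does not expect. StrictRetrySketch and onRetry are illustrative names, not the patch itself.

```java
/** Hypothetical sketch of the stricter retry handling; not the actual patch. */
final class StrictRetrySketch {
    /** UNKNOWN is a made-up placeholder for an unexpected state. */
    enum State { IMPORT, WRITING, LOGGED, SENDING, SENT, E2EACKED, ERROR, UNKNOWN }

    static State onRetry(State current) {
        switch (current) {
            case SENT:
                return State.LOGGED;      // acks failed: go back and retry
            case IMPORT:
            case WRITING:
            case LOGGED:
            case SENDING:
            case E2EACKED:
                return current;           // legal no-op transitions
            case ERROR:
                return State.ERROR;       // already caught: stay in ERROR
            default:
                // Assumed stand-in for an "impossible" transition.
                throw new IllegalStateException(
                    "Unexpected WAL chunk state on retry: " + current);
        }
    }

    public static void main(String[] args) {
        System.out.println(onRetry(State.ERROR));
        try {
            onRetry(State.UNKNOWN);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```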
Summary
-------
Flume logs often have scary looking log messages that look like this:
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110801-235819417-0400.6477874242784836.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-005128611-0400.6740261462532911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-031622274-0400.6748955125414911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
This is because we previously expected to deal with only three states:
LOGGED, SENDING, SENT.
We actually need to deal with all possible states and, importantly, SENDING is a valid state to transition from (not a race, as reported). Here's the high-level idea:
Current state -> state to transition to:
IMPORT -> IMPORT // *new* warn that this is an odd case.
WRITING -> WRITING // *new* warn that this is an odd case.
LOGGED -> LOGGED // *This is a change, used to be considered a race* -- this is legal; if it is logged, it is already slated for retry, so stay put.
SENDING -> SENDING // *This is the change, used to be considered a race* -- this is legal; if we are already sending the chunk, keep sending it, no need to retry.
SENT -> LOGGED // this was sent already but the acks didn't work out; move to LOGGED state to retry.
E2EACKED -> E2EACKED // *new* already acked means it is good; no need to retry.
others -> others // other states are unexpected and remain in their state.
This addresses bug flume-746.
https://issues.apache.org/jira/browse/flume-746
Diffs (updated)
-----
flume-core/src/main/java/com/cloudera/flume/agent/durability/NaiveFileWALManager.java e7d5c8b
flume-core/src/test/java/com/cloudera/flume/agent/TestFlumeNodeWALNotifier.java PRE-CREATION
Diff: https://reviews.apache.org/r/1584/diff
Testing
-------
Tests pass, but they were run on top of other patches (FLUME-706, revert FLUME-656). This change should be orthogonal to those. The full suite is currently running with just this patch.
Thanks,
jmhsieh
> Correct the behavior and logging messages about states transition of wal chunks on retry
> ----------------------------------------------------------------------------------------
>
> Key: FLUME-746
> URL: https://issues.apache.org/jira/browse/FLUME-746
> Project: Flume
> Issue Type: Bug
> Components: Node
> Affects Versions: v0.9.4
> Reporter: Jonathan Hsieh
> Assignee: Jonathan Hsieh
> Labels: wal
> Fix For: v0.9.5
>
> Attachments: 0001-FLUME-746-Correct-the-behavior-and-logging-messages-.patch, 0001-FLUME-746-Correct-the-behavior-and-logging-messages-.patch
>
>
[jira] [Updated] (FLUME-746) Correct the behavior and logging
messages about states transition of wal chunks on retry
Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Hsieh updated FLUME-746:
---------------------------------
Attachment: 0001-FLUME-746-Correct-the-behavior-and-logging-messages-.patch
[jira] [Commented] (FLUME-746) Correct the behavior and logging
messages about states transition of wal chunks on retry
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088040#comment-13088040 ]
jiraposter@reviews.apache.org commented on FLUME-746:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1584/#review1574
-----------------------------------------------------------
Ship it!
lgtm.
- Eric
[jira] [Commented] (FLUME-746) Correct the behavior and logging
messages about states transition of wal chunks on retry
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087069#comment-13087069 ]
jiraposter@reviews.apache.org commented on FLUME-746:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1584/
-----------------------------------------------------------
Review request for Flume, Arvind Prabhakar and Eric Sammer.
[jira] [Updated] (FLUME-746) Correct the behavior and logging
messages about states transition of wal chunks on retry
Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Hsieh updated FLUME-746:
---------------------------------
Attachment: 0001-FLUME-746-Correct-the-behavior-and-logging-messages-.patch
Updated code/test to make bad state throw IllegalStateException.
[jira] [Updated] (FLUME-746) Correct the behavior and logging
messages about states transition of wal chunks on retry
Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Hsieh updated FLUME-746:
---------------------------------
Description:
Flume logs often have scary looking log messages that look like this:
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110801-235819417-0400.6477874242784836.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-005128611-0400.6740261462532911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-031622274-0400.6748955125414911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
This is because we previously expected to deal with only three states:
LOGGED, SENDING, SENT.
We actually need to deal with all possible states and, importantly, SENDING is a valid state to transition from (not a race, as reported). Here's the high-level idea:
Current state -> state to transition to:
IMPORT -> IMPORT // *new* warn that this is an odd case.
WRITING -> WRITING // *new* warn that this is an odd case.
LOGGED -> LOGGED // *This is a change, used to be considered a race* -- this is legal; if it is logged, it is already slated for retry, so stay put.
SENDING -> SENDING // *This is the change, used to be considered a race* -- this is legal; if we are already sending the chunk, keep sending it, no need to retry.
SENT -> LOGGED // this was sent already but the acks didn't work out; move to LOGGED state to retry.
E2EACKED -> E2EACKED // *new* already acked means it is good; no need to retry.
others -> others // other states are unexpected and remain in their state.
was:
Flume logs often have scary looking log messages that look like this:
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110801-235819417-0400.6477874242784836.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-005128611-0400.6740261462532911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-031622274-0400.6748955125414911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
This because previously we only expected deal with three states:
LOGGED, SENT.
We actually need to deal with all possible states, and importantly, the SENDING state is a valid state to transition from.(not a race as reported). Here's the high-level idea:
Current state, state to transition to.
IMPORT -> IMPORT // *new* warn that this is an odd case.
WRITING -> WRITING // *new* warn that this is an odd case
LOGGED -> LOGGED // if it is log, it is slated for retry so stay put
SENDING -> SENDING // *This is the change* -- This is legal -- if we are sending the chunk already, keep sending it, no need to retry
SENT -> LOGGED // this was sent already but acks didn't work out. move to LOGGED state to retry.
E2EACKED -> E2EACKED // *new* acked already means it is good. No need to retry.
others -> others // other states are unexpected and remain in their state.
> Correct the behavior and logging messages about states transition of wal chunks on retry
> ----------------------------------------------------------------------------------------
>
> Key: FLUME-746
> URL: https://issues.apache.org/jira/browse/FLUME-746
> Project: Flume
> Issue Type: Bug
> Components: Node
> Affects Versions: v0.9.4
> Reporter: Jonathan Hsieh
> Labels: wal
> Fix For: v0.9.5
>
>
> Flume logs often have scary looking log messages that look like this:
> 2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110801-235819417-0400.6477874242784836.seq after being stale for 60802ms
> 2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
> 2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-005128611-0400.6740261462532911.seq after being stale for 60802ms
> 2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
> 2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-031622274-0400.6748955125414911.seq after being stale for 60802ms
> 2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
> This is because previously we only expected to deal with three states:
> LOGGED, SENDING, SENT.
> We actually need to deal with all possible states, and importantly, the SENDING state is a valid state to transition from (not a race as reported). Here's the high-level idea:
> Current state, state to transition to.
> IMPORT -> IMPORT // *new* warn that this is an odd case.
> WRITING -> WRITING // *new* warn that this is an odd case.
> LOGGED -> LOGGED // *This is a change, used to be considered race* -- This is legal -- if it is logged, it is slated for retry so stay put.
> SENDING -> SENDING // *This is the change, used to be considered race* -- This is legal -- if we are sending the chunk already, keep sending it, no need to retry.
> SENT -> LOGGED // this was sent already but acks didn't work out. move to LOGGED state to retry.
> E2EACKED -> E2EACKED // *new* acked already means it is good. No need to retry.
> others -> others // other states are unexpected and remain in their state.
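
The transition table above can be sketched in Java roughly as follows. This is an illustration only, not the code from NaiveFileWALManager: the class name RetryTransitions, the enum ChunkState, the method onRetry, and the OTHER placeholder are all hypothetical. Only the state names and their target states are taken from the description.

```java
import java.util.logging.Logger;

public class RetryTransitions {
  // State names come from the issue description. The enum itself is
  // hypothetical, and OTHER is a placeholder for the "others" row.
  enum ChunkState { IMPORT, WRITING, LOGGED, SENDING, SENT, E2EACKED, OTHER }

  private static final Logger LOG = Logger.getLogger("RetryTransitions");

  // Returns the state a chunk should be in after a retry is requested.
  static ChunkState onRetry(ChunkState current) {
    switch (current) {
      case IMPORT:
      case WRITING:
        // Odd case: warn, but leave the chunk where it is.
        LOG.warning("Retry requested for chunk in state " + current);
        return current;
      case LOGGED:   // already slated for retry
      case SENDING:  // already being sent; legal, not a race
      case E2EACKED: // already acked, nothing to resend
        return current;
      case SENT:
        // Sent, but the ack never arrived: queue it for another attempt.
        return ChunkState.LOGGED;
      default:
        // Unexpected states remain unchanged.
        return current;
    }
  }

  public static void main(String[] args) {
    for (ChunkState s : ChunkState.values()) {
      System.out.println(s + " -> " + onRetry(s));
    }
  }
}
```

Only SENT changes state on retry; every other state is left as it is, and only the IMPORT and WRITING cases log a warning.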
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (FLUME-746) Correct the behavior and logging
messages about states transition of wal chunks on retry
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087294#comment-13087294 ]
jiraposter@reviews.apache.org commented on FLUME-746:
-----------------------------------------------------
bq. On 2011-08-18 21:05:10, Eric Sammer wrote:
bq. > flume-core/src/main/java/com/cloudera/flume/agent/durability/NaiveFileWALManager.java, lines 818-821
bq. > <https://reviews.apache.org/r/1584/diff/1/?file=33425#file33425line818>
bq. >
bq. > If we're doing odd state transitions, it seems like a bug and we should complain loudly (i.e. fail fast, fail big). What do you think about making these IllegalStateExceptions?
I think that is a great idea. I'll change that and update tests if necessary.
- jmhsieh
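
A rough Java sketch of what the fail-fast variant discussed here might look like. It assumes the "odd" transitions are the IMPORT and WRITING cases from the summary, which the review comment does not state explicitly. The names FailFastRetry, ChunkState, and onRetry are hypothetical and do not come from the patch.

```java
public class FailFastRetry {
  // State names come from the issue description; the enum is hypothetical.
  enum ChunkState { IMPORT, WRITING, LOGGED, SENDING, SENT, E2EACKED }

  // Returns the post-retry state, or throws for states that should never
  // see a retry (assumed here to be IMPORT and WRITING).
  static ChunkState onRetry(ChunkState current) {
    switch (current) {
      case IMPORT:
      case WRITING:
        throw new IllegalStateException(
            "Retry requested for chunk in unexpected state " + current);
      case SENT:
        return ChunkState.LOGGED; // ack never arrived, resend later
      default:
        return current; // LOGGED, SENDING, E2EACKED stay put
    }
  }

  public static void main(String[] args) {
    System.out.println(onRetry(ChunkState.SENT));
    try {
      onRetry(ChunkState.WRITING);
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Compared with logging a warning, throwing makes an unexpected retry visible to the caller immediately, at the cost of requiring callers and tests to handle the exception.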
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1584/#review1530
-----------------------------------------------------------
On 2011-08-18 15:48:33, jmhsieh wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/1584/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-08-18 15:48:33)
bq.
bq.
bq. Review request for Flume, Arvind Prabhakar and Eric Sammer.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Flume logs often have scary looking log messages that look like this:
bq.
bq. 2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110801-235819417-0400.6477874242784836.seq after being stale for 60802ms
bq. 2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
bq. 2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-005128611-0400.6740261462532911.seq after being stale for 60802ms
bq. 2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
bq. 2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-031622274-0400.6748955125414911.seq after being stale for 60802ms
bq. 2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
bq.
bq. This is because previously we only expected to deal with three states:
bq.
bq. LOGGED, SENDING, SENT.
bq.
bq. We actually need to deal with all possible states, and importantly, the SENDING state is a valid state to transition from (not a race as reported). Here's the high-level idea:
bq.
bq. Current state, state to transition to.
bq. IMPORT -> IMPORT // *new* warn that this is an odd case.
bq. WRITING -> WRITING // *new* warn that this is an odd case.
bq. LOGGED -> LOGGED // *This is a change, used to be considered race* -- This is legal -- if it is logged, it is slated for retry so stay put.
bq. SENDING -> SENDING // *This is the change, used to be considered race* -- This is legal -- if we are sending the chunk already, keep sending it, no need to retry.
bq. SENT -> LOGGED // this was sent already but acks didn't work out. move to LOGGED state to retry.
bq. E2EACKED -> E2EACKED // *new* acked already means it is good. No need to retry.
bq. others -> others // other states are unexpected and remain in their state.
bq.
bq.
bq. This addresses bug flume-746.
bq. https://issues.apache.org/jira/browse/flume-746
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. flume-core/src/main/java/com/cloudera/flume/agent/durability/NaiveFileWALManager.java e7d5c8b
bq. flume-core/src/test/java/com/cloudera/flume/agent/TestFlumeNodeWALNotifier.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/1584/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Tests pass, but they were run on top of other patches (FLUME-706, revert of FLUME-656). This change should be orthogonal to those. The full suite is currently running with just this patch.
bq.
bq.
bq. Thanks,
bq.
bq. jmhsieh
bq.
bq.
> Correct the behavior and logging messages about states transition of wal chunks on retry
> ----------------------------------------------------------------------------------------
>
> Key: FLUME-746
> URL: https://issues.apache.org/jira/browse/FLUME-746
> Project: Flume
> Issue Type: Bug
> Components: Node
> Affects Versions: v0.9.4
> Reporter: Jonathan Hsieh
> Assignee: Jonathan Hsieh
> Labels: wal
> Fix For: v0.9.5
>
> Attachments: 0001-FLUME-746-Correct-the-behavior-and-logging-messages-.patch
>
>
[jira] [Commented] (FLUME-746) Correct the behavior and logging
messages about states transition of wal chunks on retry
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087285#comment-13087285 ]
jiraposter@reviews.apache.org commented on FLUME-746:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1584/#review1530
-----------------------------------------------------------
I have a question about how tolerant of illegal state transitions we should be. I'm in favor of failing here. Thoughts?
flume-core/src/main/java/com/cloudera/flume/agent/durability/NaiveFileWALManager.java
<https://reviews.apache.org/r/1584/#comment3482>
If we're doing odd state transitions, it seems like a bug and we should complain loudly (i.e. fail fast, fail big). What do you think about making these IllegalStateExceptions?
- Eric
[jira] [Updated] (FLUME-746) Correct the behavior and logging
messages about states transition of wal chunks on retry
Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Hsieh updated FLUME-746:
---------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
> Correct the behavior and logging messages about states transition of wal chunks on retry
> ----------------------------------------------------------------------------------------
>
> Key: FLUME-746
> URL: https://issues.apache.org/jira/browse/FLUME-746
> Project: Flume
> Issue Type: Bug
> Components: Node
> Affects Versions: v0.9.4
> Reporter: Jonathan Hsieh
> Assignee: Jonathan Hsieh
> Labels: wal
> Fix For: v0.9.5
>
> Attachments: 0001-FLUME-746-Correct-the-behavior-and-logging-messages-.patch, 0001-FLUME-746-Correct-the-behavior-and-logging-messages-.patch
>
>
[jira] [Updated] (FLUME-746) Correct the behavior and logging
messages about states transition of wal chunks on retry
Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/FLUME-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Hsieh updated FLUME-746:
---------------------------------
Assignee: Jonathan Hsieh
Status: Patch Available (was: Open)
review here https://reviews.apache.org/r/1584/