Posted to dev@flume.apache.org by "Prasad Mujumdar (Created) (JIRA)" <ji...@apache.org> on 2011/12/15 20:00:31 UTC

[jira] [Created] (FLUME-883) Flume E2E sink could send incorrect ACKs if there are HDFS file close errors

Flume E2E sink could send incorrect ACKs if there are HDFS file close errors 
-----------------------------------------------------------------------------

                 Key: FLUME-883
                 URL: https://issues.apache.org/jira/browse/FLUME-883
             Project: Flume
          Issue Type: Bug
          Components: Sinks+Sources
    Affects Versions: v0.9.4
            Reporter: Prasad Mujumdar
            Assignee: Prasad Mujumdar
             Fix For: v0.9.5


The E2E collector sink saves the batch tags as the batches are passed to the downstream sinks, and the ACKs are flushed when the roller closes the file. Currently, for the HDFS sink, close is the only operation that guarantees the data is safely stored, so the ACKs are sent on close. If the writes fail for some reason, we don't send the ACKs, assuming the data is lost, and the E2E mechanism then resends the data.
The problem is that if the close fails, we don't clear the ACKs accumulated for the current rolltag. The next successful roll could then send those ACKs, and the batch would never be resent.
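
For illustration, a minimal sketch of the close handling described above. This is not the actual CollectorSink code; the HdfsWriter/AckSender types and all names below are made up for the example.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical stand-ins for the real file writer and ACK transport.
    interface HdfsWriter { void close() throws IOException; }
    interface AckSender  { void sendAck(String batchTag); }

    class RollAckSketch {
      private final List<String> pendingAcks = new ArrayList<String>();

      synchronized void addBatchTag(String tag) {
        pendingAcks.add(tag);          // tags accumulate while the current roll is open
      }

      synchronized void onRollClose(HdfsWriter writer, AckSender acks) throws IOException {
        try {
          writer.close();              // only a successful close guarantees the data is on HDFS
        } catch (IOException e) {
          pendingAcks.clear();         // the fix: drop ACKs for the failed roll so E2E resends those batches
          throw e;                     // still surface the failure to the roller
        }
        for (String tag : pendingAcks) {
          acks.sendAck(tag);           // safe: the data behind these tags is durable
        }
        pendingAcks.clear();
      }
    }

Without the clear() in the catch block, the stale tags would ride along into the next roll and be flushed by its successful close, which is exactly the incorrect ACK described above.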

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (FLUME-883) Flume E2E sink could send incorrect ACKs if there are HDFS file close errors

Posted by "Prasad Mujumdar (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Mujumdar updated FLUME-883:
----------------------------------

    Attachment: Flume-883.patch.1
    
> Flume E2E sink could send incorrect ACKs if there are HDFS file close errors 
> -----------------------------------------------------------------------------
>
>                 Key: FLUME-883
>                 URL: https://issues.apache.org/jira/browse/FLUME-883
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v0.9.4
>            Reporter: Prasad Mujumdar
>            Assignee: Prasad Mujumdar
>             Fix For: v0.9.5
>
>         Attachments: Flume-883.patch.1
>
>
> The E2E collector sink saves the batch tags as the batches are passed to the downstream sinks, and the ACKs are flushed when the roller closes the file. Currently, for the HDFS sink, close is the only operation that guarantees the data is safely stored, so the ACKs are sent on close. If the writes fail for some reason, we don't send the ACKs, assuming the data is lost, and the E2E mechanism then resends the data.
> The problem is that if the close fails, we don't clear the ACKs accumulated for the current rolltag. The next successful roll could then send those ACKs, and the batch would never be resent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (FLUME-883) Flume E2E sink could send incorrect ACKs if there are HDFS file close errors

Posted by "Prasad Mujumdar (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Mujumdar resolved FLUME-883.
-----------------------------------

    Resolution: Fixed

Patch committed to trunk.
                
> Flume E2E sink could send incorrect ACKs if there are HDFS file close errors 
> -----------------------------------------------------------------------------
>
>                 Key: FLUME-883
>                 URL: https://issues.apache.org/jira/browse/FLUME-883
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v0.9.4
>            Reporter: Prasad Mujumdar
>            Assignee: Prasad Mujumdar
>             Fix For: v0.9.5
>
>         Attachments: Flume-883.patch.1
>
>
> The E2E collector sink saves the batch tags as the batches are passed to the downstream sinks, and the ACKs are flushed when the roller closes the file. Currently, for the HDFS sink, close is the only operation that guarantees the data is safely stored, so the ACKs are sent on close. If the writes fail for some reason, we don't send the ACKs, assuming the data is lost, and the E2E mechanism then resends the data.
> The problem is that if the close fails, we don't clear the ACKs accumulated for the current rolltag. The next successful roll could then send those ACKs, and the batch would never be resent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (FLUME-883) Flume E2E sink could send incorrect ACKs if there are HDFS file close errors

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170641#comment-13170641 ] 

jiraposter@reviews.apache.org commented on FLUME-883:
-----------------------------------------------------



bq.  On 2011-12-16 00:33:38, Eric Sammer wrote:
bq.  > flume-core/src/main/java/com/cloudera/flume/collector/CollectorSink.java, line 193
bq.  > <https://reviews.apache.org/r/3214/diff/2/?file=64831#file64831line193>
bq.  >
bq.  >     Mark private?

Will do, thanks.


bq.  On 2011-12-16 00:33:38, Eric Sammer wrote:
bq.  > flume-core/src/main/java/com/cloudera/flume/collector/CollectorSink.java, line 196
bq.  > <https://reviews.apache.org/r/3214/diff/2/?file=64831#file64831line196>
bq.  >
bq.  >     I've mentally paged out the lock ordering of this code so I can't definitively state there's no deadlock here. I have to defer to you on this one. Just something to double (or triple) check.

Hmm, I can't think of a deadlock case; in any case, the patch hasn't changed the locking logic, so at least I'm not introducing a new deadlock. :)


- Prasad


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3214/#review3935
-----------------------------------------------------------


On 2011-12-15 21:24:54, Prasad Mujumdar wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3214/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-12-15 21:24:54)
bq.  
bq.  
bq.  Review request for Eric Sammer.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  The E2E collector sink saves the batch tags as the batches are passed to the downstream sinks, and the ACKs are flushed when the roller closes the file. Currently, for the HDFS sink, close is the only operation that guarantees the data is safely stored, so the ACKs are sent on close. If the writes fail for some reason, we don't send the ACKs, assuming the data is lost, and the E2E mechanism then resends the data.
bq.  The problem is that if the close fails, we don't clear the ACKs accumulated for the current rolltag. The next successful roll could then send those ACKs, and the batch would never be resent.
bq.  
bq.  The fix is to clear the unsent ACKs when close throws an IOException. A config property is also added to disable this behavior for sinks with different close semantics.
bq.  
bq.  
bq.  This addresses bug FLUME-883.
bq.      https://issues.apache.org/jira/browse/FLUME-883
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    flume-core/src/main/java/com/cloudera/flume/collector/CollectorSink.java 20f60c6 
bq.    flume-core/src/main/java/com/cloudera/flume/conf/FlumeConfiguration.java aeceb15 
bq.    flume-core/src/test/java/com/cloudera/flume/collector/TestCollectorSink.java e735f38 
bq.  
bq.  Diff: https://reviews.apache.org/r/3214/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Added new test case.
bq.  Ran CollectorSink tests, will run rest of the regression tests.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Prasad
bq.  
bq.


                
> Flume E2E sink could send incorrect ACKs if there are HDFS file close errors 
> -----------------------------------------------------------------------------
>
>                 Key: FLUME-883
>                 URL: https://issues.apache.org/jira/browse/FLUME-883
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v0.9.4
>            Reporter: Prasad Mujumdar
>            Assignee: Prasad Mujumdar
>             Fix For: v0.9.5
>
>
> The E2E collector sink saves the batch tags as the batches are passed to the downstream sinks, and the ACKs are flushed when the roller closes the file. Currently, for the HDFS sink, close is the only operation that guarantees the data is safely stored, so the ACKs are sent on close. If the writes fail for some reason, we don't send the ACKs, assuming the data is lost, and the E2E mechanism then resends the data.
> The problem is that if the close fails, we don't clear the ACKs accumulated for the current rolltag. The next successful roll could then send those ACKs, and the batch would never be resent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (FLUME-883) Flume E2E sink could send incorrect ACKs if there are HDFS file close errors

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170629#comment-13170629 ] 

jiraposter@reviews.apache.org commented on FLUME-883:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3214/#review3935
-----------------------------------------------------------

Ship it!


Looks right to me, save for my lack of memory of lock acquisition ordering.


flume-core/src/main/java/com/cloudera/flume/collector/CollectorSink.java
<https://reviews.apache.org/r/3214/#comment8875>

    Super nit: if (cleanupOnClose) is the same as if (cleanupOnClose == true). ;)



flume-core/src/main/java/com/cloudera/flume/collector/CollectorSink.java
<https://reviews.apache.org/r/3214/#comment8877>

    Mark private?



flume-core/src/main/java/com/cloudera/flume/collector/CollectorSink.java
<https://reviews.apache.org/r/3214/#comment8878>

    I've mentally paged out the lock ordering of this code so I can't definitively state there's no deadlock here. I have to defer to you on this one. Just something to double (or triple) check.



flume-core/src/main/java/com/cloudera/flume/collector/CollectorSink.java
<https://reviews.apache.org/r/3214/#comment8876>

    Same super nit on if (... == true).


- Eric


On 2011-12-15 21:24:54, Prasad Mujumdar wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3214/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-12-15 21:24:54)
bq.  
bq.  
bq.  Review request for Eric Sammer.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  The E2E collector sink saves the batch tags as the batches are passed to the downstream sinks, and the ACKs are flushed when the roller closes the file. Currently, for the HDFS sink, close is the only operation that guarantees the data is safely stored, so the ACKs are sent on close. If the writes fail for some reason, we don't send the ACKs, assuming the data is lost, and the E2E mechanism then resends the data.
bq.  The problem is that if the close fails, we don't clear the ACKs accumulated for the current rolltag. The next successful roll could then send those ACKs, and the batch would never be resent.
bq.  
bq.  The fix is to clear the unsent ACKs when close throws an IOException. A config property is also added to disable this behavior for sinks with different close semantics.
bq.  
bq.  
bq.  This addresses bug FLUME-883.
bq.      https://issues.apache.org/jira/browse/FLUME-883
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    flume-core/src/main/java/com/cloudera/flume/collector/CollectorSink.java 20f60c6 
bq.    flume-core/src/main/java/com/cloudera/flume/conf/FlumeConfiguration.java aeceb15 
bq.    flume-core/src/test/java/com/cloudera/flume/collector/TestCollectorSink.java e735f38 
bq.  
bq.  Diff: https://reviews.apache.org/r/3214/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Added new test case.
bq.  Ran CollectorSink tests, will run rest of the regression tests.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Prasad
bq.  
bq.


                
> Flume E2E sink could send incorrect ACKs if there are HDFS file close errors 
> -----------------------------------------------------------------------------
>
>                 Key: FLUME-883
>                 URL: https://issues.apache.org/jira/browse/FLUME-883
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v0.9.4
>            Reporter: Prasad Mujumdar
>            Assignee: Prasad Mujumdar
>             Fix For: v0.9.5
>
>
> The E2E collector sink saves the batch tags as the batches are passed to the downstream sinks, and the ACKs are flushed when the roller closes the file. Currently, for the HDFS sink, close is the only operation that guarantees the data is safely stored, so the ACKs are sent on close. If the writes fail for some reason, we don't send the ACKs, assuming the data is lost, and the E2E mechanism then resends the data.
> The problem is that if the close fails, we don't clear the ACKs accumulated for the current rolltag. The next successful roll could then send those ACKs, and the batch would never be resent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (FLUME-883) Flume E2E sink could send incorrect ACKs if there are HDFS file close errors

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170500#comment-13170500 ] 

jiraposter@reviews.apache.org commented on FLUME-883:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3214/
-----------------------------------------------------------

Review request for Eric Sammer.


Summary
-------

The E2E collector sink saves the batch tags as the batches are passed to the downstream sinks, and the ACKs are flushed when the roller closes the file. Currently, for the HDFS sink, close is the only operation that guarantees the data is safely stored, so the ACKs are sent on close. If the writes fail for some reason, we don't send the ACKs, assuming the data is lost, and the E2E mechanism then resends the data.
The problem is that if the close fails, we don't clear the ACKs accumulated for the current rolltag. The next successful roll could then send those ACKs, and the batch would never be resent.

The fix is to clear the unsent ACKs when close throws an IOException. A config property is also added to disable this behavior for sinks with different close semantics.
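
To make the toggle concrete, a small sketch of how such a property can gate the cleanup. The property key and class below are assumptions for illustration, not the names used in the committed patch; only the cleanupOnClose field name comes from the review comments above.

    import java.util.List;
    import java.util.Properties;

    class CollectorCleanupToggleSketch {
      // Hypothetical property key; the key actually added to FlumeConfiguration may differ.
      static final String PROP_CLEANUP_ON_CLOSE = "flume.collector.roll.cleanup.on.close";

      private final boolean cleanupOnClose;   // field name taken from the review comments above

      CollectorCleanupToggleSketch(Properties conf) {
        // Default to true: clearing ACKs on a failed close is the safe behavior for HDFS.
        this.cleanupOnClose = Boolean.parseBoolean(conf.getProperty(PROP_CLEANUP_ON_CLOSE, "true"));
      }

      void onCloseFailure(List<String> pendingAcks) {
        if (cleanupOnClose) {          // no "== true" needed, per the review nit
          pendingAcks.clear();         // drop unsent ACKs so the E2E path resends the batches
        }
        // When disabled (for a sink with different close semantics), the pending ACKs
        // are left in place for that sink's own durability handling.
      }
    }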


This addresses bug FLUME-883.
    https://issues.apache.org/jira/browse/FLUME-883


Diffs
-----

  flume-core/src/main/java/com/cloudera/flume/collector/CollectorSink.java 20f60c6 
  flume-core/src/main/java/com/cloudera/flume/conf/FlumeConfiguration.java aeceb15 
  flume-core/src/test/java/com/cloudera/flume/collector/TestCollectorSink.java e735f38 

Diff: https://reviews.apache.org/r/3214/diff


Testing
-------

Added new test case.
Ran CollectorSink tests, will run rest of the regression tests.
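
For illustration only, a sketch of the scenario such a test can exercise; this is not the committed TestCollectorSink case, and all names below are hypothetical. The point checked: a failed close must not leave ACKs queued for the next roll.

    import static org.junit.Assert.assertFalse;
    import static org.junit.Assert.assertTrue;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.junit.Test;

    public class FailedCloseAckSketchTest {

      // Tiny stand-in for the sink's ACK bookkeeping, reduced to the behavior under test.
      static class AckTracker {
        final List<String> pending = new ArrayList<String>();
        final List<String> sent = new ArrayList<String>();

        void add(String tag) { pending.add(tag); }

        void closeRoll(boolean closeFails) throws IOException {
          try {
            if (closeFails) {
              throw new IOException("simulated HDFS close failure");
            }
            sent.addAll(pending);      // ACKs go out only after a successful close
          } finally {
            pending.clear();           // the FLUME-883 fix: never carry ACKs into the next roll
          }
        }
      }

      @Test
      public void failedCloseMustNotLeakAcksIntoNextRoll() throws IOException {
        AckTracker tracker = new AckTracker();

        tracker.add("batch-1");
        try {
          tracker.closeRoll(true);     // roll 1: close fails, batch-1 is presumed lost
        } catch (IOException expected) {
          // expected; the interesting part is what the next roll does
        }

        tracker.add("batch-2");
        tracker.closeRoll(false);      // roll 2: close succeeds

        assertTrue(tracker.sent.contains("batch-2"));   // the durable batch is acked
        assertFalse(tracker.sent.contains("batch-1"));  // the lost batch is not, so E2E resends it
      }
    }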


Thanks,

Prasad


                
> Flume E2E sink could send incorrect ACKs if there are HDFS file close errors 
> -----------------------------------------------------------------------------
>
>                 Key: FLUME-883
>                 URL: https://issues.apache.org/jira/browse/FLUME-883
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v0.9.4
>            Reporter: Prasad Mujumdar
>            Assignee: Prasad Mujumdar
>             Fix For: v0.9.5
>
>
> The E2E collector sink saves the batch tags as the batches are passed to the downstream sinks, and the ACKs are flushed when the roller closes the file. Currently, for the HDFS sink, close is the only operation that guarantees the data is safely stored, so the ACKs are sent on close. If the writes fail for some reason, we don't send the ACKs, assuming the data is lost, and the E2E mechanism then resends the data.
> The problem is that if the close fails, we don't clear the ACKs accumulated for the current rolltag. The next successful roll could then send those ACKs, and the batch would never be resent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira