Posted to dev@flume.apache.org by "Brock Noland (Created) (JIRA)" <ji...@apache.org> on 2012/02/21 19:41:48 UTC

[jira] [Created] (FLUME-985) All HDFS Operations in HDFSEventSink should have a timeout

All HDFS Operations in HDFSEventSink should have a timeout
----------------------------------------------------------

                 Key: FLUME-985
                 URL: https://issues.apache.org/jira/browse/FLUME-985
             Project: Flume
          Issue Type: Improvement
          Components: Sinks+Sources
    Affects Versions: v1.0.0
            Reporter: Brock Noland
            Assignee: Brock Noland


In FLUME-871, appends were made asynchronous so that they could be timed out. All HDFS operations should be handled the same way.
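The FLUME-871 approach for appends can be sketched roughly as follows. The blocking call is handed to an executor, and the caller waits on the returned Future for a bounded time. This is a minimal sketch under assumptions: `TimedCall` and its method names are hypothetical and are not Flume's actual code. Only standard java.util.concurrent APIs are used.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs a blocking call on a worker thread and bounds how long the caller waits. */
class TimedCall {
  // Cached pool, so a call that ignores interruption does not block later calls.
  private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
    Thread t = new Thread(r, "timed-call");
    t.setDaemon(true); // a permanently stuck call must not keep the JVM alive
    return t;
  });

  /** Returns op's result, or throws TimeoutException if it takes longer than timeoutMs. */
  <T> T call(Callable<T> op, long timeoutMs) throws Exception {
    Future<T> future = pool.submit(op);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the worker; a blocked client may not react
      throw e;
    } catch (ExecutionException e) {
      // Surface the operation's own failure (e.g. an IOException) rather than the wrapper.
      Throwable cause = e.getCause();
      if (cause instanceof Exception) {
        throw (Exception) cause;
      }
      throw e;
    }
  }

  void shutdown() {
    pool.shutdownNow();
  }
}
```

A call site would wrap each HDFS operation, for example `timedCall.call(() -> { writer.append(event); return null; }, timeoutMs)`, where `writer` and `timeoutMs` are hypothetical. Cancelling the Future only interrupts the worker thread. If the underlying client ignores the interrupt, the thread stays stuck, and the cached pool prevents that thread from blocking later calls.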

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (FLUME-985) All HDFS Operations in HDFSEventSink should have a timeout

Posted by "Brock Noland (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated FLUME-985:
-------------------------------

    Status: Patch Available  (was: Open)

Marking "Patch Available"
                
> All HDFS Operations in HDFSEventSink should have a timeout
> ----------------------------------------------------------
>
>                 Key: FLUME-985
>                 URL: https://issues.apache.org/jira/browse/FLUME-985
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.0.0
>            Reporter: Brock Noland
>            Assignee: Brock Noland
>         Attachments: FLUME-985-0.patch
>
>
> In FLUME-871 appends were made asynchronous so we could time them out. All HDFS Operations should be done this same way.


[jira] [Commented] (FLUME-985) All HDFS Operations in HDFSEventSink should have a timeout

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237206#comment-13237206 ] 

Hudson commented on FLUME-985:
------------------------------

Integrated in flume-trunk #143 (See [https://builds.apache.org/job/flume-trunk/143/])
    FLUME-985. All HDFS Operations should have a timeout.

(Brock Noland via Arvind Prabhakar) (Revision 1304600)

     Result = SUCCESS
arvind : http://svn.apache.org/viewvc/?view=rev&rev=1304600
Files : 
* /incubator/flume/trunk/flume-ng-sinks/flume-hdfs-sink/pom.xml
* /incubator/flume/trunk/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java
* /incubator/flume/trunk/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java
* /incubator/flume/trunk/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java
* /incubator/flume/trunk/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadSeqWriter.java
* /incubator/flume/trunk/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadWriterFactory.java
* /incubator/flume/trunk/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java

                
> All HDFS Operations in HDFSEventSink should have a timeout
> ----------------------------------------------------------
>
>                 Key: FLUME-985
>                 URL: https://issues.apache.org/jira/browse/FLUME-985
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.0.0
>            Reporter: Brock Noland
>            Assignee: Brock Noland
>             Fix For: v1.2.0
>
>         Attachments: FLUME-985-0.patch, FLUME-985-1.patch
>
>
> In FLUME-871 appends were made asynchronous so we could time them out. All HDFS Operations should be done this same way.


[jira] [Updated] (FLUME-985) All HDFS Operations in HDFSEventSink should have a timeout

Posted by "Brock Noland (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated FLUME-985:
-------------------------------

    Attachment: FLUME-985-1.patch

Rebased patch is attached.
                
> All HDFS Operations in HDFSEventSink should have a timeout
> ----------------------------------------------------------
>
>                 Key: FLUME-985
>                 URL: https://issues.apache.org/jira/browse/FLUME-985
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.0.0
>            Reporter: Brock Noland
>            Assignee: Brock Noland
>         Attachments: FLUME-985-0.patch, FLUME-985-1.patch
>
>
> In FLUME-871 appends were made asynchronous so we could time them out. All HDFS Operations should be done this same way.


[jira] [Commented] (FLUME-985) All HDFS Operations in HDFSEventSink should have a timeout

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237125#comment-13237125 ] 

jiraposter@reviews.apache.org commented on FLUME-985:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3988/#review6311
-----------------------------------------------------------

Ship it!


+1

- Arvind


On 2012-03-23 20:55:21, Brock Noland wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3988/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-23 20:55:21)
bq.  
bq.  
bq.  Review request for Flume.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  1) All HDFS actions are now performed asynchronously.
bq.  2) If an HDFS append times out, the file is closed and reopened.
bq.  3) Batching is now handled by BucketWriter, which was always aware of the batch size.
bq.  
bq.  
bq.  This addresses bug FLUME-985.
bq.      https://issues.apache.org/jira/browse/FLUME-985
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    flume-ng-sinks/flume-hdfs-sink/pom.xml bef2ca7 
bq.    flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java 45769f6 
bq.    flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java 1fdaddd 
bq.    flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java 19b2559 
bq.    flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadSeqWriter.java 8a6740f 
bq.    flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadWriterFactory.java b067c00 
bq.    flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java 8fa72a1 
bq.  
bq.  Diff: https://reviews.apache.org/r/3988/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  1) Unit tests were added for the close/reopen scenario.
bq.  2) All unit tests pass.
bq.  3) I manually verified that this patch improves FlumeNG's behavior when the datanode it is writing to is restarted. Previously FlumeNG had to be restarted; now Flume moves on and starts writing to a new file.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Brock
bq.  
bq.


                
> All HDFS Operations in HDFSEventSink should have a timeout
> ----------------------------------------------------------
>
>                 Key: FLUME-985
>                 URL: https://issues.apache.org/jira/browse/FLUME-985
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.0.0
>            Reporter: Brock Noland
>            Assignee: Brock Noland
>             Fix For: v1.2.0
>
>         Attachments: FLUME-985-0.patch, FLUME-985-1.patch
>
>
> In FLUME-871 appends were made asynchronous so we could time them out. All HDFS Operations should be done this same way.


[jira] [Updated] (FLUME-985) All HDFS Operations in HDFSEventSink should have a timeout

Posted by "Arvind Prabhakar (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arvind Prabhakar updated FLUME-985:
-----------------------------------

       Resolution: Fixed
    Fix Version/s: v1.2.0
           Status: Resolved  (was: Patch Available)

Patch committed. Thanks Brock!
                
> All HDFS Operations in HDFSEventSink should have a timeout
> ----------------------------------------------------------
>
>                 Key: FLUME-985
>                 URL: https://issues.apache.org/jira/browse/FLUME-985
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.0.0
>            Reporter: Brock Noland
>            Assignee: Brock Noland
>             Fix For: v1.2.0
>
>         Attachments: FLUME-985-0.patch, FLUME-985-1.patch
>
>
> In FLUME-871 appends were made asynchronous so we could time them out. All HDFS Operations should be done this same way.


[jira] [Commented] (FLUME-985) All HDFS Operations in HDFSEventSink should have a timeout

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212997#comment-13212997 ] 

jiraposter@reviews.apache.org commented on FLUME-985:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3988/
-----------------------------------------------------------

Review request for Flume.


Summary
-------

1) All HDFS actions are now performed asynchronously.
2) If an HDFS append times out, the file is closed and reopened.
3) Batching is now handled by BucketWriter, which was always aware of the batch size.
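The three changes in the summary can be sketched together in one class. This is an illustrative sketch and not the code from the patch. The following are all assumptions: `FileHandle`, `TimedBucketWriter`, the `basePath + "." + counter` file naming, and the choice to rethrow the timeout so the caller can retry the event. In the sketch, every file call runs on an executor and is bounded by a timeout. A timed-out append abandons the current file and opens a new one. The writer also counts appends itself, so it syncs once per full batch.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for an HDFS output file; not Flume's real writer interface. */
interface FileHandle {
  void open(String path) throws IOException;
  void append(byte[] event) throws IOException;
  void sync() throws IOException;
  void close() throws IOException;
}

/** Hypothetical writer: timed calls, close-and-reopen on append timeout, batch-aware sync. */
class TimedBucketWriter {
  private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
    Thread t = new Thread(r, "hdfs-call");
    t.setDaemon(true);
    return t;
  });
  private final FileHandle handle;
  private final String basePath;
  private final int batchSize;
  private final long timeoutMs;
  private int fileCounter = 0;  // suffix of the next file to open
  private int batchCounter = 0; // appends since the last sync
  int reopenCount = 0;          // how many times a timeout forced a new file

  TimedBucketWriter(FileHandle handle, String basePath, int batchSize, long timeoutMs)
      throws Exception {
    this.handle = handle;
    this.basePath = basePath;
    this.batchSize = batchSize;
    this.timeoutMs = timeoutMs;
    open();
  }

  // Runs op on the pool and waits at most timeoutMs for it.
  private <T> T timed(Callable<T> op) throws Exception {
    Future<T> f = pool.submit(op);
    try {
      return f.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      f.cancel(true);
      throw e;
    } catch (ExecutionException e) {
      throw (e.getCause() instanceof Exception) ? (Exception) e.getCause() : e;
    }
  }

  private void open() throws Exception {
    String path = basePath + "." + fileCounter++;
    timed(() -> { handle.open(path); return null; });
  }

  void append(byte[] event) throws Exception {
    try {
      timed(() -> { handle.append(event); return null; });
    } catch (TimeoutException e) {
      // Assume the datanode is unhealthy: abandon this file (best effort) and start a new one.
      try {
        timed(() -> { handle.close(); return null; });
      } catch (Exception ignored) {
        // the old file may be unrecoverable; nothing more to do here
      }
      reopenCount++;
      batchCounter = 0;
      open();
      throw e; // the caller decides whether to retry the event
    }
    // The writer, not the sink, knows when a batch is complete.
    if (++batchCounter >= batchSize) {
      timed(() -> { handle.sync(); return null; });
      batchCounter = 0;
    }
  }

  void close() throws Exception {
    try {
      timed(() -> { handle.close(); return null; });
    } finally {
      pool.shutdownNow();
    }
  }
}
```

This design has one caveat. A cancelled call may still be blocked inside the client, so in the sketch the close and reopen can run while the stale append is still pending. A real implementation would need to guard against that overlap.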


This addresses bug FLUME-985.
    https://issues.apache.org/jira/browse/FLUME-985


Diffs
-----

  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java 19b2559 
  flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadSeqWriter.java 8a6740f 
  flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java 7d8ee8a 
  flume-ng-sinks/flume-hdfs-sink/pom.xml f27851e 
  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java 45769f6 
  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java 3da90a5 

Diff: https://reviews.apache.org/r/3988/diff


Testing
-------

1) Unit tests were added for the close/reopen scenario.
2) All unit tests pass.
3) I manually verified that this patch improves FlumeNG's behavior when the datanode it is writing to is restarted. Previously FlumeNG had to be restarted; now Flume moves on and starts writing to a new file.


Thanks,

Brock


                
> All HDFS Operations in HDFSEventSink should have a timeout
> ----------------------------------------------------------
>
>                 Key: FLUME-985
>                 URL: https://issues.apache.org/jira/browse/FLUME-985
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.0.0
>            Reporter: Brock Noland
>            Assignee: Brock Noland
>         Attachments: FLUME-985-0.patch
>
>
> In FLUME-871 appends were made asynchronous so we could time them out. All HDFS Operations should be done this same way.


[jira] [Commented] (FLUME-985) All HDFS Operations in HDFSEventSink should have a timeout

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237112#comment-13237112 ] 

jiraposter@reviews.apache.org commented on FLUME-985:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3988/
-----------------------------------------------------------

(Updated 2012-03-23 20:55:21.762184)


Review request for Flume.


Changes
-------

Rebased patch attached. Attaching to JIRA for commit.


Summary
-------

1) All HDFS actions are now performed asynchronously.
2) If an HDFS append times out, the file is closed and reopened.
3) Batching is now handled by BucketWriter, which was always aware of the batch size.


This addresses bug FLUME-985.
    https://issues.apache.org/jira/browse/FLUME-985


Diffs (updated)
-----

  flume-ng-sinks/flume-hdfs-sink/pom.xml bef2ca7 
  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java 45769f6 
  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java 1fdaddd 
  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java 19b2559 
  flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadSeqWriter.java 8a6740f 
  flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadWriterFactory.java b067c00 
  flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java 8fa72a1 

Diff: https://reviews.apache.org/r/3988/diff


Testing
-------

1) Unit tests were added for the close/reopen scenario.
2) All unit tests pass.
3) I manually verified that this patch improves FlumeNG's behavior when the datanode it is writing to is restarted. Previously FlumeNG had to be restarted; now Flume moves on and starts writing to a new file.


Thanks,

Brock


                
> All HDFS Operations in HDFSEventSink should have a timeout
> ----------------------------------------------------------
>
>                 Key: FLUME-985
>                 URL: https://issues.apache.org/jira/browse/FLUME-985
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.0.0
>            Reporter: Brock Noland
>            Assignee: Brock Noland
>         Attachments: FLUME-985-0.patch, FLUME-985-1.patch
>
>
> In FLUME-871 appends were made asynchronous so we could time them out. All HDFS Operations should be done this same way.


[jira] [Commented] (FLUME-985) All HDFS Operations in HDFSEventSink should have a timeout

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235465#comment-13235465 ] 

jiraposter@reviews.apache.org commented on FLUME-985:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3988/#review6220
-----------------------------------------------------------

Ship it!


Sorry I didn't look at this earlier.
Looks fine to me. Please check whether the code needs to be rebased.

- Prasad


On 2012-02-21 21:51:32, Brock Noland wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3988/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-21 21:51:32)
bq.  
bq.  
bq.  Review request for Flume.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  1) All HDFS actions are now performed asynchronously.
bq.  2) If an HDFS append times out, the file is closed and reopened.
bq.  3) Batching is now handled by BucketWriter, which was always aware of the batch size.
bq.  
bq.  
bq.  This addresses bug FLUME-985.
bq.      https://issues.apache.org/jira/browse/FLUME-985
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java 19b2559 
bq.    flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadSeqWriter.java 8a6740f 
bq.    flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java 7d8ee8a 
bq.    flume-ng-sinks/flume-hdfs-sink/pom.xml f27851e 
bq.    flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java 45769f6 
bq.    flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java 3da90a5 
bq.  
bq.  Diff: https://reviews.apache.org/r/3988/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  1) Unit tests were added for the close/reopen scenario.
bq.  2) All unit tests pass.
bq.  3) I manually verified that this patch improves FlumeNG's behavior when the datanode it is writing to is restarted. Previously FlumeNG had to be restarted; now Flume moves on and starts writing to a new file.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Brock
bq.  
bq.


                
> All HDFS Operations in HDFSEventSink should have a timeout
> ----------------------------------------------------------
>
>                 Key: FLUME-985
>                 URL: https://issues.apache.org/jira/browse/FLUME-985
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.0.0
>            Reporter: Brock Noland
>            Assignee: Brock Noland
>         Attachments: FLUME-985-0.patch
>
>
> In FLUME-871 appends were made asynchronous so we could time them out. All HDFS Operations should be done this same way.


[jira] [Updated] (FLUME-985) All HDFS Operations in HDFSEventSink should have a timeout

Posted by "Brock Noland (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated FLUME-985:
-------------------------------

    Attachment: FLUME-985-0.patch

Attaching current patch.
                
> All HDFS Operations in HDFSEventSink should have a timeout
> ----------------------------------------------------------
>
>                 Key: FLUME-985
>                 URL: https://issues.apache.org/jira/browse/FLUME-985
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.0.0
>            Reporter: Brock Noland
>            Assignee: Brock Noland
>         Attachments: FLUME-985-0.patch
>
>
> In FLUME-871 appends were made asynchronous so we could time them out. All HDFS Operations should be done this same way.


[jira] [Commented] (FLUME-985) All HDFS Operations in HDFSEventSink should have a timeout

Posted by "Brock Noland (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237128#comment-13237128 ] 

Brock Noland commented on FLUME-985:
------------------------------------

Thanks!
                
> All HDFS Operations in HDFSEventSink should have a timeout
> ----------------------------------------------------------
>
>                 Key: FLUME-985
>                 URL: https://issues.apache.org/jira/browse/FLUME-985
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.0.0
>            Reporter: Brock Noland
>            Assignee: Brock Noland
>             Fix For: v1.2.0
>
>         Attachments: FLUME-985-0.patch, FLUME-985-1.patch
>
>
> In FLUME-871 appends were made asynchronous so we could time them out. All HDFS Operations should be done this same way.
