Posted to dev@flume.apache.org by Brock Noland <br...@cloudera.com> on 2012/02/21 22:51:32 UTC

Review Request: FLUME-985 All HDFS Operations in HDFSEventSink should have a timeout

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3988/
-----------------------------------------------------------

Review request for Flume.


Summary
-------

1) All HDFS actions are now performed asynchronously.
2) If an HDFS append times out, the file is closed and reopened.
3) Batching is now handled by BucketWriter, which was always aware of the batch size.
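
For readers without the diff at hand, the first two points can be illustrated with a minimal sketch. This is not the code from the patch: the class name TimedWriterSketch, the Writer interface, and the choice to report a timed-out append as an IOException are all assumptions made for illustration. It only shows the general technique of running a blocking call on an executor, bounding it with Future.get(timeout), and closing and reopening the file when an append times out.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedWriterSketch {

  /** Hypothetical stand-in for an HDFS writer; not a Flume or Hadoop interface. */
  public interface Writer {
    void open() throws IOException;
    void append(String event) throws IOException;
    void close() throws IOException;
  }

  // Daemon threads so a hung call cannot keep the JVM alive.
  private final ExecutorService callPool = Executors.newCachedThreadPool(r -> {
    Thread t = new Thread(r, "timed-writer-call");
    t.setDaemon(true);
    return t;
  });
  private final Writer writer;
  private final long timeoutMillis;
  private int reopenCount = 0;

  public TimedWriterSketch(Writer writer, long timeoutMillis) {
    this.writer = writer;
    this.timeoutMillis = timeoutMillis;
  }

  /** Runs the call on the pool and waits at most timeoutMillis for it. */
  private <T> T callWithTimeout(Callable<T> call)
      throws IOException, TimeoutException, InterruptedException {
    Future<T> future = callPool.submit(call);
    try {
      return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the stuck call if it is interruptible
      throw e;
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        throw (IOException) cause;
      }
      throw new IOException(cause);
    }
  }

  /**
   * Appends one event. On timeout the writer is closed and reopened, and the
   * timeout is reported as an IOException (an assumption of this sketch) so
   * the caller can decide whether to retry the event.
   */
  public void append(String event) throws IOException, InterruptedException {
    try {
      callWithTimeout(() -> { writer.append(event); return null; });
    } catch (TimeoutException e) {
      try {
        callWithTimeout(() -> { writer.close(); return null; });
      } catch (TimeoutException | IOException ignored) {
        // Best-effort close: the old file handle is abandoned either way.
      }
      try {
        callWithTimeout(() -> { writer.open(); return null; });
      } catch (TimeoutException reopenTimeout) {
        throw new IOException("reopen timed out", reopenTimeout);
      }
      reopenCount++;
      throw new IOException("append timed out; writer was reopened", e);
    }
  }

  public int getReopenCount() {
    return reopenCount;
  }

  public void shutdown() {
    callPool.shutdownNow();
  }
}
```

In this sketch the caller sees a timed-out append as an ordinary IOException and can retry the event against the reopened writer. Whether the actual patch retries, drops, or rolls back the event is not stated in this summary.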


This addresses bug FLUME-985.
    https://issues.apache.org/jira/browse/FLUME-985


Diffs
-----

  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java 19b2559 
  flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadSeqWriter.java 8a6740f 
  flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java 7d8ee8a 
  flume-ng-sinks/flume-hdfs-sink/pom.xml f27851e 
  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java 45769f6 
  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java 3da90a5 

Diff: https://reviews.apache.org/r/3988/diff


Testing
-------

1) Unit tests were added for the close/reopen scenario.
2) All unit tests pass.
3) I manually verified that this patch improves FlumeNG's behavior when the datanode it is writing to is restarted. Previously, FlumeNG had to be restarted; now Flume moves on and starts writing to a new file.


Thanks,

Brock


Re: Review Request: FLUME-985 All HDFS Operations in HDFSEventSink should have a timeout

Posted by Arvind Prabhakar <ar...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3988/#review6311
-----------------------------------------------------------

Ship it!


+1

- Arvind


On 2012-03-23 20:55:21, Brock Noland wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3988/
> -----------------------------------------------------------
> 
> (Updated 2012-03-23 20:55:21)
> 
> 
> Review request for Flume.
> 
> 
> Summary
> -------
> 
> 1) All HDFS actions are now performed asynchronously.
> 2) If an HDFS append times out, the file is closed and reopened.
> 3) Batching is now handled by BucketWriter, which was always aware of the batch size.
> 
> 
> This addresses bug FLUME-985.
>     https://issues.apache.org/jira/browse/FLUME-985
> 
> 
> Diffs
> -----
> 
>   flume-ng-sinks/flume-hdfs-sink/pom.xml bef2ca7 
>   flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java 45769f6 
>   flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java 1fdaddd 
>   flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java 19b2559 
>   flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadSeqWriter.java 8a6740f 
>   flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadWriterFactory.java b067c00 
>   flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java 8fa72a1 
> 
> Diff: https://reviews.apache.org/r/3988/diff
> 
> 
> Testing
> -------
> 
> 1) Unit tests were added for the close/reopen scenario.
> 2) All unit tests pass.
> 3) I manually verified that this patch improves FlumeNG's behavior when the datanode it is writing to is restarted. Previously, FlumeNG had to be restarted; now Flume moves on and starts writing to a new file.
> 
> 
> Thanks,
> 
> Brock
> 
>


Re: Review Request: FLUME-985 All HDFS Operations in HDFSEventSink should have a timeout

Posted by Brock Noland <br...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3988/
-----------------------------------------------------------

(Updated 2012-03-23 20:55:21.762184)


Review request for Flume.


Changes
-------

Rebased patch attached. Attaching to JIRA for commit.


Summary
-------

1) All HDFS actions are now performed asynchronously.
2) If an HDFS append times out, the file is closed and reopened.
3) Batching is now handled by BucketWriter, which was always aware of the batch size.


This addresses bug FLUME-985.
    https://issues.apache.org/jira/browse/FLUME-985


Diffs (updated)
-----

  flume-ng-sinks/flume-hdfs-sink/pom.xml bef2ca7 
  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java 45769f6 
  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java 1fdaddd 
  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java 19b2559 
  flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadSeqWriter.java 8a6740f 
  flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadWriterFactory.java b067c00 
  flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java 8fa72a1 

Diff: https://reviews.apache.org/r/3988/diff


Testing
-------

1) Unit tests were added for the close/reopen scenario.
2) All unit tests pass.
3) I manually verified that this patch improves FlumeNG's behavior when the datanode it is writing to is restarted. Previously, FlumeNG had to be restarted; now Flume moves on and starts writing to a new file.


Thanks,

Brock


Re: Review Request: FLUME-985 All HDFS Operations in HDFSEventSink should have a timeout

Posted by Prasad Mujumdar <pr...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3988/#review6220
-----------------------------------------------------------

Ship it!


Sorry I didn't look at this earlier.
Looks fine to me. Please check whether the code needs to be rebased.

- Prasad


On 2012-02-21 21:51:32, Brock Noland wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3988/
> -----------------------------------------------------------
> 
> (Updated 2012-02-21 21:51:32)
> 
> 
> Review request for Flume.
> 
> 
> Summary
> -------
> 
> 1) All HDFS actions are now performed asynchronously.
> 2) If an HDFS append times out, the file is closed and reopened.
> 3) Batching is now handled by BucketWriter, which was always aware of the batch size.
> 
> 
> This addresses bug FLUME-985.
>     https://issues.apache.org/jira/browse/FLUME-985
> 
> 
> Diffs
> -----
> 
>   flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java 19b2559 
>   flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadSeqWriter.java 8a6740f 
>   flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java 7d8ee8a 
>   flume-ng-sinks/flume-hdfs-sink/pom.xml f27851e 
>   flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java 45769f6 
>   flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java 3da90a5 
> 
> Diff: https://reviews.apache.org/r/3988/diff
> 
> 
> Testing
> -------
> 
> 1) Unit tests were added for the close/reopen scenario.
> 2) All unit tests pass.
> 3) I manually verified that this patch improves FlumeNG's behavior when the datanode it is writing to is restarted. Previously, FlumeNG had to be restarted; now Flume moves on and starts writing to a new file.
> 
> 
> Thanks,
> 
> Brock
> 
>