You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flume.apache.org by Johny Rufus John <jr...@cloudera.com> on 2015/03/02 23:26:43 UTC

Review Request 31643: S3 Source

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/
-----------------------------------------------------------

Review request for Flume.


Bugs: FLUME-2437
    https://issues.apache.org/jira/browse/FLUME-2437


Repository: flume-git


Description
-------

S3 Source, initial version


Diffs
-----

  flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION 
  flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION 
  flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION 
  flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb 
  flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION 
  flume-ng-dist/pom.xml a083fe2 
  flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION 
  flume-ng-sources/pom.xml ab8eca4 
  pom.xml ea7ffe3 

Diff: https://reviews.apache.org/r/31643/diff/


Testing
-------

TestResettableGenericInputStream and 
TestS3Source 

Manual testing of scenarios:
1. Created multiple files in S3 Bucket - make sure the source processes all the files
2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked


Thanks,

Johny Rufus John


Re: Review Request 31643: S3 Source

Posted by Johny Rufus John <jr...@cloudera.com>.

> On April 10, 2015, 1:04 a.m., Ashish Paliwal wrote:
> > flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java, line 107
> > <https://reviews.apache.org/r/31643/diff/1/?file=882380#file882380line107>
> >
> >     Sorry don't understand why new stream needs to be created. Also couldn't find the usage as mentioned, can you point to the line numbers?

@ Lines 405 and 486, we recreate the stream


- Johny Rufus


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/#review79616
-----------------------------------------------------------


On March 2, 2015, 10:26 p.m., Johny Rufus John wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31643/
> -----------------------------------------------------------
> 
> (Updated March 2, 2015, 10:26 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2437
>     https://issues.apache.org/jira/browse/FLUME-2437
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> S3 Source, initial version
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION 
>   flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb 
>   flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION 
>   flume-ng-dist/pom.xml a083fe2 
>   flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION 
>   flume-ng-sources/pom.xml ab8eca4 
>   pom.xml ea7ffe3 
> 
> Diff: https://reviews.apache.org/r/31643/diff/
> 
> 
> Testing
> -------
> 
> TestResettableGenericInputStream and 
> TestS3Source 
> 
> Manual testing of scenarios:
> 1. Created multiple files in S3 Bucket - make sure the source processes all the files
> 2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
> 3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
> 4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked
> 
> 
> Thanks,
> 
> Johny Rufus John
> 
>


Re: Review Request 31643: S3 Source

Posted by Ashish Paliwal <pa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/#review79616
-----------------------------------------------------------



flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java
<https://reviews.apache.org/r/31643/#comment129117>

    Sorry don't understand why new stream needs to be created. Also couldn't find the usage as mentioned, can you point to the line numbers?


- Ashish Paliwal


On March 2, 2015, 10:26 p.m., Johny Rufus John wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31643/
> -----------------------------------------------------------
> 
> (Updated March 2, 2015, 10:26 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2437
>     https://issues.apache.org/jira/browse/FLUME-2437
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> S3 Source, initial version
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION 
>   flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb 
>   flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION 
>   flume-ng-dist/pom.xml a083fe2 
>   flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION 
>   flume-ng-sources/pom.xml ab8eca4 
>   pom.xml ea7ffe3 
> 
> Diff: https://reviews.apache.org/r/31643/diff/
> 
> 
> Testing
> -------
> 
> TestResettableGenericInputStream and 
> TestS3Source 
> 
> Manual testing of scenarios:
> 1. Created multiple files in S3 Bucket - make sure the source processes all the files
> 2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
> 3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
> 4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked
> 
> 
> Thanks,
> 
> Johny Rufus John
> 
>


Re: Review Request 31643: S3 Source

Posted by Johny Rufus John <jr...@cloudera.com>.

> On March 15, 2015, 10:49 a.m., Pawel wrote:
> > flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java, line 31
> > <https://reviews.apache.org/r/31643/diff/1/?file=882386#file882386line31>
> >
> >     We want to have MapDB instead of eg. zookeeper because zookeeper is optional, right, or there are other reasons?
> >     Maybe also worth to have something like ZookeeperMetadataBackingStore (in a separate task)?

As of now, we maintain the state locally using MapDb, we definitely can have a seperate task to have the ZookekerBased MetadataStore


> On March 15, 2015, 10:49 a.m., Pawel wrote:
> > flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java, line 265
> > <https://reviews.apache.org/r/31643/diff/1/?file=882390#file882390line265>
> >
> >     Do we know how objects are sorted? Date of creation asc?
> >     
> >     To make sure, maybe it's worth to sort list of objectSummaries by "getLastModified()"

S3 api returns objects in a bucket sorted by their names, and we use the same order


- Johny Rufus


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/#review76504
-----------------------------------------------------------


On March 2, 2015, 10:26 p.m., Johny Rufus John wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31643/
> -----------------------------------------------------------
> 
> (Updated March 2, 2015, 10:26 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2437
>     https://issues.apache.org/jira/browse/FLUME-2437
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> S3 Source, initial version
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION 
>   flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb 
>   flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION 
>   flume-ng-dist/pom.xml a083fe2 
>   flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION 
>   flume-ng-sources/pom.xml ab8eca4 
>   pom.xml ea7ffe3 
> 
> Diff: https://reviews.apache.org/r/31643/diff/
> 
> 
> Testing
> -------
> 
> TestResettableGenericInputStream and 
> TestS3Source 
> 
> Manual testing of scenarios:
> 1. Created multiple files in S3 Bucket - make sure the source processes all the files
> 2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
> 3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
> 4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked
> 
> 
> Thanks,
> 
> Johny Rufus John
> 
>


Re: Review Request 31643: S3 Source

Posted by Pawel <pa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/#review76504
-----------------------------------------------------------



flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java
<https://reviews.apache.org/r/31643/#comment124057>

    We want to have MapDB instead of eg. zookeeper because zookeeper is optional, right, or there are other reasons?
    Maybe also worth to have something like ZookeeperMetadataBackingStore (in a separate task)?



flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java
<https://reviews.apache.org/r/31643/#comment124058>

    Do we know how objects are sorted? Date of creation asc?
    
    To make sure, maybe it's worth to sort list of objectSummaries by "getLastModified()"


- Pawel


On March 2, 2015, 10:26 p.m., Johny Rufus John wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31643/
> -----------------------------------------------------------
> 
> (Updated March 2, 2015, 10:26 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2437
>     https://issues.apache.org/jira/browse/FLUME-2437
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> S3 Source, initial version
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION 
>   flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb 
>   flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION 
>   flume-ng-dist/pom.xml a083fe2 
>   flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION 
>   flume-ng-sources/pom.xml ab8eca4 
>   pom.xml ea7ffe3 
> 
> Diff: https://reviews.apache.org/r/31643/diff/
> 
> 
> Testing
> -------
> 
> TestResettableGenericInputStream and 
> TestS3Source 
> 
> Manual testing of scenarios:
> 1. Created multiple files in S3 Bucket - make sure the source processes all the files
> 2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
> 3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
> 4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked
> 
> 
> Thanks,
> 
> Johny Rufus John
> 
>


Re: Review Request 31643: S3 Source

Posted by Johny Rufus John <jr...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/
-----------------------------------------------------------

(Updated April 10, 2015, 10:19 p.m.)


Review request for Flume.


Bugs: FLUME-2437
    https://issues.apache.org/jira/browse/FLUME-2437


Repository: flume-git


Description
-------

S3 Source, initial version


Diffs (updated)
-----

  flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION 
  flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION 
  flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION 
  flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb 
  flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION 
  flume-ng-dist/pom.xml 9f7c4f6 
  flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreType.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION 
  flume-ng-sources/pom.xml ab8eca4 
  pom.xml fe7242f 

Diff: https://reviews.apache.org/r/31643/diff/


Testing
-------

TestResettableGenericInputStream and 
TestS3Source 

Manual testing of scenarios:
1. Created multiple files in S3 Bucket - make sure the source processes all the files
2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked


Thanks,

Johny Rufus John


Re: Review Request 31643: S3 Source

Posted by Johny Rufus John <jr...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/
-----------------------------------------------------------

(Updated April 10, 2015, 8:50 p.m.)


Review request for Flume.


Bugs: FLUME-2437
    https://issues.apache.org/jira/browse/FLUME-2437


Repository: flume-git


Description
-------

S3 Source, initial version


Diffs (updated)
-----

  flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION 
  flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION 
  flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION 
  flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb 
  flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION 
  flume-ng-dist/pom.xml 9f7c4f6 
  flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreType.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION 
  flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION 
  flume-ng-sources/pom.xml ab8eca4 
  pom.xml fe7242f 

Diff: https://reviews.apache.org/r/31643/diff/


Testing
-------

TestResettableGenericInputStream and 
TestS3Source 

Manual testing of scenarios:
1. Created multiple files in S3 Bucket - make sure the source processes all the files
2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked


Thanks,

Johny Rufus John


Re: Review Request 31643: S3 Source

Posted by Johny Rufus John <jr...@cloudera.com>.

> On March 15, 2015, 11:04 a.m., Pawel wrote:
> > flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java, line 220
> > <https://reviews.apache.org/r/31643/diff/1/?file=882391#file882391line220>
> >
> >     Hmmm... It seems that source runnable will read only all already existing files in Bucket and will not track new ones which appear

New files are tracked, in readEvents(), under the comment - "// Check if new files have arrived since last call", getNextFile() takes care of this


- Johny Rufus


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/#review76506
-----------------------------------------------------------


On March 2, 2015, 10:26 p.m., Johny Rufus John wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31643/
> -----------------------------------------------------------
> 
> (Updated March 2, 2015, 10:26 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2437
>     https://issues.apache.org/jira/browse/FLUME-2437
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> S3 Source, initial version
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION 
>   flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb 
>   flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION 
>   flume-ng-dist/pom.xml a083fe2 
>   flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION 
>   flume-ng-sources/pom.xml ab8eca4 
>   pom.xml ea7ffe3 
> 
> Diff: https://reviews.apache.org/r/31643/diff/
> 
> 
> Testing
> -------
> 
> TestResettableGenericInputStream and 
> TestS3Source 
> 
> Manual testing of scenarios:
> 1. Created multiple files in S3 Bucket - make sure the source processes all the files
> 2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
> 3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
> 4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked
> 
> 
> Thanks,
> 
> Johny Rufus John
> 
>


Re: Review Request 31643: S3 Source

Posted by Pawel <pa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/#review76506
-----------------------------------------------------------



flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java
<https://reviews.apache.org/r/31643/#comment124061>

    Hmmm... It seems that source runnable will read only all already existing files in Bucket and will not track new ones which appear


- Pawel


On March 2, 2015, 10:26 p.m., Johny Rufus John wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31643/
> -----------------------------------------------------------
> 
> (Updated March 2, 2015, 10:26 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2437
>     https://issues.apache.org/jira/browse/FLUME-2437
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> S3 Source, initial version
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION 
>   flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb 
>   flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION 
>   flume-ng-dist/pom.xml a083fe2 
>   flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION 
>   flume-ng-sources/pom.xml ab8eca4 
>   pom.xml ea7ffe3 
> 
> Diff: https://reviews.apache.org/r/31643/diff/
> 
> 
> Testing
> -------
> 
> TestResettableGenericInputStream and 
> TestS3Source 
> 
> Manual testing of scenarios:
> 1. Created multiple files in S3 Bucket - make sure the source processes all the files
> 2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
> 3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
> 4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked
> 
> 
> Thanks,
> 
> Johny Rufus John
> 
>


Re: Review Request 31643: S3 Source

Posted by Johny Rufus John <jr...@cloudera.com>.

> On March 15, 2015, 7:40 a.m., Ashish Paliwal wrote:
> > flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java, line 107
> > <https://reviews.apache.org/r/31643/diff/1/?file=882380#file882380line107>
> >
> >     Can we use InputStream instead of StreamCreator? It makes it easy to read the code and perhaps this class doen't need to know about how the stream is created and can work directly on InputStream. The using class can hide the details on how the stream is created.

The streamCreator is needed in this class, because the seek() and markPosition() uses it to create a new stream, if the request to seek is to a position before the buffering we have. markPosition() also uses it to mark a position that is way before what we have buffered up.


> On March 15, 2015, 7:40 a.m., Ashish Paliwal wrote:
> > flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java, line 23
> > <https://reviews.apache.org/r/31643/diff/1/?file=882388#file882388line23>
> >
> >     Would be worth adding an init() or start() API, where the initialization can take place like for Zookeeper store, connection can be initialized.

Can we defer this to after pulling in the first commit, and may be track it by a seperate Jira ?


> On March 15, 2015, 7:40 a.m., Ashish Paliwal wrote:
> > flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java, line 30
> > <https://reviews.apache.org/r/31643/diff/1/?file=882388#file882388line30>
> >
> >     Would be worth adding null checks in the base class and expose mark protected abstract API's to do the real job

Can we defer this to after pulling in the first commit, and if needed track it by a seperate Jira ?


> On March 15, 2015, 7:40 a.m., Ashish Paliwal wrote:
> > flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java, line 23
> > <https://reviews.apache.org/r/31643/diff/1/?file=882389#file882389line23>
> >
> >     enum style factory that we use across Flume is recommended. This would enable supporting a User supplied custom backing store

Can we defer this to after pulling in the first commit, and if needed track it by a seperate Jira ?


> On March 15, 2015, 7:40 a.m., Ashish Paliwal wrote:
> > flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java, line 146
> > <https://reviews.apache.org/r/31643/diff/1/?file=882390#file882390line146>
> >
> >     Do we still need this? Should't it be abstracted with MetadataStore

Currently the MetadataStore handles just the metadata info on the objects processed so far. It does make sense to store the position tracker as part of the MetaDataBackingStore.
Can we defer this to after pulling in the first commit, and if needed track it by a seperate Jira ?


- Johny Rufus


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/#review76501
-----------------------------------------------------------


On March 2, 2015, 10:26 p.m., Johny Rufus John wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31643/
> -----------------------------------------------------------
> 
> (Updated March 2, 2015, 10:26 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2437
>     https://issues.apache.org/jira/browse/FLUME-2437
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> S3 Source, initial version
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION 
>   flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb 
>   flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION 
>   flume-ng-dist/pom.xml a083fe2 
>   flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION 
>   flume-ng-sources/pom.xml ab8eca4 
>   pom.xml ea7ffe3 
> 
> Diff: https://reviews.apache.org/r/31643/diff/
> 
> 
> Testing
> -------
> 
> TestResettableGenericInputStream and 
> TestS3Source 
> 
> Manual testing of scenarios:
> 1. Created multiple files in S3 Bucket - make sure the source processes all the files
> 2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
> 3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
> 4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked
> 
> 
> Thanks,
> 
> Johny Rufus John
> 
>


Re: Review Request 31643: S3 Source

Posted by Ashish Paliwal <pa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/#review76501
-----------------------------------------------------------



flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java
<https://reviews.apache.org/r/31643/#comment124034>

    Can we use InputStream instead of StreamCreator? It makes it easy to read the code and perhaps this class doen't need to know about how the stream is created and can work directly on InputStream. The using class can hide the details on how the stream is created.



flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java
<https://reviews.apache.org/r/31643/#comment124035>

    Would be worth adding an init() or start() API, where the initialization can take place like for Zookeeper store, connection can be initialized.



flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java
<https://reviews.apache.org/r/31643/#comment124036>

    Would be worth adding null checks in the base class and expose mark protected abstract API's to do the real job



flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java
<https://reviews.apache.org/r/31643/#comment124037>

    enum style factory that we use across Flume is recommended. This would enable supporting a User supplied custom backing store



flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java
<https://reviews.apache.org/r/31643/#comment124038>

    Do we still need this? Should't it be abstracted with MetadataStore



flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java
<https://reviews.apache.org/r/31643/#comment124039>

    Do we still need this? Should't it be abstracted with MetadataStore


- Ashish Paliwal


On March 2, 2015, 10:26 p.m., Johny Rufus John wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31643/
> -----------------------------------------------------------
> 
> (Updated March 2, 2015, 10:26 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2437
>     https://issues.apache.org/jira/browse/FLUME-2437
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> S3 Source, initial version
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION 
>   flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb 
>   flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION 
>   flume-ng-dist/pom.xml a083fe2 
>   flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION 
>   flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION 
>   flume-ng-sources/pom.xml ab8eca4 
>   pom.xml ea7ffe3 
> 
> Diff: https://reviews.apache.org/r/31643/diff/
> 
> 
> Testing
> -------
> 
> TestResettableGenericInputStream and 
> TestS3Source 
> 
> Manual testing of scenarios:
> 1. Created multiple files in S3 Bucket - make sure the source processes all the files
> 2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
> 3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
> 4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked
> 
> 
> Thanks,
> 
> Johny Rufus John
> 
>