You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flume.apache.org by Johny Rufus John <jr...@cloudera.com> on 2015/03/02 23:26:43 UTC
Review Request 31643: S3 Source
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/
-----------------------------------------------------------
Review request for Flume.
Bugs: FLUME-2437
https://issues.apache.org/jira/browse/FLUME-2437
Repository: flume-git
Description
-------
S3 Source, initial version
Diffs
-----
flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION
flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION
flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION
flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb
flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION
flume-ng-dist/pom.xml a083fe2
flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION
flume-ng-sources/pom.xml ab8eca4
pom.xml ea7ffe3
Diff: https://reviews.apache.org/r/31643/diff/
Testing
-------
TestResettableGenericInputStream and
TestS3Source
Manual testing of scenarios:
1. Created multiple files in S3 Bucket - make sure the source processes all the files
2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked
Thanks,
Johny Rufus John
Re: Review Request 31643: S3 Source
Posted by Johny Rufus John <jr...@cloudera.com>.
> On April 10, 2015, 1:04 a.m., Ashish Paliwal wrote:
> > flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java, line 107
> > <https://reviews.apache.org/r/31643/diff/1/?file=882380#file882380line107>
> >
> > Sorry don't understand why new stream needs to be created. Also couldn't find the usage as mentioned, can you point to the line numbers?
@ Lines 405 and 486, we recreate the stream
- Johny Rufus
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/#review79616
-----------------------------------------------------------
On March 2, 2015, 10:26 p.m., Johny Rufus John wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31643/
> -----------------------------------------------------------
>
> (Updated March 2, 2015, 10:26 p.m.)
>
>
> Review request for Flume.
>
>
> Bugs: FLUME-2437
> https://issues.apache.org/jira/browse/FLUME-2437
>
>
> Repository: flume-git
>
>
> Description
> -------
>
> S3 Source, initial version
>
>
> Diffs
> -----
>
> flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION
> flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION
> flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION
> flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb
> flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION
> flume-ng-dist/pom.xml a083fe2
> flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION
> flume-ng-sources/pom.xml ab8eca4
> pom.xml ea7ffe3
>
> Diff: https://reviews.apache.org/r/31643/diff/
>
>
> Testing
> -------
>
> TestResettableGenericInputStream and
> TestS3Source
>
> Manual testing of scenarios:
> 1. Created multiple files in S3 Bucket - make sure the source processes all the files
> 2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
> 3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
> 4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked
>
>
> Thanks,
>
> Johny Rufus John
>
>
Re: Review Request 31643: S3 Source
Posted by Ashish Paliwal <pa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/#review79616
-----------------------------------------------------------
flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java
<https://reviews.apache.org/r/31643/#comment129117>
Sorry don't understand why new stream needs to be created. Also couldn't find the usage as mentioned, can you point to the line numbers?
- Ashish Paliwal
On March 2, 2015, 10:26 p.m., Johny Rufus John wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31643/
> -----------------------------------------------------------
>
> (Updated March 2, 2015, 10:26 p.m.)
>
>
> Review request for Flume.
>
>
> Bugs: FLUME-2437
> https://issues.apache.org/jira/browse/FLUME-2437
>
>
> Repository: flume-git
>
>
> Description
> -------
>
> S3 Source, initial version
>
>
> Diffs
> -----
>
> flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION
> flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION
> flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION
> flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb
> flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION
> flume-ng-dist/pom.xml a083fe2
> flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION
> flume-ng-sources/pom.xml ab8eca4
> pom.xml ea7ffe3
>
> Diff: https://reviews.apache.org/r/31643/diff/
>
>
> Testing
> -------
>
> TestResettableGenericInputStream and
> TestS3Source
>
> Manual testing of scenarios:
> 1. Created multiple files in S3 Bucket - make sure the source processes all the files
> 2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
> 3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
> 4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked
>
>
> Thanks,
>
> Johny Rufus John
>
>
Re: Review Request 31643: S3 Source
Posted by Johny Rufus John <jr...@cloudera.com>.
> On March 15, 2015, 10:49 a.m., Pawel wrote:
> > flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java, line 31
> > <https://reviews.apache.org/r/31643/diff/1/?file=882386#file882386line31>
> >
> > We want to have MapDB instead of eg. zookeeper because zookeeper is optional, right, or there are other reasons?
> > Maybe also worth to have something like ZookeeperMetadataBackingStore (in a separate task)?
As of now, we maintain the state locally using MapDb, we definitely can have a seperate task to have the ZookekerBased MetadataStore
> On March 15, 2015, 10:49 a.m., Pawel wrote:
> > flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java, line 265
> > <https://reviews.apache.org/r/31643/diff/1/?file=882390#file882390line265>
> >
> > Do we know how objects are sorted? Date of creation asc?
> >
> > To make sure, maybe it's worth to sort list of objectSummaries by "getLastModified()"
S3 api returns objects in a bucket sorted by their names, and we use the same order
- Johny Rufus
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/#review76504
-----------------------------------------------------------
On March 2, 2015, 10:26 p.m., Johny Rufus John wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31643/
> -----------------------------------------------------------
>
> (Updated March 2, 2015, 10:26 p.m.)
>
>
> Review request for Flume.
>
>
> Bugs: FLUME-2437
> https://issues.apache.org/jira/browse/FLUME-2437
>
>
> Repository: flume-git
>
>
> Description
> -------
>
> S3 Source, initial version
>
>
> Diffs
> -----
>
> flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION
> flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION
> flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION
> flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb
> flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION
> flume-ng-dist/pom.xml a083fe2
> flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION
> flume-ng-sources/pom.xml ab8eca4
> pom.xml ea7ffe3
>
> Diff: https://reviews.apache.org/r/31643/diff/
>
>
> Testing
> -------
>
> TestResettableGenericInputStream and
> TestS3Source
>
> Manual testing of scenarios:
> 1. Created multiple files in S3 Bucket - make sure the source processes all the files
> 2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
> 3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
> 4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked
>
>
> Thanks,
>
> Johny Rufus John
>
>
Re: Review Request 31643: S3 Source
Posted by Pawel <pa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/#review76504
-----------------------------------------------------------
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java
<https://reviews.apache.org/r/31643/#comment124057>
We want to have MapDB instead of eg. zookeeper because zookeeper is optional, right, or there are other reasons?
Maybe also worth to have something like ZookeeperMetadataBackingStore (in a separate task)?
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java
<https://reviews.apache.org/r/31643/#comment124058>
Do we know how objects are sorted? Date of creation asc?
To make sure, maybe it's worth to sort list of objectSummaries by "getLastModified()"
- Pawel
On March 2, 2015, 10:26 p.m., Johny Rufus John wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31643/
> -----------------------------------------------------------
>
> (Updated March 2, 2015, 10:26 p.m.)
>
>
> Review request for Flume.
>
>
> Bugs: FLUME-2437
> https://issues.apache.org/jira/browse/FLUME-2437
>
>
> Repository: flume-git
>
>
> Description
> -------
>
> S3 Source, initial version
>
>
> Diffs
> -----
>
> flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION
> flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION
> flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION
> flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb
> flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION
> flume-ng-dist/pom.xml a083fe2
> flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION
> flume-ng-sources/pom.xml ab8eca4
> pom.xml ea7ffe3
>
> Diff: https://reviews.apache.org/r/31643/diff/
>
>
> Testing
> -------
>
> TestResettableGenericInputStream and
> TestS3Source
>
> Manual testing of scenarios:
> 1. Created multiple files in S3 Bucket - make sure the source processes all the files
> 2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
> 3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
> 4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked
>
>
> Thanks,
>
> Johny Rufus John
>
>
Re: Review Request 31643: S3 Source
Posted by Johny Rufus John <jr...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/
-----------------------------------------------------------
(Updated April 10, 2015, 10:19 p.m.)
Review request for Flume.
Bugs: FLUME-2437
https://issues.apache.org/jira/browse/FLUME-2437
Repository: flume-git
Description
-------
S3 Source, initial version
Diffs (updated)
-----
flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION
flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION
flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION
flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb
flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION
flume-ng-dist/pom.xml 9f7c4f6
flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreType.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION
flume-ng-sources/pom.xml ab8eca4
pom.xml fe7242f
Diff: https://reviews.apache.org/r/31643/diff/
Testing
-------
TestResettableGenericInputStream and
TestS3Source
Manual testing of scenarios:
1. Created multiple files in S3 Bucket - make sure the source processes all the files
2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked
Thanks,
Johny Rufus John
Re: Review Request 31643: S3 Source
Posted by Johny Rufus John <jr...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/
-----------------------------------------------------------
(Updated April 10, 2015, 8:50 p.m.)
Review request for Flume.
Bugs: FLUME-2437
https://issues.apache.org/jira/browse/FLUME-2437
Repository: flume-git
Description
-------
S3 Source, initial version
Diffs (updated)
-----
flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION
flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION
flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION
flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb
flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION
flume-ng-dist/pom.xml 9f7c4f6
flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreType.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION
flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION
flume-ng-sources/pom.xml ab8eca4
pom.xml fe7242f
Diff: https://reviews.apache.org/r/31643/diff/
Testing
-------
TestResettableGenericInputStream and
TestS3Source
Manual testing of scenarios:
1. Created multiple files in S3 Bucket - make sure the source processes all the files
2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked
Thanks,
Johny Rufus John
Re: Review Request 31643: S3 Source
Posted by Johny Rufus John <jr...@cloudera.com>.
> On March 15, 2015, 11:04 a.m., Pawel wrote:
> > flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java, line 220
> > <https://reviews.apache.org/r/31643/diff/1/?file=882391#file882391line220>
> >
> > Hmmm... It seems that source runnable will read only all already existing files in Bucket and will not track new ones which appear
New files are tracked, in readEvents(), under the comment - "// Check if new files have arrived since last call", getNextFile() takes care of this
- Johny Rufus
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/#review76506
-----------------------------------------------------------
On March 2, 2015, 10:26 p.m., Johny Rufus John wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31643/
> -----------------------------------------------------------
>
> (Updated March 2, 2015, 10:26 p.m.)
>
>
> Review request for Flume.
>
>
> Bugs: FLUME-2437
> https://issues.apache.org/jira/browse/FLUME-2437
>
>
> Repository: flume-git
>
>
> Description
> -------
>
> S3 Source, initial version
>
>
> Diffs
> -----
>
> flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION
> flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION
> flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION
> flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb
> flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION
> flume-ng-dist/pom.xml a083fe2
> flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION
> flume-ng-sources/pom.xml ab8eca4
> pom.xml ea7ffe3
>
> Diff: https://reviews.apache.org/r/31643/diff/
>
>
> Testing
> -------
>
> TestResettableGenericInputStream and
> TestS3Source
>
> Manual testing of scenarios:
> 1. Created multiple files in S3 Bucket - make sure the source processes all the files
> 2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
> 3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
> 4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked
>
>
> Thanks,
>
> Johny Rufus John
>
>
Re: Review Request 31643: S3 Source
Posted by Pawel <pa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/#review76506
-----------------------------------------------------------
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java
<https://reviews.apache.org/r/31643/#comment124061>
Hmmm... It seems that source runnable will read only all already existing files in Bucket and will not track new ones which appear
- Pawel
On March 2, 2015, 10:26 p.m., Johny Rufus John wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31643/
> -----------------------------------------------------------
>
> (Updated March 2, 2015, 10:26 p.m.)
>
>
> Review request for Flume.
>
>
> Bugs: FLUME-2437
> https://issues.apache.org/jira/browse/FLUME-2437
>
>
> Repository: flume-git
>
>
> Description
> -------
>
> S3 Source, initial version
>
>
> Diffs
> -----
>
> flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION
> flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION
> flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION
> flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb
> flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION
> flume-ng-dist/pom.xml a083fe2
> flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION
> flume-ng-sources/pom.xml ab8eca4
> pom.xml ea7ffe3
>
> Diff: https://reviews.apache.org/r/31643/diff/
>
>
> Testing
> -------
>
> TestResettableGenericInputStream and
> TestS3Source
>
> Manual testing of scenarios:
> 1. Created multiple files in S3 Bucket - make sure the source processes all the files
> 2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
> 3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
> 4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked
>
>
> Thanks,
>
> Johny Rufus John
>
>
Re: Review Request 31643: S3 Source
Posted by Johny Rufus John <jr...@cloudera.com>.
> On March 15, 2015, 7:40 a.m., Ashish Paliwal wrote:
> > flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java, line 107
> > <https://reviews.apache.org/r/31643/diff/1/?file=882380#file882380line107>
> >
> > Can we use InputStream instead of StreamCreator? It makes it easy to read the code and perhaps this class doen't need to know about how the stream is created and can work directly on InputStream. The using class can hide the details on how the stream is created.
The streamCreator is needed in this class, because the seek() and markPosition() uses it to create a new stream, if the request to seek is to a position before the buffering we have. markPosition() also uses it to mark a position that is way before what we have buffered up.
> On March 15, 2015, 7:40 a.m., Ashish Paliwal wrote:
> > flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java, line 23
> > <https://reviews.apache.org/r/31643/diff/1/?file=882388#file882388line23>
> >
> > Would be worth adding an init() or start() API, where the initialization can take place like for Zookeeper store, connection can be initialized.
Can we defer this to after pulling in the first commit, and may be track it by a seperate Jira ?
> On March 15, 2015, 7:40 a.m., Ashish Paliwal wrote:
> > flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java, line 30
> > <https://reviews.apache.org/r/31643/diff/1/?file=882388#file882388line30>
> >
> > Would be worth adding null checks in the base class and expose mark protected abstract API's to do the real job
Can we defer this to after pulling in the first commit, and if needed track it by a seperate Jira ?
> On March 15, 2015, 7:40 a.m., Ashish Paliwal wrote:
> > flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java, line 23
> > <https://reviews.apache.org/r/31643/diff/1/?file=882389#file882389line23>
> >
> > enum style factory that we use across Flume is recommended. This would enable supporting a User supplied custom backing store
Can we defer this to after pulling in the first commit, and if needed track it by a seperate Jira ?
> On March 15, 2015, 7:40 a.m., Ashish Paliwal wrote:
> > flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java, line 146
> > <https://reviews.apache.org/r/31643/diff/1/?file=882390#file882390line146>
> >
> > Do we still need this? Should't it be abstracted with MetadataStore
Currently the MetadataStore handles just the metadata info on the objects processed so far. It does make sense to store the position tracker as part of the MetaDataBackingStore.
Can we defer this to after pulling in the first commit, and if needed track it by a seperate Jira ?
- Johny Rufus
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/#review76501
-----------------------------------------------------------
On March 2, 2015, 10:26 p.m., Johny Rufus John wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31643/
> -----------------------------------------------------------
>
> (Updated March 2, 2015, 10:26 p.m.)
>
>
> Review request for Flume.
>
>
> Bugs: FLUME-2437
> https://issues.apache.org/jira/browse/FLUME-2437
>
>
> Repository: flume-git
>
>
> Description
> -------
>
> S3 Source, initial version
>
>
> Diffs
> -----
>
> flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION
> flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION
> flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION
> flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb
> flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION
> flume-ng-dist/pom.xml a083fe2
> flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION
> flume-ng-sources/pom.xml ab8eca4
> pom.xml ea7ffe3
>
> Diff: https://reviews.apache.org/r/31643/diff/
>
>
> Testing
> -------
>
> TestResettableGenericInputStream and
> TestS3Source
>
> Manual testing of scenarios:
> 1. Created multiple files in S3 Bucket - make sure the source processes all the files
> 2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
> 3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
> 4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked
>
>
> Thanks,
>
> Johny Rufus John
>
>
Re: Review Request 31643: S3 Source
Posted by Ashish Paliwal <pa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/#review76501
-----------------------------------------------------------
flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java
<https://reviews.apache.org/r/31643/#comment124034>
Can we use InputStream instead of StreamCreator? It makes it easy to read the code and perhaps this class doen't need to know about how the stream is created and can work directly on InputStream. The using class can hide the details on how the stream is created.
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java
<https://reviews.apache.org/r/31643/#comment124035>
Would be worth adding an init() or start() API, where the initialization can take place like for Zookeeper store, connection can be initialized.
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java
<https://reviews.apache.org/r/31643/#comment124036>
Would be worth adding null checks in the base class and expose mark protected abstract API's to do the real job
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java
<https://reviews.apache.org/r/31643/#comment124037>
enum style factory that we use across Flume is recommended. This would enable supporting a User supplied custom backing store
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java
<https://reviews.apache.org/r/31643/#comment124038>
Do we still need this? Should't it be abstracted with MetadataStore
flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java
<https://reviews.apache.org/r/31643/#comment124039>
Do we still need this? Should't it be abstracted with MetadataStore
- Ashish Paliwal
On March 2, 2015, 10:26 p.m., Johny Rufus John wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31643/
> -----------------------------------------------------------
>
> (Updated March 2, 2015, 10:26 p.m.)
>
>
> Review request for Flume.
>
>
> Bugs: FLUME-2437
> https://issues.apache.org/jira/browse/FLUME-2437
>
>
> Repository: flume-git
>
>
> Description
> -------
>
> S3 Source, initial version
>
>
> Diffs
> -----
>
> flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java PRE-CREATION
> flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java PRE-CREATION
> flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java PRE-CREATION
> flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java d1240fb
> flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java PRE-CREATION
> flume-ng-dist/pom.xml a083fe2
> flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java PRE-CREATION
> flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java PRE-CREATION
> flume-ng-sources/pom.xml ab8eca4
> pom.xml ea7ffe3
>
> Diff: https://reviews.apache.org/r/31643/diff/
>
>
> Testing
> -------
>
> TestResettableGenericInputStream and
> TestS3Source
>
> Manual testing of scenarios:
> 1. Created multiple files in S3 Bucket - make sure the source processes all the files
> 2. Add more files, after the S3 source starts - make sure the newly created S3 objects are processed
> 3. Stop the source after a few files are processed - make sure on restart, the source only processes the rest of the unprocessed files
> 4. Stop the source in the middle of processing a file - make sure the postion tracker is read on re-start and processing of the partial file continues from where it was last marked
>
>
> Thanks,
>
> Johny Rufus John
>
>