Posted to dev@apex.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/08/03 10:16:20 UTC
[jira] [Commented] (APEXMALHAR-2174) S3 File Reader reading more data than expected
[ https://issues.apache.org/jira/browse/APEXMALHAR-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15405674#comment-15405674 ]
ASF GitHub Bot commented on APEXMALHAR-2174:
--------------------------------------------
GitHub user chaithu14 opened a pull request:
https://github.com/apache/apex-malhar/pull/360
APEXMALHAR-2174-S3-ReaderIssue Fixed the S3 reader issue
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/chaithu14/incubator-apex-malhar APEXMALHAR-2174-S3-ReaderIssue
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/apex-malhar/pull/360.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #360
----
commit 01b0f42d1d0ab2e6030e390e10e1dafba72f3302
Author: Chaitanya <ch...@datatorrent.com>
Date: 2016-08-03T10:13:30Z
APEXMALHAR-2174-S3-ReaderIssue Fixed the S3 reader issue
----
> S3 File Reader reading more data than expected
> ----------------------------------------------
>
> Key: APEXMALHAR-2174
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2174
> Project: Apache Apex Malhar
> Issue Type: Bug
> Reporter: Chaitanya
> Assignee: Chaitanya
>
> This is observed through the AWS billing.
> The issue is likely in the S3InputStream.read() call used in readEntity(), which can pull more data than the requested block.
> Reading exactly one block can be achieved through the AmazonS3 API's ranged GET, so I am proposing the following solution:
> ```
> import com.amazonaws.services.s3.model.GetObjectRequest;
> import com.amazonaws.services.s3.model.S3Object;
> import com.amazonaws.services.s3.model.S3ObjectInputStream;
> import com.google.common.io.ByteStreams;
>
> GetObjectRequest rangeObjectRequest = new GetObjectRequest(bucketName, key);
> // setRange takes inclusive start and end byte positions, not a byte count
> rangeObjectRequest.setRange(startByte, startByte + noOfBytes - 1);
> S3Object objectPortion = s3Client.getObject(rangeObjectRequest);
> S3ObjectInputStream wrappedStream = objectPortion.getObjectContent();
> byte[] record = ByteStreams.toByteArray(wrappedStream);
> wrappedStream.close();
> ```
> Advantage of this solution: parallel ranged reads work for all types of S3 file systems.
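One detail worth flagging in the proposal above: GetObjectRequest.setRange(start, end) in the AWS SDK for Java expects an inclusive end byte position, not a byte count, so a block of noOfBytes bytes starting at startByte maps to the range [startByte, startByte + noOfBytes - 1]. A minimal sketch of that arithmetic (the S3RangeUtil class and inclusiveEnd method are illustrative names, not part of the patch):

```java
/** Illustrative helper: compute the inclusive end byte position that
 *  GetObjectRequest.setRange(start, end) expects for a block of
 *  noOfBytes bytes starting at startByte. */
public class S3RangeUtil {
    static long inclusiveEnd(long startByte, long noOfBytes) {
        return startByte + noOfBytes - 1;
    }

    public static void main(String[] args) {
        // A 1 MiB block starting at byte 0 spans bytes 0..1048575.
        System.out.println(inclusiveEnd(0, 1024 * 1024));
    }
}
```

Passing noOfBytes directly as the second argument would either truncate the read (when noOfBytes < startByte + blockSize) or silently fetch the wrong span, which is exactly the kind of over-read that shows up on the AWS bill.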
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)