Posted to common-issues@hadoop.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2016/02/25 07:05:18 UTC

[jira] [Updated] (HADOOP-12444) Consider implementing lazy seek in S3AInputStream

     [ https://issues.apache.org/jira/browse/HADOOP-12444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HADOOP-12444:
--------------------------------------
    Attachment: HADOOP-12444.3.patch

Thanks [~thodemoor]. Attaching the revised patch. I will upload the test report shortly.

There are 2 tests which fail both on master and with the patch applied.

AWS tests without patch (“mvn clean package” from hadoop/hadoop-tools/hadoop-aws):
=======================

Results :
========

Failed tests:
  TestS3Credentials.noSecretShouldThrow Expected exception: java.lang.IllegalArgumentException
  TestS3Credentials.noAccessIdShouldThrow Expected exception: java.lang.IllegalArgumentException

Tests in error:
  TestS3AContractRootDir>AbstractContractRootDirectoryTest.testListEmptyRootDirectory:134 » FileNotFound
  TestS3AConfiguration.TestAutomaticProxyPortSelection:138 » AmazonS3 Forbidden ...

Tests run: 220, Failures: 2, Errors: 2, Skipped: 6


AWS tests with patch
================

Results :
========

Failed tests:
  TestS3Credentials.noSecretShouldThrow Expected exception: java.lang.IllegalArgumentException
  TestS3Credentials.noAccessIdShouldThrow Expected exception: java.lang.IllegalArgumentException

Tests in error:
  TestS3AContractRootDir>AbstractContractRootDirectoryTest.testListEmptyRootDirectory:134 » FileNotFound
  TestS3AConfiguration.TestAutomaticProxyPortSelection:138 » AmazonS3 Forbidden ...

Tests run: 220, Failures: 2, Errors: 2, Skipped: 6


{noformat}
Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.75 sec <<< FAILURE! - in org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
testListEmptyRootDirectory(org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir)  Time elapsed: 1.633 sec  <<< ERROR!
java.io.FileNotFoundException: No such file or directory: /
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1000)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:738)
        at org.apache.hadoop.fs.contract.AbstractContractRootDirectoryTest.testListEmptyRootDirectory(AbstractContractRootDirectoryTest.java:134)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

TestAutomaticProxyPortSelection(org.apache.hadoop.fs.s3a.TestS3AConfiguration)  Time elapsed: 620.356 sec  <<< ERROR!
com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: null)
        at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1182)
        at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:770)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3738)
        at com.amazonaws.services.s3.AmazonS3Client.listMultipartUploads(AmazonS3Client.java:2796)
        at com.amazonaws.services.s3.transfer.TransferManager.abortMultipartUploads(TransferManager.java:1217)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:313)
        at org.apache.hadoop.fs.s3a.S3ATestUtils.createTestFileSystem(S3ATestUtils.java:51)
        at org.apache.hadoop.fs.s3a.TestS3AConfiguration.TestAutomaticProxyPortSelection(TestS3AConfiguration.java:138)
{noformat}

> Consider implementing lazy seek in S3AInputStream
> -------------------------------------------------
>
>                 Key: HADOOP-12444
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12444
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.7.1
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: HADOOP-12444.1.patch, HADOOP-12444.2.patch, HADOOP-12444.3.patch, HADOOP-12444.WIP.patch
>
>
> - Currently, "read(long position, byte[] buffer, int offset, int length)" is not implemented in S3AInputStream (unlike DFSInputStream). As a result, "readFully(long position, byte[] buffer, int offset, int length)" in S3AInputStream goes through the default implementation in FSInputStream, which performs seek(), read(), seek().
> - However, seek() in S3AInputStream re-opens the connection to S3 every time (https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L115).
> - It would be good to consider a lazy seek implementation to reduce connection overhead to S3 (e.g. Presto implements lazy seek: https://github.com/facebook/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/PrestoS3FileSystem.java#L623).
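
For readers unfamiliar with the idea, here is a minimal sketch of lazy seek. It is not the code in the attached patch and not the Presto implementation. The names LazySeekSketch, RangeOpener, LazySeekInputStream and inMemory are hypothetical, and an in-memory byte array stands in for an S3 object. The point it illustrates: seek() only records the target offset, and the (expensive) re-open is deferred until the next read() actually needs a different position.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class LazySeekSketch {

    /** Hypothetical stand-in for "open a connection that starts at this offset". */
    interface RangeOpener {
        InputStream open(long offset) throws IOException;
    }

    /** Illustrative stream: seek() does no I/O, read() re-opens only when needed. */
    static class LazySeekInputStream extends InputStream {
        private final RangeOpener opener;
        private InputStream wrapped;  // currently open connection, or null
        private long streamPos;       // offset the open connection is positioned at
        private long nextReadPos;     // offset requested by the most recent seek()
        private int openCount;        // how many times a connection was opened

        LazySeekInputStream(RangeOpener opener) {
            this.opener = opener;
        }

        void seek(long pos) throws IOException {
            if (pos < 0) {
                throw new IOException("Cannot seek to negative offset " + pos);
            }
            nextReadPos = pos;  // just remember the target; no connection is touched
        }

        long getPos() {
            return nextReadPos;
        }

        int getOpenCount() {
            return openCount;
        }

        /** Re-open only if there is no connection or it is at the wrong offset. */
        private void ensureOpenAtTarget() throws IOException {
            if (wrapped != null && streamPos == nextReadPos) {
                return;
            }
            if (wrapped != null) {
                wrapped.close();
            }
            wrapped = opener.open(nextReadPos);
            streamPos = nextReadPos;
            openCount++;
        }

        @Override
        public int read() throws IOException {
            ensureOpenAtTarget();
            int b = wrapped.read();
            if (b >= 0) {
                streamPos++;
                nextReadPos++;
            }
            return b;
        }

        @Override
        public void close() throws IOException {
            if (wrapped != null) {
                wrapped.close();
                wrapped = null;
            }
        }
    }

    /** In-memory opener used only for this demo; a real one would issue a ranged GET. */
    static RangeOpener inMemory(byte[] data) {
        return offset -> {
            int start = (int) Math.min(offset, data.length);
            return new ByteArrayInputStream(data, start, data.length - start);
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        LazySeekInputStream in = new LazySeekInputStream(inMemory(data));
        in.seek(7);
        in.seek(2);
        in.seek(4);  // three seeks, still no connection opened
        System.out.println("opens after seeks: " + in.getOpenCount());
        System.out.println("read: " + (char) in.read() + ", opens: " + in.getOpenCount());
        in.close();
    }
}
```

In this sketch a seek() to the offset the connection is already at costs nothing, which is the case hit by the seek(), read(), seek() pattern when the following read continues from where the stream already is. A fuller implementation might also skip forward within the open connection for short forward seeks instead of re-opening; that is left out here.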


