Posted to common-dev@hadoop.apache.org by "Vishnu Vardhan (JIRA)" <ji...@apache.org> on 2017/03/02 22:43:45 UTC

[jira] [Created] (HADOOP-14142) S3A - Adding unexpected prefix

Vishnu Vardhan created HADOOP-14142:
---------------------------------------

             Summary: S3A - Adding unexpected prefix
                 Key: HADOOP-14142
                 URL: https://issues.apache.org/jira/browse/HADOOP-14142
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Vishnu Vardhan
            Priority: Critical


Hi:

S3A seems to add an unexpected prefix to my S3 path.

Specifically, the following line in the debug log below is unexpected:

>  GET /myBkt8/?max-keys=1&prefix=user%2Fvardhan%2F&delimiter=%2F HTTP/1.1

It is not clear where the "prefix" is coming from and why.
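
To make the request line above easier to read, its query string can be percent-decoded. The following is a minimal sketch using only java.net.URLDecoder. The object name QueryDecode and the helper decodeQuery are made up for this illustration and are not part of S3A or the AWS SDK.

```scala
import java.net.URLDecoder

object QueryDecode {
  // Hypothetical helper: split a raw query string into decoded key/value pairs.
  def decodeQuery(query: String): Map[String, String] =
    query.split("&").filter(_.nonEmpty).map { pair =>
      val idx = pair.indexOf('=')
      val (k, v) =
        if (idx < 0) (pair, "")
        else (pair.substring(0, idx), pair.substring(idx + 1))
      URLDecoder.decode(k, "UTF-8") -> URLDecoder.decode(v, "UTF-8")
    }.toMap

  def main(args: Array[String]): Unit = {
    // Query string copied from the request line in the debug log.
    val params = decodeQuery("max-keys=1&prefix=user%2Fvardhan%2F&delimiter=%2F")
    params.foreach { case (k, v) => println(s"$k = $v") }
  }
}
```

Decoded, the prefix parameter is user/vardhan/ and the delimiter is /, so the request asks for at most one key under user/vardhan/ rather than listing the bucket root.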


I executed the following commands:

sc.setLogLevel("DEBUG")
sc.hadoopConfiguration.set("fs.s3a.impl","org.apache.hadoop.fs.s3a.S3AFileSystem")
sc.hadoopConfiguration.set("fs.s3a.endpoint","webscaledemo.netapp.com:8082")
sc.hadoopConfiguration.set("fs.s3a.access.key","")
sc.hadoopConfiguration.set("fs.s3a.secret.key","")
sc.hadoopConfiguration.set("fs.s3a.path.style.access","false")
val s3Rdd = sc.textFile("s3a://myBkt8")
s3Rdd.count()
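
One possible explanation, stated as an assumption and not confirmed by anything in this report: the URI passed to sc.textFile has no path component, and a file system client that resolves empty or relative paths against a per-user working directory such as /user/<username> would turn that into user/vardhan/. The sketch below models that idea with java.net.URI only. The names PathModel and toListingPrefix, and the workingDir value, are hypothetical; this is not the S3A source code.

```scala
import java.net.URI

object PathModel {
  // Hypothetical model: map a URI to the key prefix a listing request would use.
  // An empty or relative path is resolved against workingDir (an assumption).
  def toListingPrefix(uri: URI, workingDir: String): String = {
    val path = Option(uri.getPath).getOrElse("")
    val absolute =
      if (path.startsWith("/")) path
      else if (path.isEmpty) workingDir
      else workingDir.stripSuffix("/") + "/" + path
    // Object keys carry no leading slash; a directory-style prefix ends with one.
    val key = absolute.stripPrefix("/")
    if (key.isEmpty || key.endsWith("/")) key else key + "/"
  }

  def main(args: Array[String]): Unit = {
    val wd = "/user/vardhan" // assumed working directory
    // Bucket name as it appears in the debug log.
    println(toListingPrefix(new URI("s3a://myBkt8"), wd))  // no path component
    println(toListingPrefix(new URI("s3a://myBkt8/"), wd)) // explicit root path
  }
}
```

Under this model the first URI yields user/vardhan/ and the second yields an empty prefix, which would suggest retrying with a trailing slash on the bucket URI. Whether S3A actually behaves this way here is for someone familiar with the code to confirm.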


----


The debug log is below:


application/x-www-form-urlencoded; charset=utf-8
Thu, 02 Mar 2017 22:40:25 GMT
/myBkt8/"
17/03/02 14:40:25 DEBUG request: Sending Request: GET https://webscaledemo.netapp.com:8082 /myBkt8/ Parameters: (max-keys: 1, prefix: user/vardhan/, delimiter: /, ) Headers: (Authorization: AWS 2SNAJYEMQU45YPVYC89D:M8GbLXUuAJ2w5pGx4WJ6hJF3324=, User-Agent: aws-sdk-java/1.7.4 Mac_OS_X/10.12.3 Java_HotSpot(TM)_64-Bit_Server_VM/25.60-b23/1.8.0_60, Date: Thu, 02 Mar 2017 22:40:25 GMT, Content-Type: application/x-www-form-urlencoded; charset=utf-8, ) 
17/03/02 14:40:25 DEBUG PoolingClientConnectionManager: Connection request: [route: {s}->https://webscaledemo.netapp.com:8082][total kept alive: 0; route allocated: 0 of 15; total allocated: 0 of 15]
17/03/02 14:40:25 DEBUG PoolingClientConnectionManager: Connection leased: [id: 10][route: {s}->https://webscaledemo.netapp.com:8082][total kept alive: 0; route allocated: 1 of 15; total allocated: 1 of 15]
17/03/02 14:40:25 DEBUG DefaultClientConnectionOperator: Connecting to webscaledemo.netapp.com:8082
17/03/02 14:40:25 DEBUG PoolingClientConnectionManager: Closing connections idle longer than 60 SECONDS
17/03/02 14:40:25 DEBUG PoolingClientConnectionManager: Closing connections idle longer than 60 SECONDS
17/03/02 14:40:26 DEBUG RequestAddCookies: CookieSpec selected: default
17/03/02 14:40:26 DEBUG RequestAuthCache: Auth cache not set in the context
17/03/02 14:40:26 DEBUG RequestProxyAuthentication: Proxy auth state: UNCHALLENGED
17/03/02 14:40:26 DEBUG SdkHttpClient: Attempt 1 to execute request
17/03/02 14:40:26 DEBUG DefaultClientConnection: Sending request: GET /myBkt8/?max-keys=1&prefix=user%2Fvardhan%2F&delimiter=%2F HTTP/1.1
17/03/02 14:40:26 DEBUG wire:  >> "GET /myBkt8/?max-keys=1&prefix=user%2Fvardhan%2F&delimiter=%2F HTTP/1.1[\r][\n]"
17/03/02 14:40:26 DEBUG wire:  >> "Host: webscaledemo.netapp.com:8082[\r][\n]"
17/03/02 14:40:26 DEBUG wire:  >> "Authorization: AWS 2SNAJYEMQU45YPVYC89D:M8GbLXUuAJ2w5pGx4WJ6hJF3324=[\r][\n]"
17/03/02 14:40:26 DEBUG wire:  >> "User-Agent: aws-sdk-java/1.7.4 Mac_OS_X/10.12.3 Java_HotSpot(TM)_64-Bit_Server_VM/25.60-b23/1.8.0_60[\r][\n]"
17/03/02 14:40:26 DEBUG wire:  >> "Date: Thu, 02 Mar 2017 22:40:25 GMT[\r][\n]"
17/03/02 14:40:26 DEBUG wire:  >> "Content-Type: application/x-www-form-urlencoded; charset=utf-8[\r][\n]"
17/03/02 14:40:26 DEBUG wire:  >> "Connection: Keep-Alive[\r][\n]"
17/03/02 14:40:26 DEBUG wire:  >> "[\r][\n]"
17/03/02 14:40:26 DEBUG headers: >> GET /myBkt8/?max-keys=1&prefix=user%2Fvardhan%2F&delimiter=%2F HTTP/1.1
17/03/02 14:40:26 DEBUG headers: >> Host: webscaledemo.netapp.com:8082
17/03/02 14:40:26 DEBUG headers: >> Authorization: AWS 2SNAJYEMQU45YPVYC89D:M8GbLXUuAJ2w5pGx4WJ6hJF3324=
17/03/02 14:40:26 DEBUG headers: >> User-Agent: aws-sdk-java/1.7.4 Mac_OS_X/10.12.3 Java_HotSpot(TM)_64-Bit_Server_VM/25.60-b23/1.8.0_60
17/03/02 14:40:26 DEBUG headers: >> Date: Thu, 02 Mar 2017 22:40:25 GMT
17/03/02 14:40:26 DEBUG headers: >> Content-Type: application/x-www-form-urlencoded; charset=utf-8
17/03/02 14:40:26 DEBUG headers: >> Connection: Keep-Alive
17/03/02 14:40:26 DEBUG wire:  << "HTTP/1.1 200 OK[\r][\n]"
17/03/02 14:40:26 DEBUG wire:  << "Date: Thu, 02 Mar 2017 22:40:26 GMT[\r][\n]"
17/03/02 14:40:26 DEBUG wire:  << "Connection: KEEP-ALIVE[\r][\n]"
17/03/02 14:40:26 DEBUG wire:  << "Server: StorageGRID/10.3.0.1[\r][\n]"
17/03/02 14:40:26 DEBUG wire:  << "x-amz-request-id: 563477649[\r][\n]"
17/03/02 14:40:26 DEBUG wire:  << "Content-Length: 266[\r][\n]"
17/03/02 14:40:26 DEBUG wire:  << "Content-Type: application/xml[\r][\n]"
17/03/02 14:40:26 DEBUG wire:  << "[\r][\n]"
17/03/02 14:40:26 DEBUG DefaultClientConnection: Receiving response: HTTP/1.1 200 OK
17/03/02 14:40:26 DEBUG headers: << HTTP/1.1 200 OK
17/03/02 14:40:26 DEBUG headers: << Date: Thu, 02 Mar 2017 22:40:26 GMT
17/03/02 14:40:26 DEBUG headers: << Connection: KEEP-ALIVE
17/03/02 14:40:26 DEBUG headers: << Server: StorageGRID/10.3.0.1
17/03/02 14:40:26 DEBUG headers: << x-amz-request-id: 563477649
17/03/02 14:40:26 DEBUG headers: << Content-Length: 266
17/03/02 14:40:26 DEBUG headers: << Content-Type: application/xml
17/03/02 14:40:26 DEBUG SdkHttpClient: Connection can be kept alive indefinitely
17/03/02 14:40:26 DEBUG XmlResponsesSaxParser: Sanitizing XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
17/03/02 14:40:26 DEBUG wire:  << "<?xml version="1.0" encoding="UTF-8"?>[\n]"
17/03/02 14:40:26 DEBUG wire:  << "<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>myBkt8</Name><Prefix>user/vardhan/</Prefix><Marker></Marker><MaxKeys>1</MaxKeys><Delimiter>/</Delimiter><IsTruncated>false</IsTruncated></ListBucketResult>"
17/03/02 14:40:26 DEBUG PoolingClientConnectionManager: Connection [id: 10][route: {s}->https://webscaledemo.netapp.com:8082] can be kept alive indefinitely
17/03/02 14:40:26 DEBUG PoolingClientConnectionManager: Connection released: [id: 10][route: {s}->https://webscaledemo.netapp.com:8082][total kept alive: 1; route allocated: 1 of 15; total allocated: 1 of 15]
17/03/02 14:40:26 DEBUG XmlResponsesSaxParser: Parsing XML response document with handler: class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
17/03/02 14:40:26 DEBUG XmlResponsesSaxParser: Examining listing for bucket: myBkt8
17/03/02 14:40:26 DEBUG request: Received successful response: 200, AWS Request ID: 563477649
17/03/02 14:40:26 DEBUG S3AFileSystem: Not Found: s3a://myBkt8/user/vardhan
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3a://myBkt8
  at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
  at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
  at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
  ... 53 elided
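
The response body in the log can be inspected to see what the listing returned. The following sketch parses the ListBucketResult XML copied from the wire log with the JDK's built-in DOM parser and counts the Contents and CommonPrefixes elements. The object name ListingCheck is made up for this illustration.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory

object ListingCheck {
  // Response body copied from the wire log.
  val body: String =
    """<?xml version="1.0" encoding="UTF-8"?>
      |<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>myBkt8</Name><Prefix>user/vardhan/</Prefix><Marker></Marker><MaxKeys>1</MaxKeys><Delimiter>/</Delimiter><IsTruncated>false</IsTruncated></ListBucketResult>""".stripMargin

  // Count elements with the given tag name anywhere in the document.
  def count(tag: String): Int = {
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
      .parse(new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)))
    doc.getElementsByTagName(tag).getLength
  }

  def main(args: Array[String]): Unit = {
    println(s"Contents: ${count("Contents")}")
    println(s"CommonPrefixes: ${count("CommonPrefixes")}")
  }
}
```

Both counts are zero: the listing under user/vardhan/ came back empty, which is consistent with the "Not Found: s3a://myBkt8/user/vardhan" line in the log.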



