Posted to user@spark.apache.org by "Zhang, Jingyu" <ji...@news.com.au> on 2016/05/03 02:53:32 UTC

Error from reading S3 in Scala

Hi All,

I am using Eclipse with Maven for developing Spark applications. I got an
error when reading from S3 in Scala, but the same code works fine in Java
when I run both in the same project in Eclipse. The Scala/Java code and the
errors follow.


Scala

// key / seckey hold the AWS access and secret keys; ctx is the SparkContext
import java.io.InputStream
import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

val uri = URI.create("s3a://" + key + ":" + seckey + "@" + "graphclustering/config.properties")
val pt = new Path("s3a://" + key + ":" + seckey + "@" + "graphclustering/config.properties")
val fs = FileSystem.get(uri, ctx.hadoopConfiguration)
val inputStream: InputStream = fs.open(pt)

----Exception: on aws-java-1.7.4 and hadoop-aws-2.6.1----

Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 8A56DC7BF0BFF09A), S3 Extended Request ID
  at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1160)
  at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:748)
  at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:467)
  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:302)
  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785)
  at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1050)
  at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1027)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:688)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:222)
  at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
  at com.news.report.graph.GraphCluster$.main(GraphCluster.scala:53)
  at com.news.report.graph.GraphCluster.main(GraphCluster.scala)

16/05/03 10:49:17 INFO SparkContext: Invoking stop() from shutdown hook

16/05/03 10:49:17 INFO SparkUI: Stopped Spark web UI at
http://10.65.80.125:4040

16/05/03 10:49:17 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!

16/05/03 10:49:17 INFO MemoryStore: MemoryStore cleared

16/05/03 10:49:17 INFO BlockManager: BlockManager stopped

----Exception: on aws-java-1.7.4 and hadoop-aws-2.7.2----

16/05/03 10:23:40 INFO Slf4jLogger: Slf4jLogger started

16/05/03 10:23:40 INFO Remoting: Starting remoting

16/05/03 10:23:40 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkDriverActorSystem@10.65.80.125:61860]

16/05/03 10:23:40 INFO Utils: Successfully started service
'sparkDriverActorSystem' on port 61860.

16/05/03 10:23:40 INFO SparkEnv: Registering MapOutputTracker

16/05/03 10:23:40 INFO SparkEnv: Registering BlockManagerMaster

16/05/03 10:23:40 INFO DiskBlockManager: Created local directory at
/private/var/folders/sc/tdmkbvr1705b8p70xqj1kqks5l9p

16/05/03 10:23:40 INFO MemoryStore: MemoryStore started with capacity
1140.4 MB

16/05/03 10:23:40 INFO SparkEnv: Registering OutputCommitCoordinator

16/05/03 10:23:40 INFO Utils: Successfully started service 'SparkUI' on
port 4040.

16/05/03 10:23:40 INFO SparkUI: Started SparkUI at http://10.65.80.125:4040

16/05/03 10:23:40 INFO Executor: Starting executor ID driver on host
localhost

16/05/03 10:23:40 INFO Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 61861.

16/05/03 10:23:40 INFO NettyBlockTransferService: Server created on 61861

16/05/03 10:23:40 INFO BlockManagerMaster: Trying to register BlockManager

16/05/03 10:23:40 INFO BlockManagerMasterEndpoint: Registering block
manager localhost:61861 with 1140.4 MB RAM, BlockManagerId(driver,
localhost, 61861)

16/05/03 10:23:40 INFO BlockManagerMaster: Registered BlockManager

Exception in thread "main" java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManagerConfiguration.setMultipartUploadThreshold(I)V
  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:285)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
  at com.news.report.graph.GraphCluster$.main(GraphCluster.scala:52)
  at com.news.report.graph.GraphCluster.main(GraphCluster.scala)

16/05/03 10:23:51 INFO SparkContext: Invoking stop() from shutdown hook

16/05/03 10:23:51 INFO SparkUI: Stopped Spark web UI at
http://10.65.80.125:4040

16/05/03 10:23:51 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!

16/05/03 10:23:51 INFO MemoryStore: MemoryStore cleared

16/05/03 10:23:51 INFO BlockManager: BlockManager stopped

16/05/03 10:23:51 INFO BlockManagerMaster: BlockManagerMaster stopped

16/05/03 10:23:51 INFO
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!

16/05/03 10:23:51 INFO SparkContext: Successfully stopped SparkContext

16/05/03 10:23:51 INFO ShutdownHookManager: Shutdown hook called

16/05/03 10:23:51 INFO ShutdownHookManager: Deleting directory
/private/var/folders/sc/tdmkbvr1705b8p70xqj1kqks5l9pk9/T/spark-53cf244a-2947-48c9-ba97-7302c9985f35

16/05/03 10:49:17 INFO S3AFileSystem: Caught an AmazonServiceException,
which means your request made it to Amazon S3, but was rejected with an
error response for some reason.

16/05/03 10:49:17 INFO S3AFileSystem: Error Message: Forbidden (Service:
Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID:
8A56DC7BF0BFF09A)

16/05/03 10:49:17 INFO S3AFileSystem: HTTP Status Code: 403

16/05/03 10:49:17 INFO S3AFileSystem: AWS Error Code: 403 Forbidden

16/05/03 10:49:17 INFO S3AFileSystem: Error Type: Client

16/05/03 10:49:17 INFO S3AFileSystem: Request ID: 8A56DC7BF0BFF09A

16/05/03 10:49:17 INFO S3AFileSystem: Class Name:
com.amazonaws.services.s3.model.AmazonS3Exception


But the equivalent Java code works without error:

URI uri = URI.create("s3a://" + key + ":" + seckey + "@" + "graphclustering/config.properties");
Path pt = new Path("s3a://" + key + ":" + seckey + "@" + "graphclustering/config.properties");
FileSystem fs = FileSystem.get(uri, ctx.hadoopConfiguration());
inputStream = fs.open(pt);

Thanks,

Jingyu


Re: Error from reading S3 in Scala

Posted by Steve Loughran <st...@hortonworks.com>.
On 4 May 2016, at 13:52, Zhang, Jingyu <ji...@news.com.au> wrote:

> Thanks everyone,
>
> One reason to use "s3a://" is because I use "s3a://" in my development env (Eclipse) on a desktop. I will debug and test on my desktop, then put the jar file on the EMR cluster. I do not think "s3://" will work on a desktop.


s3n will work, it's just slower, and has a real performance problem if you close, say, a 2GB file while only 6 bytes in, as it will read to the end of the file first.


> With help from AWS support, we found that this bug is caused by the version of Joda-Time in my pom file not matching the aws-sdk JAR, because AWS authentication requires a valid Date or x-amz-date header. It works after updating to joda-time 2.8.1, AWS SDK 1.10.x and hadoop-aws 2.6.1.


and Java 8u60, right?


> But it still shows an exception on hadoop-aws 2.7.2. The reason for using hadoop-aws 2.7.2 is that in EMR 4.6.0 the supported versions are Hadoop 2.7.2 and Spark 1.6.1.


oh, that's this problem.


https://issues.apache.org/jira/browse/HADOOP-13044
https://issues.apache.org/jira/browse/HADOOP-13050

the quickest fix for you is to check out Hadoop branch-2.7 and rebuild it with the AWS SDK library version bumped up to 1.10.60, with httpclient also updated in sync. That may break some other things, that being the problem of mass transitive classpath updates.

you could also provide a patch for https://issues.apache.org/jira/browse/HADOOP-13062, using introspection for some of the AWS binding, so that you could then take a 2.7.3+ release and drop in whichever AWS JAR you wanted. That would be appreciated by many.
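
Roughly, that rebuild looks like the sketch below (a sketch only; check where the AWS SDK and httpclient versions are declared on your branch, hadoop-project/pom.xml is the usual place):

# sketch: rebuild the S3A module from branch-2.7 against a newer AWS SDK
git clone https://github.com/apache/hadoop.git
cd hadoop
git checkout branch-2.7
# edit hadoop-project/pom.xml: bump the aws-java-sdk version (e.g. 1.10.60)
# and keep the httpclient version in sync with what that SDK expects
mvn clean install -pl hadoop-tools/hadoop-aws -am -DskipTests

You would then put the resulting hadoop-aws JAR (plus the matching SDK and httpclient JARs) on the classpath in place of the stock ones.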




Re: Error from reading S3 in Scala

Posted by "Zhang, Jingyu" <ji...@news.com.au>.
Thanks everyone,

One reason to use "s3a://" is because I use "s3a://" in my development env
(Eclipse) on a desktop. I will debug and test on my desktop, then put the jar
file on the EMR cluster. I do not think "s3://" will work on a desktop.

With help from AWS support, we found that this bug is caused by the version of
Joda-Time in my pom file not matching the aws-sdk JAR, because AWS
authentication requires a valid Date or x-amz-date header. It works after
updating to joda-time 2.8.1, AWS SDK 1.10.x and hadoop-aws 2.6.1.
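
For reference, the relevant pom.xml entries look roughly like the sketch below
(assuming the usual Maven Central coordinates; pick a concrete 1.10.x SDK release):

<!-- sketch: force the newer Joda-Time and align the S3 libraries -->
<dependency>
  <groupId>joda-time</groupId>
  <artifactId>joda-time</artifactId>
  <version>2.8.1</version>
</dependency>
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk</artifactId>
  <version>1.10.x</version> <!-- replace with a concrete 1.10.x release -->
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <version>2.6.1</version>
</dependency>

Declaring joda-time directly makes Maven pick 2.8.1 instead of the older
release that comes in transitively with the SDK.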

But it still shows an exception on hadoop-aws 2.7.2. The reason for using
hadoop-aws 2.7.2 is that in EMR 4.6.0 the supported versions are Hadoop 2.7.2
and Spark 1.6.1.

Please let me know if you have a better idea to set up the development
environment for debugging and testing.

Regards,

Jingyu

Re: Error from reading S3 in Scala

Posted by James Hammerton <ja...@gluru.co>.
On 3 May 2016 at 17:22, Gourav Sengupta <go...@gmail.com> wrote:

> Hi,
>
> The best thing to do is start the EMR clusters with proper permissions in
> the roles; that way you do not need to worry about the keys at all.
>
> Another thing, why are we using s3a:// instead of s3://?
>

Probably because of what's said about s3:// and s3n:// here (which is why I
use s3a://):

https://wiki.apache.org/hadoop/AmazonS3

Regards,

James



Re: Error from reading S3 in Scala

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,

The best thing to do is start the EMR clusters with proper permissions in
the roles; that way you do not need to worry about the keys at all.
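
For example, an untested sketch with the standard AWS CLI flags (the cluster
name and instance sizes are just placeholders) that attaches the default EMR
roles, so the nodes pick up credentials from the instance profile instead of
from keys in code:

aws emr create-cluster \
  --name "spark-s3-test" \
  --release-label emr-4.6.0 \
  --applications Name=Spark \
  --use-default-roles \
  --instance-type m3.xlarge \
  --instance-count 3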

Another thing, why are we using s3a:// instead of s3://?

Besides that, you can increase S3 speeds using the instructions mentioned here:
https://aws.amazon.com/blogs/aws/aws-storage-update-amazon-s3-transfer-acceleration-larger-snowballs-in-more-regions/


Regards,
Gourav


Re: Error from reading S3 in Scala

Posted by Steve Loughran <st...@hortonworks.com>.
don't put your secret in the URI, it'll only creep out in the logs.

Use the specific properties covered in http://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html, which you can set in your spark context by prefixing them with spark.hadoop.

you can also set the env vars AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; SparkEnv will pick these up and set the relevant spark context keys for you
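
Roughly like this in Scala (a sketch only; the bucket and object names are the ones from the thread, and fs.s3a.access.key / fs.s3a.secret.key are the property names documented on the hadoop-aws page above):

// sketch: pass the S3A credentials through the Spark conf rather than the URI
import java.io.InputStream
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("GraphCluster")
  .setMaster("local[*]") // for a local run from the IDE
  // anything prefixed with spark.hadoop. is copied into the Hadoop Configuration
  .set("spark.hadoop.fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
  .set("spark.hadoop.fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
val ctx = new SparkContext(conf)

// the URI no longer needs to carry the secret
val pt = new Path("s3a://graphclustering/config.properties")
val fs = FileSystem.get(pt.toUri, ctx.hadoopConfiguration)
val inputStream: InputStream = fs.open(pt)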

