Posted to issues@spark.apache.org by "Jason Sleight (Jira)" <ji...@apache.org> on 2022/06/22 20:28:00 UTC

[jira] [Commented] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

    [ https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557682#comment-17557682 ] 

Jason Sleight commented on SPARK-38934:
---------------------------------------

After continuing to see some errors in a few edge cases (even without env variables), I recently noticed that [the default provider list|https://github.com/apache/hadoop/blob/release-3.3.1-RC3/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java#L595] is:
 # TemporaryAWSCredentialsProvider
 # SimpleAWSCredentialsProvider
 # EnvironmentVariableCredentialsProvider
 # IAMInstanceCredentialsProvider

So in principle, explicitly setting the provider to TemporaryAWSCredentialsProvider should be unnecessary, since it is already first in the default chain. Oddly, when I leave the provider unspecified, my errors disappear. I /think/ my Spark session is then using TemporaryAWSCredentialsProvider via the default chain, but I'm not sure how to verify this, since the Spark UI shows the provider setting as the entire default list; one way to at least inspect the configured value is sketched below.
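
For what it's worth, a minimal sketch of how to check the configured value (this is just an illustration, not something from the job above; it only shows the configured setting, not which provider in the chain actually supplied credentials):

{code:java}
// Print the provider list as seen by the Hadoop configuration;
// if the key is unset, the default chain applies.
val hadoopConf = spark.sparkContext.hadoopConfiguration
println(hadoopConf.get("fs.s3a.aws.credentials.provider", "<unset: default chain applies>"))
{code}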

Anyway, try not setting the provider explicitly and letting the default resolution pick TemporaryAWSCredentialsProvider.
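
For clarity, a minimal sketch of what I mean (assuming the same STS-derived credentials object, perm, as in the report below):

{code:java}
// Set only the temporary credentials and deliberately leave
// fs.s3a.aws.credentials.provider unset, so the default chain
// tries TemporaryAWSCredentialsProvider first.
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.access.key", perm.credential.accessKeyID)
hadoopConf.set("fs.s3a.secret.key", perm.credential.secretAccessKey)
hadoopConf.set("fs.s3a.session.token", perm.credential.sessionToken)
{code}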

> Provider TemporaryAWSCredentialsProvider has no credentials
> -----------------------------------------------------------
>
>                 Key: SPARK-38934
>                 URL: https://issues.apache.org/jira/browse/SPARK-38934
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.2.1
>            Reporter: Lily
>            Priority: Major
>
>  
> We are using JupyterHub on K8s as a notebook-based development environment, with Spark on K8s (Spark 3.2.1, Hadoop 3.3.1) as its backend cluster.
> When we run code like the one below in JupyterHub on K8s,
>  
> {code:java}
> val perm = ... // get AWS temporary credential by AWS STS from AWS assumed role
> // set AWS temporary credential
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", perm.credential.accessKeyID)
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", perm.credential.secretAccessKey)
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.session.token", perm.credential.sessionToken)
> // execute simple Spark action
> spark.read.format("parquet").load("s3a://<path>/*").show(1) {code}
>  
>  
> the first few executors logged a warning like the one below during the first code execution, but we were able to get the correct result thanks to Spark's task retry mechanism.
> {code:java}
> 22/04/18 09:13:50 WARN TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2) (10.197.5.15 executor 1): java.nio.file.AccessDeniedException: s3a://<path>/<file>.parquet: org.apache.hadoop.fs.s3a.CredentialInitializationException: Provider TemporaryAWSCredentialsProvider has no credentials
> 	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:206)
> 	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:117)
> 	at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:2810)
> 	at org.apache.spark.util.HadoopFSUtils$.listLeafFiles(HadoopFSUtils.scala:225)
> 	at org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$6(HadoopFSUtils.scala:136)
> 	at scala.collection.immutable.Stream.map(Stream.scala:418)
> 	at org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$4(HadoopFSUtils.scala:126)
> 	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
> 	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:131)
> 	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> 	at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: org.apache.hadoop.fs.s3a.CredentialInitializationException: Provider TemporaryAWSCredentialsProvider has no credentials
> 	at org.apache.hadoop.fs.s3a.auth.AbstractSessionCredentialsProvider.getCredentials(AbstractSessionCredentialsProvider.java:130)
> 	at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:177)
> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1266)
> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:842)
> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:792)
> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713)
> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695)
> 	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
> 	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539)
> 	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5445)
> 	at com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:6420)
> 	at com.amazonaws.services.s3.AmazonS3Client.fetchRegionFromCache(AmazonS3Client.java:6393)
> 	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5430)
> 	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5392)
> 	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5386)
> 	at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:971)
> 	at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$7(S3AFileSystem.java:2116)
> 	at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:489)
> 	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:412)
> 	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:375)
> 	at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:2107)
> 	at org.apache.hadoop.fs.s3a.S3AFileSystem$ListingOperationCallbacksImpl.lambda$listObjectsAsync$0(S3AFileSystem.java:1750)
> 	at org.apache.hadoop.fs.s3a.impl.CallableSupplier.get(CallableSupplier.java:62)
> 	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
> 	... 3 more {code}
> Could you explain why we are seeing this warning and how we can prevent this issue from occurring again?
> Thank you in advance.
>  


