
Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2022/04/25 20:25:00 UTC

[jira] [Updated] (SPARK-38954) Implement sharing of cloud credentials among driver and executors

     [ https://issues.apache.org/jira/browse/SPARK-38954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-38954:
----------------------------------
    Affects Version/s: 3.4.0
                           (was: 3.2.1)

> Implement sharing of cloud credentials among driver and executors
> -----------------------------------------------------------------
>
>                 Key: SPARK-38954
>                 URL: https://issues.apache.org/jira/browse/SPARK-38954
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Parth Chandra
>            Priority: Major
>
> Currently Spark uses external implementations (e.g. hadoop-aws) to access cloud services like S3. To reach the actual service, these implementations rely on credentials provider implementations that obtain the credentials the cloud service requires.
> These credentials are typically session credentials, which means they expire after a fixed time. The lifetime can be as short as one hour, so for a Spark job that runs for many hours (or a Spark Streaming job that runs continuously), the credentials have to be renewed periodically.
> In many organizations, obtaining credentials may be a multi-step process. The organization has an identity provider service that authenticates the user, while the cloud service provider handles authorization for the roles the user has access to. Once the user is authenticated and her role is verified, credentials are generated for a new session.
> In a large setup with hundreds of Spark jobs and thousands of executors, each executor obtains its own credentials. Executors then spend a lot of time getting credentials, and this can put unnecessary load on the backend authentication services.
> To alleviate this, we can use Spark's architecture to obtain the credentials once in the driver and push them to the executors. In addition, the driver can check the expiry of the credentials and push updated credentials to the executors. This is relatively easy to do because the RPC mechanism needed is already in place and is used in a similar way for Kerberos delegation tokens.
>   
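A minimal single-process sketch of the proposed flow follows. It is written in Python for illustration only. The class names (SessionCredentials, Executor, DriverCredentialManager, FakeProvider) are hypothetical and are not Spark APIs. In Spark, the push would travel over the existing driver-to-executor RPC channel rather than through a direct method call. The one-hour lifetime, five-minute renewal margin and one-minute check interval are assumptions chosen for the example. The sketch shows the property the proposal relies on: the number of calls to the credential provider depends on how long the job runs, not on how many executors it has.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class SessionCredentials:
    # Hypothetical stand-in for cloud session credentials, with an absolute expiry (seconds).
    access_key: str
    secret_key: str
    session_token: str
    expires_at: float


class Executor:
    """Simulated executor: never calls the provider, only uses what the driver pushed."""

    def __init__(self, executor_id: str) -> None:
        self.executor_id = executor_id
        self.credentials: Optional[SessionCredentials] = None

    def update_credentials(self, creds: SessionCredentials) -> None:
        # In Spark this would arrive as an RPC message from the driver (assumption).
        self.credentials = creds


class DriverCredentialManager:
    """Hypothetical driver-side manager: fetch once, push to all executors, renew before expiry."""

    def __init__(self, fetch: Callable[[float], SessionCredentials],
                 renew_margin_s: float = 300.0) -> None:
        self._fetch = fetch  # the (possibly multi-step) identity/cloud provider round trip
        self._renew_margin_s = renew_margin_s
        self._executors: List[Executor] = []
        self._current: Optional[SessionCredentials] = None

    def register(self, executor: Executor) -> None:
        self._executors.append(executor)
        if self._current is not None:
            # Executors that join later immediately receive the current credentials.
            executor.update_credentials(self._current)

    def check_and_push(self, now: float) -> None:
        # Called periodically on the driver; renews only when close to expiry.
        if self._current is None or now >= self._current.expires_at - self._renew_margin_s:
            self._current = self._fetch(now)
            for executor in self._executors:
                executor.update_credentials(self._current)


class FakeProvider:
    """Hypothetical provider that counts round trips; credentials last lifetime_s seconds."""

    def __init__(self, lifetime_s: float = 3600.0) -> None:
        self.calls = 0
        self.lifetime_s = lifetime_s

    def __call__(self, now: float) -> SessionCredentials:
        self.calls += 1
        return SessionCredentials("demo-key", "demo-secret", f"token-{self.calls}",
                                  now + self.lifetime_s)


provider = FakeProvider()
driver = DriverCredentialManager(provider)
executors = [Executor(f"exec-{i}") for i in range(1000)]
for e in executors:
    driver.register(e)

# Simulate a 5-hour job in which the driver checks expiry once a minute.
for now in range(0, 5 * 3600, 60):
    driver.check_and_push(now)

print(provider.calls)                         # 6: one initial fetch plus five renewals
print(executors[0].credentials.session_token)  # token-6
```

If each of the 1,000 simulated executors fetched its own credentials instead, the same run would cost on the order of 6,000 provider calls under the same assumptions. Renewing a few minutes ahead of expiry, rather than at the moment of expiry, leaves time for the push to reach every executor before the old credentials stop working.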



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
