Posted to issues@spark.apache.org by "Jacek Lewandowski (JIRA)" <ji...@apache.org> on 2015/10/29 14:03:27 UTC

[jira] [Updated] (SPARK-11326) Split networking in standalone mode

     [ https://issues.apache.org/jira/browse/SPARK-11326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacek Lewandowski updated SPARK-11326:
--------------------------------------
    Description: 
h3.The idea

Currently, in standalone mode, all components must use the same secret token for all network connections if any security is to be ensured at all.

This ticket is intended to split the communication in standalone mode into three channels, similar to YARN mode: application-internal communication, scheduler-internal communication, and communication between the client and the scheduler.

Such refactoring will allow the scheduler (master, workers) to use a distinct secret, which will remain unknown to users. Similarly, it will improve security for applications, because each application will be able to use a distinct secret as well.

By providing Kerberos-based SASL authentication/encryption for connections between a client (Client or AppClient) and the Spark Master, it will be possible to introduce authentication, automatic generation of digest tokens, and safe sharing of those tokens among the application processes.

h3.User-facing changes when running an application

h4.General principles:
- conf: {{spark.authenticate.secret}} is *never sent* over the wire
- env: {{SPARK_AUTH_SECRET}} is *never sent* over the wire
- In all situations the env variable, if present, overrides the conf variable.
- Whenever a user has to pass a secret, it is better (safer) to do so through the env variable.
- In work modes with multiple secrets, we assume encrypted communication between client and master, between driver and master, and between master and workers.
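The precedence rule above (env overrides conf) can be sketched as follows; {{resolve_secret}} is a hypothetical helper illustrating the proposed behaviour, not actual Spark code:

```python
def resolve_secret(env, conf,
                   env_key="SPARK_AUTH_SECRET",
                   conf_key="spark.authenticate.secret"):
    """Resolve a secret per the principles above: the env variable,
    when present, overrides the conf entry; neither is ever sent
    over the wire by the entity that resolved it."""
    return env.get(env_key) or conf.get(conf_key)
```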

----
h4.Work modes and descriptions
h5.Client mode, single secret
h6.Configuration
- env: {{SPARK_AUTH_SECRET=secret}} or conf: {{spark.authenticate.secret=secret}}

h6.Description
- The driver is running locally
- The driver will neither send env: {{SPARK_AUTH_SECRET}} nor conf: {{spark.authenticate.secret}}
- The driver will use either env: {{SPARK_AUTH_SECRET}} or conf: {{spark.authenticate.secret}} for connection to the master
- _ExecutorRunner_ will not find any secret in _ApplicationDescription_ so it will look for it in the worker configuration and it will find it there (its presence is implied). 
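The last step can be sketched as follows; {{executor_secret}} is a hypothetical helper mirroring _ExecutorRunner_'s fallback in the single-secret case, not actual Spark code:

```python
def executor_secret(app_desc_env, worker_conf):
    """Single-secret lookup as described above: ExecutorRunner finds
    no secret in ApplicationDescription, so it falls back to the
    worker configuration, where its presence is implied."""
    secret = app_desc_env.get("SPARK_AUTH_SECRET")
    if secret is None:
        secret = worker_conf["spark.authenticate.secret"]
    return secret
```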

----
h5.Client mode, multiple secrets
h6.Configuration
- env: {{SPARK_APP_AUTH_SECRET=app_secret}} or conf: {{spark.app.authenticate.secret=app_secret}}
- env: {{SPARK_SUBMISSION_AUTH_SECRET=scheduler_secret}} or conf: {{spark.submission.authenticate.secret=scheduler_secret}}

h6.Description
- The driver is running locally
- The driver will use either env: {{SPARK_SUBMISSION_AUTH_SECRET}} or conf: {{spark.submission.authenticate.secret}} to connect to the master
- The driver will neither send env: {{SPARK_SUBMISSION_AUTH_SECRET}} nor conf: {{spark.submission.authenticate.secret}}
- The driver will use either env: {{SPARK_APP_AUTH_SECRET}} or conf: {{spark.app.authenticate.secret}} for communication with the executors
- The driver will send {{spark.executorEnv.SPARK_AUTH_SECRET=app_secret}} so that the executors can use it to communicate with the driver
- _ExecutorRunner_ will find that secret in _ApplicationDescription_ and it will set it in env: {{SPARK_AUTH_SECRET}} which will be read by _ExecutorBackend_ afterwards and used for all the connections (with driver, other executors and external shuffle service).
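The propagation of the app secret to executors, as described above, can be sketched like this ({{executor_env_for_app}} is a hypothetical helper, not actual Spark code):

```python
def executor_env_for_app(app_env, app_conf):
    """Sketch of the multiple-secret client mode: the driver resolves
    the app secret (env overrides conf) and exposes it to executors
    via spark.executorEnv as SPARK_AUTH_SECRET, so ExecutorRunner
    can export it into the executor process environment."""
    app_secret = (app_env.get("SPARK_APP_AUTH_SECRET")
                  or app_conf.get("spark.app.authenticate.secret"))
    executor_env = {}
    if app_secret is not None:
        executor_env["SPARK_AUTH_SECRET"] = app_secret
    return executor_env
```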

----
h5.Cluster mode, single secret
h6.Configuration
- env: {{SPARK_AUTH_SECRET=secret}} or conf: {{spark.authenticate.secret=secret}}

h6.Description
- The driver is run by _DriverRunner_, which is a part of the worker
- The client will neither send env: {{SPARK_AUTH_SECRET}} nor conf: {{spark.authenticate.secret}}
- The client will use either env: {{SPARK_AUTH_SECRET}} or conf: {{spark.authenticate.secret}} to connect to the master and submit the driver
- _DriverRunner_ will not find any secret in _DriverDescription_ so it will look for it in the worker configuration and it will find it there (its presence is implied)
- _DriverRunner_ will set the secret it found in env: {{SPARK_AUTH_SECRET}} so that the driver will find it and use it for all the connections
- The driver will use either env: {{SPARK_AUTH_SECRET}} or conf: {{spark.authenticate.secret}} for connection to the master
- _ExecutorRunner_ will not find any secret in _ApplicationDescription_ so it will look for it in the worker configuration and it will find it there (its presence is implied). 
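The _DriverRunner_ steps above can be sketched as follows ({{driver_launch_env}} is a hypothetical helper, not actual Spark code):

```python
def driver_launch_env(driver_desc_env, worker_conf):
    """Sketch of DriverRunner in the single-secret cluster case:
    with no secret in DriverDescription, the worker's own secret
    (whose presence is implied) is exported to the driver process
    as SPARK_AUTH_SECRET, so the driver can use it everywhere."""
    env = dict(driver_desc_env)
    if "SPARK_AUTH_SECRET" not in env:
        env["SPARK_AUTH_SECRET"] = worker_conf["spark.authenticate.secret"]
    return env
```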

----
h5.Cluster mode, multiple secrets
h6.Configuration
- env: {{SPARK_APP_AUTH_SECRET=app_secret}} or conf: {{spark.app.authenticate.secret=app_secret}}
- env: {{SPARK_SUBMISSION_AUTH_SECRET=scheduler_secret}} or conf: {{spark.submission.authenticate.secret=scheduler_secret}}

h6.Description
- The driver is run by _DriverRunner_, which is a part of the worker
- The client will use either env: {{SPARK_SUBMISSION_AUTH_SECRET}} or conf: {{spark.submission.authenticate.secret}} to connect to the master
- The client will send either env: {{SPARK_SUBMISSION_AUTH_SECRET}} or conf: {{spark.submission.authenticate.secret}} as env: {{SPARK_SUBMISSION_AUTH_SECRET}} (to avoid passing the secret as a Java command line option)
- The client will send either env: {{SPARK_APP_AUTH_SECRET}} or conf: {{spark.app.authenticate.secret}} as env: {{SPARK_APP_AUTH_SECRET}} (to avoid passing the secret as a Java command line option)
- _DriverRunner_ will find env: {{SPARK_SUBMISSION_AUTH_SECRET}} and env: {{SPARK_APP_AUTH_SECRET}} and will pass them both to the driver
- The driver will use env: {{SPARK_SUBMISSION_AUTH_SECRET}}
- The driver will not send env: {{SPARK_SUBMISSION_AUTH_SECRET}}
- The driver will use env: {{SPARK_APP_AUTH_SECRET}} for communication with the executors
- The driver will send {{spark.executorEnv.SPARK_AUTH_SECRET=app_secret}} so that the executors can use it to communicate with the driver
- _ExecutorRunner_ will find that secret in _ApplicationDescription_ and it will set it in env: {{SPARK_AUTH_SECRET}} which will be read by _ExecutorBackend_ afterwards and used for all the connections (with driver, other executors and external shuffle service).
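The client-side steps above can be sketched as follows ({{driver_description_env}} is a hypothetical helper, not actual Spark code):

```python
def driver_description_env(client_env, client_conf):
    """Sketch of the client in multiple-secret cluster mode: each
    secret is sent as an env variable of DriverDescription; when it
    comes from conf, it is also removed from conf, keeping it off
    the driver's Java command line."""
    conf = dict(client_conf)
    env = {}
    for env_key, conf_key in (
        ("SPARK_SUBMISSION_AUTH_SECRET", "spark.submission.authenticate.secret"),
        ("SPARK_APP_AUTH_SECRET", "spark.app.authenticate.secret"),
    ):
        # The env variable, if present, wins; otherwise the conf
        # entry is consumed (popped) so it is not passed on the CLI.
        secret = client_env.get(env_key) or conf.pop(conf_key, None)
        if secret is not None:
            env[env_key] = secret
    return env, conf
```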

----
h4.Lifecycles
- env: {{SPARK_AUTH_SECRET}} and conf: {{spark.authenticate.secret}} are always lost; they are never transferred to other entities. They are used only within the entity that defines them, and they die with it.
- env: {{SPARK_SUBMISSION_AUTH_SECRET}} is used by _Client_ to connect to the master. It is sent as an env variable of the same name with _DriverDescription_ so that it is also present in the driver's environment. The driver uses it to connect to the master and will not send it to any other entity.
- conf: {{spark.submission.authenticate.secret}} is used by _Client_ to connect to the master unless env: {{SPARK_SUBMISSION_AUTH_SECRET}} is defined. If env: {{SPARK_SUBMISSION_AUTH_SECRET}} is not defined, conf: {{spark.submission.authenticate.secret}} is copied to env in _DriverDescription_ as {{SPARK_SUBMISSION_AUTH_SECRET}} and removed from conf, to avoid passing it as a Java command line argument when running the driver.
- env: {{SPARK_APP_AUTH_SECRET}} is sent as an env variable of the same name with _DriverDescription_ so that it is also present in the driver's environment. The driver uses it to connect to the executors and sends it with _ApplicationDescription_ as env: {{SPARK_AUTH_SECRET}} so that _ExecutorRunner_ can put it into the executor environment. Then _ExecutorBackend_ can use it to communicate with the driver, other executors and the external shuffle service.
- conf: {{spark.app.authenticate.secret}} - if env: {{SPARK_APP_AUTH_SECRET}} is not defined, it is copied to env in _DriverDescription_ as {{SPARK_APP_AUTH_SECRET}} and removed from conf, to avoid passing it as a Java command line argument when running the driver.
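The lifecycle of the app secret described above can be sketched end to end ({{app_secret_flow}} is a hypothetical helper tracing the proposed hops, not actual Spark code):

```python
def app_secret_flow(client_env, client_conf):
    """End-to-end sketch of the SPARK_APP_AUTH_SECRET lifecycle:
    client -> DriverDescription env -> driver -> ApplicationDescription
    env (renamed to SPARK_AUTH_SECRET) -> executor environment."""
    # Client: resolve the app secret, preferring the env variable;
    # a conf-sourced secret is removed from conf (kept off the CLI).
    secret = (client_env.get("SPARK_APP_AUTH_SECRET")
              or client_conf.pop("spark.app.authenticate.secret", None))
    # Sent with DriverDescription as an env variable of the same name.
    driver_env = {"SPARK_APP_AUTH_SECRET": secret}
    # Driver: forwarded with ApplicationDescription under the generic name.
    app_desc_env = {"SPARK_AUTH_SECRET": driver_env["SPARK_APP_AUTH_SECRET"]}
    # ExecutorRunner: exported into the executor process environment,
    # where ExecutorBackend reads it for all its connections.
    return dict(app_desc_env)
```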



> Split networking in standalone mode
> -----------------------------------
>
>                 Key: SPARK-11326
>                 URL: https://issues.apache.org/jira/browse/SPARK-11326
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Jacek Lewandowski
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
