
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/18 04:02:12 UTC

[GitHub] [arrow-datafusion] mingmwang opened a new issue #1862: Refactor ExecutionContext and related conf to support multi-tenancy configurations.

mingmwang opened a new issue #1862:
URL: https://github.com/apache/arrow-datafusion/issues/1862

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Fixes [1848](https://github.com/apache/arrow-datafusion/issues/1848)
Addresses [138](https://github.com/apache/arrow-datafusion/issues/138) and [682](https://github.com/apache/arrow-datafusion/issues/682)


**Describe the solution you'd like**
To support more extensible, multi-tenant configurations, we need to introduce a session-related context that isolates user-specific configurations. These configurations should be correctly propagated to the executor side along with tasks.

Here is the proposal:

1. Rename ExecutionContext to SessionContext. The renamed context is still the main interface for executing queries with DataFusion. It represents a connection/session between a user and the DataFusion/Ballista cluster.
2. Rename ExecutionContextState to SessionState to hold session-specific state such as registered functions, the catalog list, optimizer rules, and SQL-related configurations.
3. Rename ExecutionConfig to SQLConfig to hold all SQL-related configuration entries, for example target partition count, batch_size, etc.
4. Move RuntimeConfig and RuntimeEnv out of ExecutionConfig, and use them to hold static, non-user-specific configuration and environment settings. Each executor/scheduler instance has exactly one RuntimeConfig and one RuntimeEnv instance. Once created, they cannot be changed.
5. Add createSession and closeSession methods to the SchedulerServer gRPC service. A unique ID represents the current session. From then on, ExecuteQueryParams will include the session ID.
6. Add a getSQLConf method to the ExecutionPlan trait. Each ExecutionPlan will hold a reference to the SQLConfig from the user session and can access it at plan time or execution time.
7. Users can set or change SQL configurations with a SQL command, or by issuing execute_query with only configuration setting items.
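
To make the layering in items 1 to 6 more concrete, here is a rough, std-only sketch of how the pieces could relate. It is not the actual DataFusion API and not the final design. The type names come from the list above; everything else is an assumption for illustration, including the fields, the default values, the `sql_config` method (a Rust-style spelling of `getSQLConf`), `set_batch_size`, `create_scan`, and the `ScanExec` plan node.

```rust
use std::sync::Arc;

/// Static, non-user-specific settings (item 4). The field is hypothetical.
#[derive(Debug)]
pub struct RuntimeConfig {
    pub memory_limit_bytes: usize,
}

/// One instance per executor/scheduler process, shared by every session.
/// No setters are exposed, so it cannot change after creation.
#[derive(Debug)]
pub struct RuntimeEnv {
    config: RuntimeConfig,
}

impl RuntimeEnv {
    pub fn new(config: RuntimeConfig) -> Arc<Self> {
        Arc::new(Self { config })
    }

    pub fn config(&self) -> &RuntimeConfig {
        &self.config
    }
}

/// Per-session SQL settings (item 3). Defaults are placeholders.
#[derive(Debug, Clone, PartialEq)]
pub struct SQLConfig {
    pub target_partitions: usize,
    pub batch_size: usize,
}

impl Default for SQLConfig {
    fn default() -> Self {
        Self { target_partitions: 4, batch_size: 8192 }
    }
}

/// Session-specific state (item 2). Functions, catalogs and optimizer
/// rules are omitted to keep the sketch short.
pub struct SessionState {
    pub sql_config: Arc<SQLConfig>,
}

/// One context per user session (item 1), identified by a session ID (item 5).
pub struct SessionContext {
    session_id: String,
    state: SessionState,
    runtime: Arc<RuntimeEnv>,
}

impl SessionContext {
    pub fn new(session_id: impl Into<String>, runtime: Arc<RuntimeEnv>) -> Self {
        Self {
            session_id: session_id.into(),
            state: SessionState { sql_config: Arc::new(SQLConfig::default()) },
            runtime,
        }
    }

    pub fn session_id(&self) -> &str {
        &self.session_id
    }

    pub fn runtime(&self) -> &Arc<RuntimeEnv> {
        &self.runtime
    }

    /// Replace the config with an updated copy. Plans created earlier keep
    /// the snapshot they already hold.
    pub fn set_batch_size(&mut self, batch_size: usize) {
        let mut updated = (*self.state.sql_config).clone();
        updated.batch_size = batch_size;
        self.state.sql_config = Arc::new(updated);
    }

    /// Build a (hypothetical) plan node carrying this session's config.
    pub fn create_scan(&self) -> ScanExec {
        ScanExec { sql_config: Arc::clone(&self.state.sql_config) }
    }
}

/// Item 6: every plan node can hand back the session's SQL config.
pub trait ExecutionPlan {
    fn sql_config(&self) -> Arc<SQLConfig>;
}

pub struct ScanExec {
    sql_config: Arc<SQLConfig>,
}

impl ExecutionPlan for ScanExec {
    fn sql_config(&self) -> Arc<SQLConfig> {
        Arc::clone(&self.sql_config)
    }
}
```

In this sketch, two sessions built from the same `Arc<RuntimeEnv>` share the runtime but each has its own `SQLConfig`, so a change made in one session is not visible to plans created by another.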

I can work on the PRs. There will be two or three PRs to cover different parts.


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org