You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/12 12:05:37 UTC

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1987: Rename `ExecutionContext` to `SessionContext`, `ExecutionContextState` to `SessionState`, add `TaskContext` to support multi-tenancy configurations - Part 1

alamb commented on a change in pull request #1987:
URL: https://github.com/apache/arrow-datafusion/pull/1987#discussion_r825289854



##########
File path: benchmarks/src/bin/tpch.rs
##########
@@ -584,13 +583,13 @@ fn get_query_sql(query: usize) -> Result<String> {
     }
 }
 
-fn create_logical_plan(ctx: &mut ExecutionContext, query: usize) -> Result<LogicalPlan> {
+fn create_logical_plan(ctx: &mut SessionContext, query: usize) -> Result<LogicalPlan> {
     let sql = get_query_sql(query)?;
     ctx.create_logical_plan(&sql)
 }
 
 async fn execute_query(
-    ctx: &mut ExecutionContext,
+    ctx: &SessionContext,

Review comment:
       it is a nice improvement to remove some of this `mut` 👍 

##########
File path: ballista/rust/executor/src/executor_server.rs
##########
@@ -177,6 +179,20 @@ impl<T: 'static + AsLogicalPlan, U: 'static + AsExecutionPlan> ExecutorServer<T,
         );
         info!("Start to run task {}", task_id_log);
 
+        let runtime = self.executor.ctx.runtime_env();
+
+        //TODO get session_id from TaskDefinition

Review comment:
       is this a TODO for a future PR?

##########
File path: datafusion/src/execution/context.rs
##########
@@ -1220,6 +1233,74 @@ impl FunctionRegistry for ExecutionContextState {
     }
 }
 
+/// Task Context Properties
+pub enum TaskProperties {

Review comment:
       It seems like `TaskProperties` might want *both* the session config as well as possibly key value pairs (rather than either / or)
   
   What about something like
   
   ```rust
   pub strut TaskProperties {
     config: SessionConfig,
     kv_pairs: Option<HashMap<String, String>>
   }
   ```
   
   ? 

##########
File path: datafusion/src/datasource/file_format/parquet.rs
##########
@@ -419,7 +423,8 @@ mod tests {
 
     #[tokio::test]
     async fn read_alltypes_plain_parquet() -> Result<()> {
-        let runtime = Arc::new(RuntimeEnv::default());
+        let session_ctx = SessionContext::new();
+        let task_ctx = Arc::new(TaskContext::from(&session_ctx));

Review comment:
       Since this is such a common pattern (create a Arc<TaskContext>` I wonder if it would make sense to create a function for it?
   
   like
   
   ```rust
   impl SessionContext {
     /// Get a new TaskContext to run in this session
     pub fn task_ctx(&self) -> Arc<TaskContext> {
       ...
     }
   }

##########
File path: datafusion/CHANGELOG.md
##########
@@ -56,7 +56,7 @@
 - Add `approx_quantile`  support [\#1538](https://github.com/apache/arrow-datafusion/issues/1538)
 - support sorting decimal data type [\#1522](https://github.com/apache/arrow-datafusion/issues/1522)
 - Keep all datafusion's packages up to date with Dependabot [\#1472](https://github.com/apache/arrow-datafusion/issues/1472)
-- ExecutionContext support init ExecutionContextState with `new(state: Arc<Mutex<ExecutionContextState>>)` method [\#1439](https://github.com/apache/arrow-datafusion/issues/1439)
+- SessionContext support init SessionState with `new(state: Arc<Mutex<SessionState>>)` method [\#1439](https://github.com/apache/arrow-datafusion/issues/1439)

Review comment:
       We probably should revert changes to the CHANGELOG for past releases

##########
File path: ballista/rust/core/src/execution_plans/unresolved_shuffle.rs
##########
@@ -21,7 +21,7 @@ use std::sync::Arc;
 use async_trait::async_trait;

Review comment:
       BTW nice job catching all instances of `ExecutionContext
   
   ```
   -*- mode: grep; default-directory: "~/Software/arrow-datafusion/" -*-
   Grep started at Sat Mar 12 06:43:13
   
   rg -n -H --no-heading -e 'ExecutionContext' $(git rev-parse --show-toplevel || pwd)
   
   Grep finished with no matches found at Sat Mar 12 06:43:13
   
   ```

##########
File path: ballista/rust/core/src/execution_plans/shuffle_writer.rs
##########
@@ -138,11 +138,11 @@ impl ShuffleWriterExec {
     pub async fn execute_shuffle_write(
         &self,
         input_partition: usize,
-        runtime: Arc<RuntimeEnv>,
+        context: Arc<TaskContext>,

Review comment:
       I think the idea of having a `TaskContext` that can have per-plan / per-task state (in addition to overall RuntimeEnv) is a significant improvement




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org