Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/16 15:27:26 UTC

[GitHub] [arrow-datafusion] thinkharderdev opened a new pull request #2024: Use ExecutionContext to parse Expr protobuf

thinkharderdev opened a new pull request #2024:
URL: https://github.com/apache/arrow-datafusion/pull/2024


   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Partially addresses #1882
   
   
    # Rationale for this change
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   Add serialization of UDF/UDAF to Ballista and `datafusion-proto`. We serialize the UDF/UDAF by name and then use the functions registered in the `ExecutionContextState` to deserialize. 
   
   Users can either initialize Ballista instances with an `ExecutionContext` registered with the relevant functions or use the plugin mechanism being built as part of #1881. 
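   To make the approach concrete, here is a minimal, self-contained sketch of the by-name round trip. The types and the `HashMap` lookup below stand in for the functions registered on the context; they are illustrative only, not the actual DataFusion code:
   
   ```rust
   use std::collections::HashMap;
   use std::sync::Arc;
   
   // Illustrative stand-ins, not DataFusion types.
   #[derive(Clone)]
   struct ScalarUdf {
       name: String,
   }
   
   // What goes over the wire: only the function *name*, never the implementation.
   struct ScalarUdfProto {
       fun_name: String,
   }
   
   fn serialize_udf(udf: &ScalarUdf) -> ScalarUdfProto {
       ScalarUdfProto { fun_name: udf.name.clone() }
   }
   
   // Deserialization resolves the name against whatever the context has registered.
   fn deserialize_udf(
       proto: &ScalarUdfProto,
       registered: &HashMap<String, Arc<ScalarUdf>>,
   ) -> Result<Arc<ScalarUdf>, String> {
       registered
           .get(&proto.fun_name)
           .cloned()
           .ok_or_else(|| format!("UDF '{}' is not registered in the context", proto.fun_name))
   }
   
   fn main() {
       let registered = HashMap::from([(
           "my_udf".to_string(),
           Arc::new(ScalarUdf { name: "my_udf".to_string() }),
       )]);
       let proto = serialize_udf(&registered["my_udf"]);
       // Succeeds only because the deserializing side has the same function registered.
       assert!(deserialize_udf(&proto, &registered).is_ok());
   }
   ```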
   
   # What changes are included in this PR?
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   Parsing logical `Expr` now requires an `ExecutionContext`. Additionally, extension codecs in `Ballista` now need an `ExecutionContext`. 
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   Yes, the signatures for `PhysicalExtensionCodec.try_decode` and `LogicalExtensionCodec.try_decode` now take a reference to an `ExecutionContext`. 
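   Roughly, the shape of that change looks like the following. These are simplified stand-in types, not the exact trait definitions in `datafusion-proto`/Ballista, and `PhysicalExtensionCodec.try_decode` changes analogously:
   
   ```rust
   // Stand-in types for illustration only (the ExecutionContext here is the
   // DataFusion context, later renamed SessionContext).
   struct ExecutionContext;
   struct LogicalPlan;
   struct DataFusionError;
   
   trait LogicalExtensionCodec {
       // Before this PR, decoding had no access to the registered functions:
       // fn try_decode(&self, buf: &[u8], inputs: &[LogicalPlan]) -> Result<LogicalPlan, DataFusionError>;
   
       // After: the context is threaded through, so UDFs/UDAFs referenced by
       // name in the protobuf can be resolved against what the user registered.
       fn try_decode(
           &self,
           buf: &[u8],
           inputs: &[LogicalPlan],
           ctx: &ExecutionContext,
       ) -> Result<LogicalPlan, DataFusionError>;
   }
   ```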
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   





[GitHub] [arrow-datafusion] thinkharderdev commented on pull request #2024: Use SessionContext to parse Expr protobuf

Posted by GitBox <gi...@apache.org>.
thinkharderdev commented on pull request #2024:
URL: https://github.com/apache/arrow-datafusion/pull/2024#issuecomment-1072319467


   > <img alt="image" width="1242" src="https://user-images.githubusercontent.com/32193458/158924245-3239ff21-be47-428e-879a-76430fc6ccb9.png">
   > 
   > Does this place need to be modified?
   
   No. The default `AsLogicalPlan` instance was updated to pass the context when deserializing. 





[GitHub] [arrow-datafusion] mingmwang commented on pull request #2024: Use SessionContext to parse Expr protobuf

Posted by GitBox <gi...@apache.org>.
mingmwang commented on pull request #2024:
URL: https://github.com/apache/arrow-datafusion/pull/2024#issuecomment-1075932478


   Sorry for the late response. This PR looks a little problematic and conflicts with the multi-tenancy SessionContexts. The major problem is on the Executor side: the Executor also needs to run these methods to parse the Expr protobuf into a physical plan, but the Executor side should not have SessionContexts, and the optimizers, planner, and physical planner do not make sense on the Executor side.
   
   And for pluggable UDFs/UDAFs, I think we should have global-level UDFs/UDAFs and session-level UDFs/UDAFs.
   For example, all the built-in UDFs/UDAFs should be globally available. Different users can also upload/add their own functions to their sessions.
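   A minimal sketch of what that layering could look like: a session-level registry that is consulted first and falls back to a global one. The type and method names here are hypothetical, not DataFusion API:
   
   ```rust
   use std::collections::HashMap;
   use std::sync::Arc;
   
   // Illustrative stand-in for a UDF implementation.
   struct Udf;
   
   // Global registry: built-in functions, shared by every session.
   struct GlobalRegistry {
       udfs: HashMap<String, Arc<Udf>>,
   }
   
   // Session registry: per-user functions, checked before the global set.
   struct SessionRegistry {
       global: Arc<GlobalRegistry>,
       session_udfs: HashMap<String, Arc<Udf>>,
   }
   
   impl SessionRegistry {
       fn udf(&self, name: &str) -> Option<Arc<Udf>> {
           self.session_udfs
               .get(name)
               .or_else(|| self.global.udfs.get(name))
               .cloned()
       }
   }
   
   fn main() {
       let global = Arc::new(GlobalRegistry {
           udfs: HashMap::from([("sqrt".to_string(), Arc::new(Udf))]),
       });
       let session = SessionRegistry {
           global,
           session_udfs: HashMap::from([("my_udf".to_string(), Arc::new(Udf))]),
       };
       // Session-level functions extend the globally available set.
       assert!(session.udf("my_udf").is_some());
       assert!(session.udf("sqrt").is_some());
       assert!(session.udf("missing").is_none());
   }
   ```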





[GitHub] [arrow-datafusion] mingmwang edited a comment on pull request #2024: Use SessionContext to parse Expr protobuf

Posted by GitBox <gi...@apache.org>.
mingmwang edited a comment on pull request #2024:
URL: https://github.com/apache/arrow-datafusion/pull/2024#issuecomment-1076250450


   > > Hi, @thinkharderdev
   > > I think this PR doesn't resolve the issue. One of the key problems is that users register UDFs/UDAFs with SessionContext.register_udf(), but how do the SessionContexts (ExecutionContext) in all the Executors know about the registered UDFs? That information is not propagated to the executor side.
   > > For DataFusion, SessionContext.register_udf() is not a problem. But for Ballista, there are three different kinds of SessionContext:
   > > 
   > > 1. the context created in the Ballista client, which uses the BallistaQueryPlanner to send logical plans to the Scheduler; see the method create_df_ctx_with_ballista_query_planner()
   > > 2. the context created in the Scheduler
   > > 3. the context created in the Executor
   > > 
   > > I'm going to remove the context created in the Executor because I think it does not make sense for the Executor to hold a global SessionContext.
   > > Hi, @yjshen @alamb
   > > Please share your thoughts.
   > 
   > I think the idea is that there are two ways to register the UDF/UDAF in the executors:
   > 
   > 1. Build your own executor where you basically write your own entrypoint and setup the `SessionContext` with any extensions (codecs, udfs, udafs, optimizers, planners, etc) registered at startup.
   > 2. Use the mechanism being developed in [add udf/udaf plugin #1881](https://github.com/apache/arrow-datafusion/pull/1881) to load udf/udaf as plugins which are put in a plugin directory and dynamically linked into the out of the box scheduler/executor binary.
   > 
   > Based on the discussion in that PR we were thinking that the plugin mechanism could use the `SessionContext` to do the registration of plugins at application start so internally we could use the same mechanism for both approaches. That is, you can either use plugins or roll your own main method but you still end up with extensions registered in the `SessionContext` and we use that for deserializing udfs/udafs in the plan.
   > 
   > Ultimately I think we do need some sort of context in the executor if we are going to support all the extension points that DataFusion provides in Ballista.
   
   Hi, @thinkharderdev 
   
   I'm going to remove the SessionContext from the Executor code, make the Executor itself implement the FunctionRegistry trait,
   and pass the `Arc<dyn FunctionRegistry>` down to `try_into_physical_plan()` and the other required methods.
   The major reason is that the Executor side does not need optimizers or planners; actually, the Executor side should not run any planning logic. The planning logic should only run on the Scheduler side. And to support multi-tenant SessionContexts, it is weird to keep a global SessionContext on the Executor side.
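   A rough sketch of that direction, using simplified stand-in types rather than the real DataFusion `FunctionRegistry` trait or Ballista's `try_into_physical_plan` signature (both differ in detail; this only illustrates passing a registry instead of a full SessionContext):
   
   ```rust
   use std::collections::HashMap;
   use std::sync::Arc;
   
   // Simplified stand-ins, not the real DataFusion/Ballista types.
   struct ScalarUdf;
   struct PhysicalPlan {
       udfs: Vec<Arc<ScalarUdf>>,
   }
   struct PlanProto {
       udf_names: Vec<String>,
   }
   
   trait FunctionRegistry {
       fn udf(&self, name: &str) -> Option<Arc<ScalarUdf>>;
   }
   
   // The Executor itself holds the functions and acts as the registry.
   struct Executor {
       scalar_functions: HashMap<String, Arc<ScalarUdf>>,
   }
   
   impl FunctionRegistry for Executor {
       fn udf(&self, name: &str) -> Option<Arc<ScalarUdf>> {
           self.scalar_functions.get(name).cloned()
       }
   }
   
   // Decoding only needs the registry, not a full SessionContext.
   fn try_into_physical_plan(
       proto: &PlanProto,
       registry: &Arc<dyn FunctionRegistry>,
   ) -> Result<PhysicalPlan, String> {
       let udfs = proto
           .udf_names
           .iter()
           .map(|name| {
               registry
                   .udf(name)
                   .ok_or_else(|| format!("UDF '{name}' is not registered on this executor"))
           })
           .collect::<Result<Vec<_>, _>>()?;
       Ok(PhysicalPlan { udfs })
   }
   ```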
   
   For udf/udaf registration on the Executor side, I did not get a chance to scan the code in https://github.com/apache/arrow-datafusion/pull/1881 yet. The two approaches you just mentioned should both work. And to support session-level udfs/udafs (temporary functions), we also need a way to propagate the temporary functions' metadata and lib URL or lib files from the Scheduler side to the Executor side.
   





[GitHub] [arrow-datafusion] alamb merged pull request #2024: Use SessionContext to parse Expr protobuf

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #2024:
URL: https://github.com/apache/arrow-datafusion/pull/2024


   








[GitHub] [arrow-datafusion] mingmwang commented on pull request #2024: Use SessionContext to parse Expr protobuf

Posted by GitBox <gi...@apache.org>.
mingmwang commented on pull request #2024:
URL: https://github.com/apache/arrow-datafusion/pull/2024#issuecomment-1076067762


   Hi, @thinkharderdev
   
   I think this PR doesn't resolve the issue. One of the key problems is that users register UDFs/UDAFs with SessionContext.register_udf(), but how do the SessionContexts (ExecutionContext) in all the Executors know about the registered UDFs? That information is not propagated to the executor side.
   
   For DataFusion, SessionContext.register_udf() is not a problem. But for Ballista, there are three different kinds of SessionContext:
   1. the context created in the Ballista client, which uses the BallistaQueryPlanner to send logical plans to the Scheduler; see the method create_df_ctx_with_ballista_query_planner()
   2. the context created in the Scheduler
   3. the context created in the Executor
   
   I'm going to remove the context created in the Executor because I think it does not make sense for the Executor to hold a global SessionContext.
   
   Hi, @yjshen @alamb
   
   Please share your thoughts.
   
     
   
    





[GitHub] [arrow-datafusion] gaojun2048 commented on pull request #2024: Use SessionContext to parse Expr protobuf

Posted by GitBox <gi...@apache.org>.
gaojun2048 commented on pull request #2024:
URL: https://github.com/apache/arrow-datafusion/pull/2024#issuecomment-1071952968


   <img width="1242" alt="image" src="https://user-images.githubusercontent.com/32193458/158924245-3239ff21-be47-428e-879a-76430fc6ccb9.png">
   
   Does this place need to be modified?








[GitHub] [arrow-datafusion] thinkharderdev commented on pull request #2024: Use SessionContext to parse Expr protobuf

Posted by GitBox <gi...@apache.org>.
thinkharderdev commented on pull request #2024:
URL: https://github.com/apache/arrow-datafusion/pull/2024#issuecomment-1076214948


   > Hi, @thinkharderdev
   > 
   > I think this PR doesn't resolve the issue. One of the key problems is that users register UDFs/UDAFs with SessionContext.register_udf(), but how do the SessionContexts (ExecutionContext) in all the Executors know about the registered UDFs? That information is not propagated to the executor side.
   > 
   > For DataFusion, SessionContext.register_udf() is not a problem. But for Ballista, there are three different kinds of SessionContext:
   > 
   > 1. the context created in the Ballista client, which uses the BallistaQueryPlanner to send logical plans to the Scheduler; see the method create_df_ctx_with_ballista_query_planner()
   > 2. the context created in the Scheduler
   > 3. the context created in the Executor
   > 
   > I'm going to remove the context created in the Executor because I think it does not make sense for the Executor to hold a global SessionContext.
   > 
   > Hi, @yjshen @alamb
   > 
   > Please share your thoughts.
   
   I think the idea is that there are two ways to register UDFs/UDAFs in the executors:
   
   1. Build your own executor where you basically write your own entrypoint and set up the `SessionContext` with any extensions (codecs, udfs, udafs, optimizers, planners, etc.) registered at startup (a sketch follows at the end of this comment). 
   2. Use the mechanism being developed in https://github.com/apache/arrow-datafusion/pull/1881 to load udfs/udafs as plugins which are put in a plugin directory and dynamically linked into the out-of-the-box scheduler/executor binary.  
   
   Based on the discussion in that PR we were thinking that the plugin mechanism could use the `SessionContext` to do the registration of plugins at application start so internally we could use the same mechanism for both approaches. That is, you can either use plugins or roll your own main method but you still end up with extensions registered in the `SessionContext` and we use that for deserializing udfs/udafs in the plan. 
   
   Ultimately I think we do need some sort of context in the executor if we are going to support all the extension points that DataFusion provides in Ballista. 
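   For approach 1, the wiring could look roughly like the sketch below. Everything in it (`ExecutorExtensions`, `start_executor`, `my_udf`) is a hypothetical name used only to illustrate registering extensions before the executor starts; it is not the actual Ballista entrypoint:
   
   ```rust
   use std::collections::HashMap;
   use std::sync::Arc;
   
   // Hypothetical stand-ins for the extension points you would register.
   struct ScalarUdf;
   struct PhysicalExtensionCodec;
   
   // Everything the custom executor binary wires up at startup.
   struct ExecutorExtensions {
       udfs: HashMap<String, Arc<ScalarUdf>>,
       codecs: Vec<Arc<PhysicalExtensionCodec>>,
   }
   
   // Hypothetical entrypoint: a real deployment would hand these extensions
   // to the executor process before it starts accepting tasks.
   fn start_executor(extensions: ExecutorExtensions) {
       println!(
           "executor starting with {} UDFs and {} codecs",
           extensions.udfs.len(),
           extensions.codecs.len()
       );
   }
   
   fn main() {
       let mut udfs = HashMap::new();
       udfs.insert("my_udf".to_string(), Arc::new(ScalarUdf));
       let extensions = ExecutorExtensions {
           udfs,
           codecs: vec![Arc::new(PhysicalExtensionCodec)],
       };
       // Roll-your-own entrypoint: register everything, then start the executor.
       start_executor(extensions);
   }
   ```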











[GitHub] [arrow-datafusion] alamb commented on pull request #2024: Use SessionContext to parse Expr protobuf

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #2024:
URL: https://github.com/apache/arrow-datafusion/pull/2024#issuecomment-1076735854


   I don't have anything substantial to add to this conversation -- it sounds like the challenges are well understood and there are good ideas for how to do the mapping from "function name" to "UDF implementation" on the executor side. 
   
   It is a great discussion to see.








[GitHub] [arrow-datafusion] yjshen commented on pull request #2024: Use SessionContext to parse Expr protobuf

Posted by GitBox <gi...@apache.org>.
yjshen commented on pull request #2024:
URL: https://github.com/apache/arrow-datafusion/pull/2024#issuecomment-1069278167


   cc @yahoNanJing @mingmwang 








[GitHub] [arrow-datafusion] thinkharderdev commented on pull request #2024: Use SessionContext to parse Expr protobuf

Posted by GitBox <gi...@apache.org>.
thinkharderdev commented on pull request #2024:
URL: https://github.com/apache/arrow-datafusion/pull/2024#issuecomment-1076306290


   > > > Hi, @thinkharderdev
   > > > I think this PR doesn't resolve the issue. One of the key problems is that users register UDFs/UDAFs with SessionContext.register_udf(), but how do the SessionContexts (ExecutionContext) in all the Executors know about the registered UDFs? That information is not propagated to the executor side.
   > > > For DataFusion, SessionContext.register_udf() is not a problem. But for Ballista, there are three different kinds of SessionContext:
   > > > 
   > > > 1. the context created in the Ballista client, which uses the BallistaQueryPlanner to send logical plans to the Scheduler; see the method create_df_ctx_with_ballista_query_planner()
   > > > 2. the context created in the Scheduler
   > > > 3. the context created in the Executor
   > > > 
   > > > I'm going to remove the context created in the Executor because I think it does not make sense for the Executor to hold a global SessionContext.
   > > > Hi, @yjshen @alamb
   > > > Please share your thoughts.
   > > 
   > > 
   > > I think the idea is that there are two ways to register the UDF/UDAF in the executors:
   > > 
   > > 1. Build your own executor where you basically write your own entrypoint and setup the `SessionContext` with any extensions (codecs, udfs, udafs, optimizers, planners, etc) registered at startup.
   > > 2. Use the mechanism being developed in [add udf/udaf plugin #1881](https://github.com/apache/arrow-datafusion/pull/1881) to load udf/udaf as plugins which are put in a plugin directory and dynamically linked into the out of the box scheduler/executor binary.
   > > 
   > > Based on the discussion in that PR we were thinking that the plugin mechanism could use the `SessionContext` to do the registration of plugins at application start so internally we could use the same mechanism for both approaches. That is, you can either use plugins or roll your own main method but you still end up with extensions registered in the `SessionContext` and we use that for deserializing udfs/udafs in the plan.
   > > Ultimately I think we do need some sort of context in the executor if we are going to support all the extension points that DataFusion provides in Ballista.
   > 
   > Hi, @thinkharderdev
   > 
   > I'm going to remove the SessionContext from the Executor code, make the Executor itself implement the FunctionRegistry trait, and pass the `Arc<dyn FunctionRegistry>` down to `try_into_physical_plan()` and the other required methods. The major reason is that the Executor side does not need optimizers or planners; actually, the Executor side should not run any planning logic. The planning logic should only run on the Scheduler side. And to support multi-tenant SessionContexts, it is weird to keep a global SessionContext on the Executor side.
   > 
   > For udf/udaf registration on the Executor side, I did not get a chance to scan the code in #1881 yet. The two approaches you just mentioned should both work. And to support session-level udfs/udafs (temporary functions), we also need a way to propagate the temporary functions' metadata and lib URL or lib files from the Scheduler side to the Executor side.
   
   Can I suggest using `TaskContext` instead? We can add the `FunctionRegistry` to the `TaskContext` and pass that in. We would still need the `ObjectStoreRegistry` to decode the physical plan protobuf, and I think it is a good idea to have a more general way of adding extension points into the executor. 
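   A sketch of that idea: a `TaskContext`-style object that bundles a `FunctionRegistry` and the object stores, and is handed to the decode path. The field and method names below are illustrative stand-ins, not the actual DataFusion `TaskContext` API:
   
   ```rust
   use std::collections::HashMap;
   use std::sync::Arc;
   
   // Illustrative stand-ins, not the actual DataFusion/Ballista types.
   struct ScalarUdf;
   struct ObjectStore;
   struct PhysicalPlan;
   
   struct PlanProto {
       udf_names: Vec<String>,
       object_store_scheme: String,
   }
   
   trait FunctionRegistry {
       fn udf(&self, name: &str) -> Option<Arc<ScalarUdf>>;
   }
   
   // A task-scoped context that bundles the extension points the executor
   // needs, without dragging a full SessionContext onto the executor side.
   struct TaskContext {
       function_registry: Arc<dyn FunctionRegistry>,
       object_stores: HashMap<String, Arc<ObjectStore>>,
   }
   
   // The physical plan decode path takes the TaskContext instead of a SessionContext.
   fn decode_physical_plan(proto: &PlanProto, ctx: &TaskContext) -> Result<PhysicalPlan, String> {
       // Resolve every UDF referenced by name in the protobuf.
       for name in &proto.udf_names {
           ctx.function_registry
               .udf(name)
               .ok_or_else(|| format!("UDF '{name}' is not registered in the TaskContext"))?;
       }
       // Resolve the object store needed to read the task's input files.
       ctx.object_stores
           .get(&proto.object_store_scheme)
           .ok_or_else(|| format!("no object store for scheme '{}'", proto.object_store_scheme))?;
       Ok(PhysicalPlan)
   }
   ```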


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org