Posted to reviews@spark.apache.org by "hvanhovell (via GitHub)" <gi...@apache.org> on 2023/05/30 08:56:43 UTC

[GitHub] [spark] hvanhovell commented on pull request #41342: [SPARK-43829][CONNECT] Improve SparkConnectPlanner by reuse Dataset and avoid construct new Dataset

hvanhovell commented on PR #41342:
URL: https://github.com/apache/spark/pull/41342#issuecomment-1568041469

   @beliefer thanks for doing this. I will take a look :)...
   
   Architecturally, I think we need to move to a situation where we do not create Datasets/DataFrames in the planner at all; it should all be logical plans. The only place where we may need a Dataset/DataFrame is for execution. The reason is that I want the two implementations to share as much code as possible by making Connect the primary API, with the current API an extension of it.
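
   The separation described above can be sketched roughly as follows. This is an illustrative Scala sketch only: the types `LogicalPlan`, `Scan`, `Filter`, `Planner`, and `Execution` here are hypothetical stand-ins, not Spark's actual (much richer) classes. The point is just that the planner stays in logical-plan space, and only the execution layer wraps a plan in a Dataset:

   ```scala
   // Stand-in types for illustration; Spark's real LogicalPlan and
   // Dataset are far more elaborate.
   sealed trait LogicalPlan
   case class Scan(table: String) extends LogicalPlan
   case class Filter(condition: String, child: LogicalPlan) extends LogicalPlan

   // The planner maps incoming relations to logical plans and never
   // constructs a Dataset.
   object Planner {
     def transformFilter(condition: String, child: LogicalPlan): LogicalPlan =
       Filter(condition, child)
   }

   // Only the execution layer materializes a Dataset around the final plan.
   case class Dataset(plan: LogicalPlan)
   object Execution {
     def execute(plan: LogicalPlan): Dataset = Dataset(plan)
   }

   object Demo extends App {
     val plan = Planner.transformFilter("age > 21", Scan("people"))
     val ds   = Execution.execute(plan)
     println(ds.plan)
   }
   ```

   Under this shape, both the classic Dataset API and Connect could funnel through the same plan-building code, with Dataset construction deferred to a single place at execution time.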


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org