Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/09/16 11:41:38 UTC

[GitHub] [spark] HyukjinKwon commented on pull request #37710: [SPARK-40448][CONNECT] Spark Connect build as Driver Plugin with Shaded Dependencies

HyukjinKwon commented on PR #37710:
URL: https://github.com/apache/spark/pull/37710#issuecomment-1249261232

   This is ready for a look now.
   
   Since the whole feature and its code base would be very large, we (specifically @martin-g, @amaliujia, @cloud-fan, and I) discussed offline and decided to propose splitting it up. This PR is essentially the minimal working version; note that most of the code lines here were generated from the protobuf definitions.
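
   As a rough illustration of the "Driver Plugin" part of this PR (a minimal sketch only; the plugin class name below is an assumption and not a confirmed API), the server side would be loaded through Spark's existing `spark.plugins` mechanism:

   ```python
   # Minimal sketch: start a regular Spark session with the Spark Connect driver
   # plugin loaded via the standard `spark.plugins` configuration. The plugin
   # class name is an assumption based on this PR's direction and may change;
   # the shaded plugin jar would also need to be on the driver classpath.
   from pyspark.sql import SparkSession

   spark = (
       SparkSession.builder
       .appName("spark-connect-plugin-demo")
       .config("spark.plugins", "org.apache.spark.sql.connect.SparkConnectPlugin")
       .getOrCreate()
   )
   ```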
   
   SPARK-39375 is the parent JIRA, and it describes the current action items at this point.
   More JIRAs will be filed according to the plan below:
   
   ### High-level plan and design:
   
   - [High-Level Design Doc for Spark Connect](https://docs.google.com/document/d/17X6-P5H2522SnE-gF1BVwyildp_PDX8oXD-4l9vqQmA/edit?usp=sharing)
   - [Spark Connect API Testing Plan](https://docs.google.com/document/d/1n6EgS5vcmbwJUs5KGX4PzjKZVcSKd0qf0gLNZ6NFvOE/edit?usp=sharing)
   
   ### Low-level plan:
   
   **Short-term**
   - Extend test coverage for SparkConnectPlanner (right now at 76% line coverage)
   - Extend test coverage for Spark Connect Python client
   - Type annotations for the Spark Connect Python client to re-enable mypy (an annotated sketch follows this list)
   - Clean up documentation in the PySpark code for Spark Connect
   - Documentation for PySpark in README and doctests
   - Proto validation in the server and/or client
   - Validation (a sketch of this split follows the list):
     - Syntactic -> Parsing
     - Semantic -> Analysis
   - Alternatively, only return an error class to clients upon failure.
   - Initial DSL framework for protobuf testing
   - Restructure the build setup to match the other components
     - Maven
     - SBT 
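
   As noted in the type-annotations item above, here is a minimal sketch of the kind of fully annotated code the Python client needs before mypy can be re-enabled; the names are placeholders for illustration, not the actual Spark Connect client API:

   ```python
   # Illustrative only: placeholder names, not the real Spark Connect client.
   from typing import List, Optional


   class PlanFragment:
       """Stand-in for a client-side logical plan node."""

       def __init__(self, op: str, children: Optional[List["PlanFragment"]] = None) -> None:
           self.op = op
           self.children: List["PlanFragment"] = children or []


   def serialize_plan(root: PlanFragment) -> bytes:
       """Flatten a plan into bytes; stands in for the real proto serialization."""
       parts: List[str] = [root.op]
       parts.extend(serialize_plan(child).decode("utf-8") for child in root.children)
       return "/".join(parts).encode("utf-8")
   ```

   With annotations like these throughout the client, mypy can simply be pointed back at the package from the existing lint tooling.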
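
   For the validation split above, a minimal self-contained sketch; JSON stands in for the protobuf wire format purely so the example runs on its own, and in the real server the parse step would be protobuf deserialization while the analysis step would consult Spark's analyzer and catalog:

   ```python
   # Minimal sketch of "Syntactic -> Parsing" vs "Semantic -> Analysis".
   import json


   class SyntacticError(Exception):
       """The request payload cannot be parsed into a plan at all."""


   class SemanticError(Exception):
       """The plan parses, but refers to something that does not exist."""


   def validate_request(raw: bytes, known_tables: set) -> dict:
       # Syntactic validation: can the payload be parsed?
       try:
           plan = json.loads(raw)
       except ValueError as exc:
           raise SyntacticError(f"unparsable plan: {exc}") from exc

       # Semantic validation: does the parsed plan reference known objects?
       table = plan.get("read", {}).get("table")
       if table not in known_tables:
           raise SemanticError(f"table not found: {table}")
       return plan


   validate_request(b'{"read": {"table": "people"}}', {"people"})   # OK
   # validate_request(b'not a plan', {"people"})                    # -> SyntacticError
   # validate_request(b'{"read": {"table": "x"}}', {"people"})      # -> SemanticError
   ```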
   
   **Long-term**
   - Testing with custom DSL 
   - `LocalRelation`
   - Better error handling for semantic failures
   - Spark and Session configurations
   - Scala Client
   - SBT incremental build and testing environment
   - DataSources
   - UDFs
   - Packaging / Releasing
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

