You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "beliefer (via GitHub)" <gi...@apache.org> on 2023/05/26 07:41:23 UTC

[GitHub] [spark] beliefer opened a new pull request, #41328: [WIP][SPARK-40586][CONNECT] Decouple plan transformation and validation on server side

beliefer opened a new pull request, #41328:
URL: https://github.com/apache/spark/pull/41328

   ### What changes were proposed in this pull request?
   Project connect, from some perspectives, can be thought as replacing the SQL parser to generate a parsed (but the difference that is unresolved) plan, then the plan is passed to the analyzer. This means that connect should also do validation on the proto as there are many in-validate parser cases that analyzer does not expect to see, which potentially could cause problems if connect only pass through the proto (of course have it translated) to analyzer.
   Meanwhile I think this is a good idea to decouple the validation and transformation so that we have two stages:
   stage 1: proto validation. For example validate if necessary fields are populated or not.
   stage 2: transformation, which convert the proto to a plan with assumption that the plan is valid parsed version of the plan.
   
   This PR checks before transformation through a simple proxy pattern.
   
   Because there are a lot of job to do, this PR only given an example(e.g. checkReadRel). So that acilitate communication and discussion.
   
   ### Why are the changes needed?
   Decouple plan transformation and validation on server side
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   Just update the inner implementation.
   
   
   ### How was this patch tested?
   Exists test cases.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #41328: [WIP][SPARK-40586][CONNECT] Decouple plan transformation and validation on server side

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #41328:
URL: https://github.com/apache/spark/pull/41328#issuecomment-1571423760

   @grundprinzip Thank you for your review. I agree your opinion that list three issues. This PR provides a proposal that references `Analyzer` and `CheckAnalysis`. As you said, `Analyzer` and `CheckAnalysis` exists the three issues too. If we can't accept this proposal, so do we need to review Analyzer` and `CheckAnalysis` ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #41328: [WIP][SPARK-40586][CONNECT] Decouple plan transformation and validation on server side

Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on PR #41328:
URL: https://github.com/apache/spark/pull/41328#issuecomment-1565665299

   I personally like this direction which is separating validation and transforming. The main benefit is code readability and also structure the code in a more logical way. 
   
   
   cc @grundprinzip 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] closed pull request #41328: [WIP][SPARK-40586][CONNECT] Decouple plan transformation and validation on server side

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #41328: [WIP][SPARK-40586][CONNECT] Decouple plan transformation and validation on server side
URL: https://github.com/apache/spark/pull/41328


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on pull request #41328: [WIP][SPARK-40586][CONNECT] Decouple plan transformation and validation on server side

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on PR #41328:
URL: https://github.com/apache/spark/pull/41328#issuecomment-1569575227

   @beliefer First of all, thanks a lot for the proposal and the approach to improving the quality and readability of the Spark Connect code.
   
   Personally, I don't think this change is a good approach for the following reasons:
   
   * The logic in `checkRelation` essentially becomes a shadow path to the logic in the actual planner. For the reviewer of the code and the writer, this means they have to validate the code and the behavior in two places and make sure that the behaviors are congruent.
   * In addition, when reviewing these changes of the refactoring, it is already hard now to reason about the probability of regression because the refactoring is not just purely cosmetic but for example, changes the actual branches and conditions of the planning logic. I agree that some of the removed branches are unreachable after the checker is introduced but I wanted to raise the complication of the review itself.
   * Lastly, I think that this separation in general will reduce the reviewability of the code because the refactoring extracts non-trivial logical analysis that is easier understood close to where it's actually evaluated.
   
   That said, I think we definitely have room for improvement to increase the readability and style of the code to make it easier to contribute. Maybe there are slightly smaller and more incremental ways to improve it?
   
   Thanks again for trying the approach!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #41328: [WIP][SPARK-40586][CONNECT] Decouple plan transformation and validation on server side

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #41328:
URL: https://github.com/apache/spark/pull/41328#issuecomment-1565102808

   ping @HyukjinKwon @hvanhovell @zhengruifeng cc @amaliujia 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] commented on pull request #41328: [WIP][SPARK-40586][CONNECT] Decouple plan transformation and validation on server side

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #41328:
URL: https://github.com/apache/spark/pull/41328#issuecomment-1712669139

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org