
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/05/29 04:38:20 UTC

[GitHub] [spark] cloud-fan opened a new pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs

URL: https://github.com/apache/spark/pull/24735
 
 
   ## What changes were proposed in this pull request?
   
   For simplicity, all `LambdaVariable`s are currently globally unique, which avoids any potential conflicts. However, this causes a performance problem: encoder expressions that deal with collections (and therefore contain `LambdaVariable`) can never hit the codegen cache, since each newly created expression contains a different `LambdaVariable`.
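
The cache miss can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyLambdaVar`, `freshGlobalVar`, `generate`, and `compile` are hypothetical names, not Spark classes, and the cache is assumed to be keyed on the generated source text.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable

// Hypothetical toy model, not Spark's actual classes.
object CodegenCacheSketch {
  // A single process-wide counter, so every variable is globally unique.
  private val globalId = new AtomicLong(0)

  // Stand-in for a lambda variable whose name embeds its ID.
  final case class ToyLambdaVar(id: Long) {
    def name: String = s"lambda_$id"
  }

  def freshGlobalVar(): ToyLambdaVar = ToyLambdaVar(globalId.getAndIncrement())

  // Stand-in for code generation: the variable name ends up in the source text.
  def generate(v: ToyLambdaVar): String =
    s"int ${v.name} = input[i]; out[i] = ${v.name} + 1;"

  // Assumed cache keyed on the generated source text.
  private val cache = mutable.Map.empty[String, Int]
  var compilations = 0

  // "Compiles" the source only when the exact same text has not been seen before.
  def compile(source: String): Int =
    cache.getOrElseUpdate(source, { compilations += 1; source.hashCode })
}
```

In this model, two logically identical expressions built with `freshGlobalVar()` produce different source text, so each one triggers its own compilation and the cache never helps.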
   
   To overcome this problem, `LambdaVariable` should have per-query unique IDs. This PR does two things:
   1. Refactors `LambdaVariable` to carry an ID, so that the ID is easier to change.
   2. Adds an optimizer rule that reassigns `LambdaVariable` IDs so that they are unique per query.
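
The idea behind the two steps can be sketched on a toy expression tree. This is an illustration only: `Expr`, `LambdaVar`, `MapElements`, and `reassignIds` are hypothetical names, and the actual rule in this PR operates on Catalyst plans and may differ in detail.

```scala
import scala.collection.mutable

// Hypothetical toy expression tree, not Spark's Catalyst classes.
object ReassignIdsSketch {
  sealed trait Expr
  final case class LambdaVar(id: Long) extends Expr
  final case class Literal(value: Int) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr
  // `loopVar` is bound to each element of `input` while `body` is evaluated.
  final case class MapElements(loopVar: LambdaVar, body: Expr, input: Expr) extends Expr

  // Rewrites every lambda variable ID to 0, 1, 2, ... in order of first
  // appearance, so the result depends only on the shape of the query.
  def reassignIds(root: Expr): Expr = {
    val mapping = mutable.Map.empty[Long, Long]
    def newId(old: Long): Long = mapping.getOrElseUpdate(old, mapping.size.toLong)

    def go(e: Expr): Expr = e match {
      case LambdaVar(id) => LambdaVar(newId(id))
      case l: Literal    => l
      case Add(l, r)     => Add(go(l), go(r))
      case MapElements(v, body, input) =>
        MapElements(LambdaVar(newId(v.id)), go(body), go(input))
    }
    go(root)
  }
}
```

With this sketch, `MapElements(LambdaVar(1001), Add(LambdaVar(1001), Literal(1)), Literal(0))` and the same tree built with ID `2057` both normalize to the tree that uses `LambdaVar(0)`, so code generated from them would be textually identical and could share a cache entry.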
   
   ## How was this patch tested?
   
   New tests.
   
