Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/06 06:59:51 UTC

[GitHub] [spark] tanelk edited a comment on pull request #29950: [SPARK-32945][SQL] Avoid collapsing projects if reaching max allowed common exprs

tanelk edited a comment on pull request #29950:
URL: https://github.com/apache/spark/pull/29950#issuecomment-704073167


   Perhaps the max number of common expressions is not the best metric here?
   
   Let's compare two cases:
   1) On the lower Project you have a `JsonToStructs`, and in the upper Project you get 3 fields from that struct. This would mean 2 redundant computations, and the "metric" you are looking at is 3.
   
   2) On the lower Project you have two `JsonToStructs`, and in the upper Project you get 2 fields from each struct. This would also mean 2 redundant computations, but the "metric" you are looking at is 2 (see the sketch below for both shapes).
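   
   For concreteness, here is a rough DataFrame-level sketch of the two shapes (the schema, column names, and local session are made up for illustration; the rule itself of course works on the Catalyst `Project` nodes rather than the DataFrame API):
   
   ```scala
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions.from_json
   import org.apache.spark.sql.types.{IntegerType, StructType}
   
   val spark = SparkSession.builder().master("local[*]").getOrCreate()
   import spark.implicits._
   
   val schema = new StructType().add("a", IntegerType).add("b", IntegerType).add("c", IntegerType)
   val df = Seq(("""{"a":1,"b":2,"c":3}""", """{"a":4,"b":5,"c":6}""")).toDF("json1", "json2")
   
   // Case 1: one JsonToStructs in the lower Project, three field accesses in the upper one.
   // Collapsing the two Projects repeats from_json three times, i.e. 2 redundant evaluations.
   val case1 = df
     .select(from_json($"json1", schema).as("s"))
     .select($"s.a", $"s.b", $"s.c")
   
   // Case 2: two JsonToStructs in the lower Project, two field accesses on each in the upper one.
   // Collapsing repeats each from_json twice, again 2 redundant evaluations in total.
   val case2 = df
     .select(from_json($"json1", schema).as("s1"), from_json($"json2", schema).as("s2"))
     .select($"s1.a", $"s1.b", $"s2.a", $"s2.b")
   ```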
   
   Adding more `JsonToStructs` to the lower Project would increase the number of redundant computations without increasing the max value.
   So as an alternative, I would propose "the number of redundant computations" (the sum of the values in the `exprMap` minus its size) as the metric to use.
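   
   As a back-of-the-envelope sketch of the two metrics (plain Scala, with `exprMap` reduced to a `Map[String, Int]` of reference counts and assuming the current check looks at the largest count; in the rule it would of course be keyed by the actual Catalyst expressions):
   
   ```scala
   // Current-style metric (as I understand it): the largest number of times any
   // single lower-Project expression is referenced from the upper Project.
   def maxCommonExprRefs(exprMap: Map[String, Int]): Int =
     if (exprMap.isEmpty) 0 else exprMap.values.max
   
   // Proposed alternative: the total number of redundant evaluations after collapsing,
   // i.e. the sum of the reference counts minus the number of distinct expressions.
   def redundantComputations(exprMap: Map[String, Int]): Int =
     exprMap.values.sum - exprMap.size
   
   val case1 = Map("json_to_structs_1" -> 3)                           // case 1 above
   val case2 = Map("json_to_structs_1" -> 2, "json_to_structs_2" -> 2) // case 2 above
   
   assert(maxCommonExprRefs(case1) == 3 && redundantComputations(case1) == 2)
   assert(maxCommonExprRefs(case2) == 2 && redundantComputations(case2) == 2)
   ```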
   
   Although I must admit that, in that case, we might end up caching more values relative to the number of extra computations we save.
   So both metrics have their benefits.
   
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org