Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/03/22 00:39:09 UTC

[GitHub] [spark] j-esse commented on issue #23556: [SPARK-26626][SQL] Maximum size for repeatedly substituted aliases in SQL expressions

URL: https://github.com/apache/spark/pull/23556#issuecomment-475453821
 
 
   @HyukjinKwon I think there are a few more things to consider:
   The issue doesn't only manifest in CollapseProject; it also happens in `collectProjectsAndFilters` in `PhysicalOperation` (https://github.com/apache/spark/pull/23556/files#diff-820e654df2a5133c0f86c17e2fc5512e), even when CollapseProject is excluded. We're also investigating another instance of this issue, which we think lies in `PushDownPredicate`, so we might need to make a separate fix there.
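   To make the failure mode concrete, here is a toy model in plain Python (not Catalyst's API; `Col`, `Add`, `substitute` and `collapse` are hypothetical names) of roughly what happens when stacked projections are merged by inlining alias definitions, as CollapseProject and `collectProjectsAndFilters` both do. It assumes a hypothetical query shape in which each projection defines a new alias that references the previous alias twice.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Col:
    """Reference to a column or alias by name."""
    name: str


@dataclass(frozen=True)
class Add:
    """Binary expression node, e.g. `left + right`."""
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Add]


def size(e: Expr) -> int:
    """Count nodes as if the tree were fully expanded (shared subtrees counted each time)."""
    if isinstance(e, Col):
        return 1
    return 1 + size(e.left) + size(e.right)


def substitute(e: Expr, aliases: Dict[str, Expr]) -> Expr:
    """Replace each reference to a known alias with that alias's definition."""
    if isinstance(e, Col):
        return aliases.get(e.name, e)
    return Add(substitute(e.left, aliases), substitute(e.right, aliases))


def collapse(n: int) -> Expr:
    """Merge the stacked projections a0 = x, a_i = a_(i-1) + a_(i-1) by inlining
    every alias bottom-up, and return the fully substituted definition of a_n."""
    aliases: Dict[str, Expr] = {"a0": Col("x")}
    for i in range(1, n + 1):
        prev = Col(f"a{i - 1}")
        aliases[f"a{i}"] = substitute(Add(prev, prev), aliases)
    return aliases[f"a{n}"]


for n in range(6):
    print(n, size(collapse(n)))
```

   Each extra layer doubles the substituted expression (size 2^(n+1) - 1 for `a_n`), so a few dozen such layers already produce a tree with billions of nodes.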
   
   In response to your concerns:
   1. As I said above, we don't want to disable the rule. For most queries, the rule will be applied unchanged; for some queries, it will be partially applied (aliases that grow too large will stop being substituted). CollapseProject will never be fully disabled, except in the unlikely case that the original aliases are already larger than `spark.sql.maxRepeatedAliasSize`.
   2. `spark.sql.maxRepeatedAliasSize` just needs to be a high threshold that catches exponential alias expansion; we wouldn't expect anyone to need to change it from the default. If you're concerned about having a fixed value, we can think about other heuristics for detecting exponential alias expansion.
   3. Sorry, I'm not quite sure what you mean here. The issue isn't specifically about driver memory OOMs; that's just one effect of the explosive alias expansion. Other effects include slowness, hangs, etc.
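   The threshold behaviour described in points 1 and 2 can be sketched with a self-contained toy model in plain Python. This is a simplification, not the PR's actual logic: `collapse_with_limit`, `doubling_layers` and `max_size` are hypothetical names, `max_size` merely stands in for `spark.sql.maxRepeatedAliasSize`, and the value 50 below is arbitrary (not the config's default). It assumes that an alias whose substituted form would exceed the limit is simply left as a column reference, i.e. its projection is not merged.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Col:
    """Reference to a column or alias by name."""
    name: str


@dataclass(frozen=True)
class Add:
    """Binary expression node, e.g. `left + right`."""
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Add]


def size(e: Expr) -> int:
    """Number of nodes in the fully expanded expression tree."""
    if isinstance(e, Col):
        return 1
    return 1 + size(e.left) + size(e.right)


def substitute(e: Expr, aliases: Dict[str, Expr]) -> Expr:
    """Inline alias definitions for matching column references."""
    if isinstance(e, Col):
        return aliases.get(e.name, e)
    return Add(substitute(e.left, aliases), substitute(e.right, aliases))


def doubling_layers(n: int) -> List[Tuple[str, Expr]]:
    """Hypothetical query shape: a0 = x, a_i = a_(i-1) + a_(i-1)."""
    layers: List[Tuple[str, Expr]] = [("a0", Col("x"))]
    for i in range(1, n + 1):
        prev = Col(f"a{i - 1}")
        layers.append((f"a{i}", Add(prev, prev)))
    return layers


def collapse_with_limit(
    layers: List[Tuple[str, Expr]], max_size: int
) -> Tuple[Dict[str, Expr], List[str]]:
    """Merge stacked single-alias projections bottom-up, but stop inlining any
    alias whose substituted definition exceeds max_size nodes.
    Returns each alias's resolved definition and the names left un-inlined."""
    inlinable: Dict[str, Expr] = {}
    resolved: Dict[str, Expr] = {}
    kept: List[str] = []
    for name, definition in layers:
        expanded = substitute(definition, inlinable)
        resolved[name] = expanded
        if size(expanded) <= max_size:
            inlinable[name] = expanded  # small enough: keep substituting it
        else:
            kept.append(name)  # too large: later layers keep referring to Col(name)
    return resolved, kept


resolved, kept = collapse_with_limit(doubling_layers(10), max_size=50)
print(kept)  # ['a5', 'a10']
```

   With a limit far above the query's alias sizes, nothing is kept back and the result matches full substitution (point 1); with a small limit, only the oversized aliases stop being inlined, so every resolved definition stays below roughly twice the limit instead of growing exponentially (point 2).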
   
