Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/12 09:26:29 UTC

[GitHub] viirya commented on issue #23731: [SPARK-26572][SQL] fix aggregate codegen result evaluation

URL: https://github.com/apache/spark/pull/23731#issuecomment-462683165

> @cloud-fan @viirya I am not sure that fixing this in the join is a good idea. First of all, we have many kinds of joins, so we would likely need to change all of them, and there may be other operators besides joins that use loops. I don't think it is correct to delegate to the consumer the responsibility of computing variables if needed. Honestly, it seems more reasonable to me to fix it in the aggregate.

In whole-stage codegen, we have an optimization that defers variable evaluation as late as possible. An operator can skip evaluating its output variables and let its parent operator evaluate them only if they are actually used. Unless we want to remove this optimization, I think we shouldn't force the evaluation in the aggregate.
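To make the deferral concrete, here is a minimal toy sketch of the idea, not Spark's actual code. All names (`DeferredVar`, `evaluate`, `producer_outputs`, `consumer`) and the Java-like snippets are hypothetical, made up for illustration. The producer hands out variables with pending evaluation code, and the consumer emits that code only for the variables it uses, at most once each.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class DeferredVar:
    """Hypothetical stand-in for an output variable whose code is not emitted yet."""
    name: str
    code: str  # pending code that computes the variable; empty once emitted


def evaluate(var: DeferredVar) -> str:
    """Return the pending code for `var` and clear it so it is emitted at most once."""
    code, var.code = var.code, ""
    return code


def producer_outputs() -> List[DeferredVar]:
    """A child operator exposes its outputs without evaluating them."""
    return [
        DeferredVar("a", "int a = row.getInt(0);"),
        DeferredVar("b", "int b = expensive(row);"),
    ]


def consumer(outputs: List[DeferredVar], used: Set[str]) -> List[str]:
    """A parent operator emits evaluation code only for the variables it needs."""
    lines = [evaluate(v) for v in outputs if v.name in used]
    return [line for line in lines if line]  # drop variables already evaluated


outputs = producer_outputs()
first = consumer(outputs, {"a"})        # only `a` is evaluated here
second = consumer(outputs, {"a", "b"})  # `a` is not emitted again, `b` is emitted now
```

In this sketch, where the pending code ends up depends entirely on where the consumer calls `evaluate`. If that call sits inside a loop in the consumer, the expression is computed on every iteration, which matters when the expression is non-deterministic.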

@rednaxelafx's fix looks fine to me. Actually, I'm wondering why such a non-deterministic expression gets pushed down to the aggregate in the first place...
