
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/17 16:34:13 UTC

[GitHub] [spark] EnricoMi commented on issue #27377: [SPARK-30666][Core] Reliable single-stage accumulators

URL: https://github.com/apache/spark/pull/27377#issuecomment-600170844
 
 
   I have had a quick chat with @holdenk and we found two use cases where this approach will not work:
   
   1. Partially computed partitions will lock the partial value of the aggregator, so a subsequent complete computation will not update that partition's aggregator value.
   2. Building a Dataset on top of another one that contains an accumulator may produce two query plans in which the aggregator is computed with differing partitioning.
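
To make the two cases concrete, here is a minimal toy model in plain Python. It is not Spark code and not the implementation in this PR: `FirstWinsAccumulator` and `run_task` are hypothetical names, and the sketch assumes the accumulator keeps the first value reported for each partition id and ignores later reports for the same id.

```python
class FirstWinsAccumulator:
    """Toy model (assumption): keeps the first value reported per partition id."""

    def __init__(self):
        self._by_partition = {}

    def report(self, partition_id, value):
        # A later report for the same partition id is ignored ("locked").
        self._by_partition.setdefault(partition_id, value)

    @property
    def value(self):
        return sum(self._by_partition.values())


def run_task(acc, partition_id, rows, limit=None):
    """Hypothetical task: counts the rows it reads, optionally stopping early."""
    count = 0
    for _ in rows:
        if limit is not None and count >= limit:
            break
        count += 1
    acc.report(partition_id, count)
    return count


rows = list(range(10))

# Case 1: a partial read of partition 0, followed by a complete computation.
acc = FirstWinsAccumulator()
run_task(acc, 0, rows, limit=3)   # partial computation reports 3
partial = acc.value
run_task(acc, 0, rows)            # complete computation reports 10, but is ignored
after_full = acc.value            # stays at the partial value

# Case 2: the same rows computed under two different partitionings.
plan_a = [rows[0:5], rows[5:10]]                 # 2 partitions of 5 rows
plan_b = [rows[i:i + 2] for i in range(0, 10, 2)]  # 5 partitions of 2 rows
acc2 = FirstWinsAccumulator()
for pid, part in enumerate(plan_a):
    run_task(acc2, pid, part)
for pid, part in enumerate(plan_b):
    run_task(acc2, pid, part)
mixed_total = acc2.value          # neither plan alone would give this total
```

Under these assumptions, `after_full` stays at the partial count instead of the full count of 10, and `mixed_total` mixes per-partition values from both plans because partition ids no longer refer to the same rows.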
   
   I will look into these, so I am changing this to WIP. More feedback is welcome.
