Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/19 00:11:23 UTC

[GitHub] [spark] databricks-david-lewis commented on issue #27377: [SPARK-30666][Core] Reliable single-stage accumulators

URL: https://github.com/apache/spark/pull/27377#issuecomment-587968381
 
 
   @EnricoMi Thank you for continuing to work on this! I appreciate all the time and thought you've put into it.
   
   I worry that your solution will lead to lots of duplicated work. Is there some way to move the `AccumulatorMode` logic out of the accumulator itself? It seems like most accumulators always want to do the correct thing, which is to act only on the data that is passed on to the rest of the stages.
   
   The only exception I can think of is counting the total number of bytes read or written, which is unreliable anyway because certain failures mean that information never makes it back to the driver.
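
To make the question above concrete, here is a minimal, Spark-free sketch of what keeping the mode decision outside the accumulator could look like. Everything in it is an assumption for illustration: `Mode`, `SumAccumulator`, `DriverSideMerger`, and `run` are hypothetical names, and `Mode` is only a stand-in for the PR's `AccumulatorMode`, not its actual definition. The accumulator only knows how to add and merge; a separate driver-side component decides which task updates get merged.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    # Hypothetical stand-in for the PR's AccumulatorMode.
    ALL = "all"      # merge every completed task attempt, including recomputations
    FIRST = "first"  # merge only the first completed attempt per partition


@dataclass
class SumAccumulator:
    """A plain accumulator with no knowledge of modes."""
    total: int = 0

    def add(self, v: int) -> None:
        self.total += v

    def merge(self, other: "SumAccumulator") -> None:
        self.total += other.total


@dataclass
class DriverSideMerger:
    """Hypothetical driver-side component that owns the mode decision."""
    mode: Mode
    result: SumAccumulator = field(default_factory=SumAccumulator)
    seen_partitions: set = field(default_factory=set)

    def on_task_completed(self, partition_id: int, update: SumAccumulator) -> None:
        if self.mode is Mode.FIRST and partition_id in self.seen_partitions:
            return  # a later attempt for an already-counted partition: drop it
        self.seen_partitions.add(partition_id)
        self.result.merge(update)


def run(mode: Mode) -> int:
    merger = DriverSideMerger(mode)
    # Partition 0 completes twice (for example, a recomputed task); partition 1 once.
    for partition_id, values in [(0, [1, 2]), (0, [1, 2]), (1, [10])]:
        acc = SumAccumulator()
        for v in values:
            acc.add(v)
        merger.on_task_completed(partition_id, acc)
    return merger.result.total
```

With this split, `SumAccumulator` never changes when the mode does; only the merger's policy differs between `run(Mode.ALL)` and `run(Mode.FIRST)`.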
