You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by "mingmwang (via GitHub)" <gi...@apache.org> on 2023/02/08 07:19:45 UTC

[GitHub] [arrow-ballista] mingmwang commented on issue #650: Add Accumulator API

mingmwang commented on issue #650:
URL: https://github.com/apache/arrow-ballista/issues/650#issuecomment-1422131979

   If we just want cancel tasks early, protect systems from heavy queries/heavy scans, I think we do not need to introduce the
   Accumulator. Spark's Accumulator and Metrics system is very heavy and cause lots of memory issues and management burden to the Spark Driver, the Accumulator need to be registered in the Spark Driver and life cycle is managed by the Spark Driver.
   
   We can just leverage the current DataFusion Metrics system and TaskStatus update rpc and add necessary throttling/checking/aborting logic when we handle the Task finish event in the Ballista Scheduler. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org