Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/29 08:39:55 UTC

[GitHub] [arrow-datafusion] yahoNanJing opened a new issue #1703: [Ballista] Support to better manage cluster state, like alive executors, executor available task slots, etc

yahoNanJing opened a new issue #1703:
URL: https://github.com/apache/arrow-datafusion/issues/1703

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Currently, all cluster state, such as executor info and task info, is stored in the sled db, and a global lock is used to handle concurrency. This has two costs: the serialization and deserialization overhead is large, and the global lock becomes a bottleneck when hundreds of thousands of tasks need to be handled.

**Describe the solution you'd like**

A better approach:
1. First, classify the cluster state:
   - state that is relatively stable, such as executor metadata and the execution plans for jobs
   - state that changes frequently, such as executor available task slots and task status
2. Second, handle each kind of state in the way that suits it:
   - For stable state, we may still store it in the sled db as the ground truth, but it is better to also cache it in memory to reduce the serialization and deserialization cost.
   - For volatile state, it is better not to store it in the db and to keep a single copy in memory instead. When multiple schedulers are used, state synchronization should be handled some other way, such as optimistic locking with compare-and-set.
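
To make the compare-and-set idea concrete, here is a minimal sketch of tracking an executor's available task slots in memory without a global lock. It is not taken from the Ballista code base: the `ExecutorSlots` type and its methods are hypothetical names, and it covers only the single-scheduler, single-process case. With multiple schedulers, the same retry loop would have to run against whatever shared store is chosen.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical in-memory record of one executor's free task slots.
/// (Illustrative only; not an existing Ballista type.)
pub struct ExecutorSlots {
    available: AtomicU32,
}

impl ExecutorSlots {
    pub fn new(total_slots: u32) -> Self {
        Self {
            available: AtomicU32::new(total_slots),
        }
    }

    /// Try to reserve up to `wanted` slots and return how many were reserved.
    /// Optimistic: read the current value, compute the new one, and commit
    /// only if nobody changed it in between; otherwise retry.
    pub fn reserve(&self, wanted: u32) -> u32 {
        let mut current = self.available.load(Ordering::Acquire);
        loop {
            let take = wanted.min(current);
            if take == 0 {
                return 0; // nothing requested, or no free slots left
            }
            match self.available.compare_exchange_weak(
                current,
                current - take,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return take,
                // Another thread updated the counter first (or the weak
                // exchange failed spuriously): retry with the observed value.
                Err(observed) => current = observed,
            }
        }
    }

    /// Return `count` slots, for example when tasks finish.
    pub fn release(&self, count: u32) {
        self.available.fetch_add(count, Ordering::AcqRel);
    }

    /// Current number of free slots.
    pub fn available(&self) -> u32 {
        self.available.load(Ordering::Acquire)
    }
}
```

A scheduler thread would call `reserve(n)` before assigning tasks and `release(n)` when they complete. Contention is then limited to one counter per executor instead of one lock over the whole cluster state.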

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] alamb closed issue #1703: [Ballista] Support to better manage cluster state, like alive executors, executor available task slots, etc

Posted by GitBox <gi...@apache.org>.

alamb closed issue #1703:
URL: https://github.com/apache/arrow-datafusion/issues/1703


   

