Posted to dev@reef.apache.org by "Julia (JIRA)" <ji...@apache.org> on 2017/10/09 22:39:00 UTC

[jira] [Created] (REEF-1895) REEF Bridge performance improvement for allocated evaluators

Julia created REEF-1895:
---------------------------

             Summary: REEF Bridge performance improvement for allocated evaluators
                 Key: REEF-1895
                 URL: https://issues.apache.org/jira/browse/REEF-1895
             Project: REEF
          Issue Type: Improvement
          Components: REEF, REEF Bridge
            Reporter: Julia


Recent scale tests show that a few places in the REEF code, mainly in the bridge code, seriously impact REEF performance and scalability. Notably:

-synchronized(this) in the BridgeDriver event handlers, especially the Allocated Evaluator handlers. This forces the events to be handled sequentially. When requesting a few thousand evaluators, the slowdown is dramatic.
-A lock on Evaluators when receiving an allocated evaluator in the bridge, which increases the execution time by minutes. The matching logic in this code is also not used at all.
-Some variables could be reused, but they are recomputed for each evaluator, especially in cross-bridge calls. When the number of evaluators reaches a few thousand, the time spent becomes significant.
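
As a rough illustration of the first and third points, here is a minimal Java sketch. It is not the actual BridgeDriver code: the class name, the method names, and the placeholder configuration string are all hypothetical. It assumes the handler only needs to record the evaluator and attach a value that is identical for every evaluator, so a thread-safe map can stand in for a handler-wide synchronized(this) block and the shared value can be computed once and reused.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; names do not come from the REEF code base. */
class AllocatedEvaluatorSketch {
  // A concurrent map lets handlers run in parallel without synchronized(this).
  private final ConcurrentMap<String, String> evaluators = new ConcurrentHashMap<>();

  // Counts how often the shared value is computed (for illustration only).
  private final AtomicInteger configComputations = new AtomicInteger();

  // Value that is the same for every evaluator; computed lazily, then reused.
  private volatile String sharedConfig;

  private String getSharedConfig() {
    String result = sharedConfig;
    if (result == null) {
      // The lock is taken only until the value has been published once.
      synchronized (this) {
        result = sharedConfig;
        if (result == null) {
          configComputations.incrementAndGet();
          // Stand-in for an expensive computation such as a cross-bridge call.
          result = "shared-launch-configuration";
          sharedConfig = result;
        }
      }
    }
    return result;
  }

  /** Hypothetical handler: safe to call from many threads at once. */
  void onAllocatedEvaluator(final String evaluatorId) {
    evaluators.put(evaluatorId, getSharedConfig());
  }

  int evaluatorCount() {
    return evaluators.size();
  }

  int configComputationCount() {
    return configComputations.get();
  }
}
```

In this sketch the only remaining lock guards the one-time initialization, so handlers for different evaluators no longer wait on each other once the shared value exists. Whether the real handlers can be restructured this way depends on what state they actually share.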

After an evaluator is allocated, if YARN doesn't receive the launch command within the timeout period, it will raise a failed evaluator event. With the current code, we cannot launch even two thousand containers from the .NET side before the timeout.

This JIRA is to improve the handling of allocated evaluators in order to increase scalability.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)