Posted to dev@reef.apache.org by "Markus Weimer (JIRA)" <ji...@apache.org> on 2017/10/17 22:42:00 UTC

[jira] [Resolved] (REEF-1895) REEF Bridge performance improvement for allocated evaluators

     [ https://issues.apache.org/jira/browse/REEF-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Weimer resolved REEF-1895.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 0.17

Resolved via [#1385|https://github.com/apache/reef/pull/1385]

> REEF Bridge performance improvement for allocated evaluators
> ------------------------------------------------------------
>
>                 Key: REEF-1895
>                 URL: https://issues.apache.org/jira/browse/REEF-1895
>             Project: REEF
>          Issue Type: Improvement
>          Components: REEF, REEF Bridge
>            Reporter: Julia
>            Assignee: Julia
>             Fix For: 0.17
>
>
> Recent scale tests show that a few places in the REEF code, mainly in the bridge code, seriously impact REEF performance and scalability. Notably:
> - synchronized(this) in the BridgeDriver event handlers, especially the Allocated Evaluator handlers. This forces the events to be handled in sequence. When requesting a few thousand evaluators, the slowdown is dramatic.
> - A lock on Evaluators when receiving an allocated evaluator in the bridge, which increases the execution time to the order of minutes. The matching logic in this code is not used at all.
> - Some variables could be reused, but they are recomputed for each evaluator, especially those that involve cross-bridge calls. When the number of evaluators reaches a few thousand, the time spent becomes significant.
> After an evaluator is allocated, if YARN doesn't receive a launch command within the timeout period, it reports a failed evaluator. With the current code, we cannot launch even two thousand containers from the .NET side before the timeout.
> This JIRA is to improve the handling of allocated evaluators in order to increase scalability.
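
The contention pattern described above can be illustrated with a minimal Java sketch. This is not the REEF BridgeDriver code and not the change made in the linked pull request; every name in it (AllocatedEvaluatorSketch, SerializedHandler, ConcurrentHandler, computeSharedConfig) is hypothetical. It only contrasts a handler that takes a handler-wide lock and recomputes a shared value per event with one that computes the value once and relies on a concurrent map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical illustration only; none of these names come from REEF.
final class AllocatedEvaluatorSketch {

  /** Counts how often the stand-in "expensive" shared value is computed. */
  static final AtomicInteger SHARED_COMPUTATIONS = new AtomicInteger();

  /** Stand-in for a costly value that is identical for every evaluator. */
  static String computeSharedConfig() {
    SHARED_COMPUTATIONS.incrementAndGet();
    return "shared-config";
  }

  /** Before: each event takes the handler-wide lock and recomputes the shared value. */
  static final class SerializedHandler {
    private final Map<String, String> launched = new HashMap<>();

    void onAllocated(String evaluatorId) {
      synchronized (this) { // all events are handled one at a time
        launched.put(evaluatorId, computeSharedConfig());
      }
    }

    int size() {
      synchronized (this) {
        return launched.size();
      }
    }
  }

  /** After: no handler-wide lock; the shared value is computed once and reused. */
  static final class ConcurrentHandler {
    private final String sharedConfig = computeSharedConfig();
    private final Map<String, String> launched = new ConcurrentHashMap<>();

    void onAllocated(String evaluatorId) {
      launched.put(evaluatorId, sharedConfig); // thread-safe without a global lock
    }

    int size() {
      return launched.size();
    }
  }

  /** Delivers `count` allocation events from a thread pool and waits for them to finish. */
  static void deliver(int count, Consumer<String> handler) {
    ExecutorService pool = Executors.newFixedThreadPool(8);
    for (int i = 0; i < count; i++) {
      final String id = "evaluator-" + i;
      pool.execute(() -> handler.accept(id));
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
        throw new IllegalStateException("handlers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    SerializedHandler before = new SerializedHandler();
    deliver(2000, before::onAllocated);
    int computationsBefore = SHARED_COMPUTATIONS.getAndSet(0);

    ConcurrentHandler after = new ConcurrentHandler();
    deliver(2000, after::onAllocated);
    int computationsAfter = SHARED_COMPUTATIONS.get();

    System.out.println("serialized: " + before.size() + " handled, "
        + computationsBefore + " shared computations");
    System.out.println("concurrent: " + after.size() + " handled, "
        + computationsAfter + " shared computations");
  }
}
```

Both handlers record every evaluator; the difference is that the second avoids serializing the events and performs the shared computation only once, which is the kind of saving that matters when thousands of allocations must be launched before a timeout.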



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)