Posted to dev@reef.apache.org by "Tae-Geon Um (JIRA)" <ji...@apache.org> on 2015/02/17 07:51:11 UTC

[jira] [Commented] (REEF-151) Fix build issues on Windows

    [ https://issues.apache.org/jira/browse/REEF-151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323736#comment-14323736 ] 

Tae-Geon Um commented on REEF-151:
----------------------------------

Here is the link to the failure log files: https://builds.apache.org/job/Reef-pull-request-windows/ws/reef-tests/target/TEST_EvaluatorFailureTest-1424062467656/

Here is my analysis:
1) In this test, an Evaluator intentionally throws a RuntimeException and fails.

2) The Evaluator sends two types of control messages to the Driver. Before the failure, the Evaluator sends an ActiveContext control message to the Driver. After the failure, it sends an Exception message to the Driver.

3) In a successful test, the Driver first handles the ActiveContext message and calls the *_DefaultContextActiveHandler.onNext_* method, which tries to send an evaluatorControlProto message to the Evaluator (in the *_EvaluatorControlHandler.send_* method).

After that, the Driver handles the Exception and sets the associated Evaluator's status to *_Failed_* (in *_EvaluatorManager.onEvaluatorException_*).

3-1) However, in a failed test, the Driver first handles the Exception and only then handles the ActiveContext message. This is a problem: when the Driver handles the ActiveContext message, it tries to send a message to an Evaluator that the Exception handler has already marked *_Failed_*. Because the Evaluator is in the *_Failed_* state, *_EvaluatorControlHandler.send_* throws an *_IllegalStateException_*. This is why the build fails.
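
To make the ordering dependence in 3) and 3-1) concrete, here is a minimal single-threaded Java sketch. It is a hypothetical model, not REEF code: EvaluatorRaceSketch, EvaluatorModel, replay() and the message names are made up for illustration. It only mimics the behaviour described above, namely that the exception handler marks the Evaluator failed and that sending to a failed Evaluator throws an IllegalStateException.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the Driver-side message handling described above.
// EvaluatorModel, replay() and the message names are illustrative only; they are not REEF APIs.
public class EvaluatorRaceSketch {

    enum State { RUNNING, FAILED }

    static final class EvaluatorModel {
        private State state = State.RUNNING;

        // Plays the role of EvaluatorManager.onEvaluatorException: mark the Evaluator as failed.
        void onEvaluatorException() {
            state = State.FAILED;
        }

        // Plays the role of EvaluatorControlHandler.send: refuse to talk to a failed Evaluator.
        void send(String controlMessage) {
            if (state == State.FAILED) {
                throw new IllegalStateException("Cannot send " + controlMessage + " to a failed Evaluator");
            }
        }

        // Plays the role of DefaultContextActiveHandler.onNext, which sends a control message.
        void onActiveContext() {
            send("EvaluatorControlProto");
        }
    }

    // Handles the messages in the given order; returns "ok" or the name of the exception raised.
    static String replay(List<String> handlingOrder) {
        EvaluatorModel evaluator = new EvaluatorModel();
        try {
            for (String message : handlingOrder) {
                switch (message) {
                    case "ActiveContext":
                        evaluator.onActiveContext();
                        break;
                    case "Exception":
                        evaluator.onEvaluatorException();
                        break;
                    default:
                        throw new IllegalArgumentException("Unknown message: " + message);
                }
            }
            return "ok";
        } catch (IllegalStateException e) {
            return "IllegalStateException";
        }
    }

    public static void main(String[] args) {
        // Order seen in a successful test (step 3): prints "ok".
        System.out.println(replay(Arrays.asList("ActiveContext", "Exception")));
        // Order seen in a failed test (step 3-1): prints "IllegalStateException".
        System.out.println(replay(Arrays.asList("Exception", "ActiveContext")));
    }
}
```

In this model the outcome depends only on which message is handled first, so any nondeterminism in the Driver's handling order is enough to make the test pass on one machine and fail on another.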

It looks like this is a race condition. What do you think, [~markus.weimer]?

> Fix build issues on Windows
> ---------------------------
>
>                 Key: REEF-151
>                 URL: https://issues.apache.org/jira/browse/REEF-151
>             Project: REEF
>          Issue Type: Bug
>          Components: Build infrastructure
>            Reporter: Markus Weimer
>            Assignee: Sergiy Matusevych
>
> Our builds on the Windows build server reliably fail. They pass on Linux and on our Windows development machines. Hence, the fault is probably with the build server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)