Posted to dev@reef.apache.org by Tae-Geon Um <ta...@gmail.com> on 2017/03/06 14:23:25 UTC

Handling CI failures on .NET side

Hi all, 
 
For the 0.16 release, we should handle the blocking issues: transient test failures on both the Java and .NET sides.
However, no one is assigned to the .NET-side issues.

We have four remaining issues on the .NET side (https://issues.apache.org/jira/browse/REEF-1462):
- [REEF-1462] Fix TestEvaluatorWithActiveContextImmediatePoison failures with unattached context <https://issues.apache.org/jira/browse/REEF-1406> 
- [REEF-1473] Fix CanRunClrBridgeExampleOnLocalRuntime failures in AppVeyor <https://issues.apache.org/jira/browse/REEF-1473>
- [REEF-1622] Fix TestKMeansOnLocalRuntimeWithGroupCommunications failures in AppVeyor <https://issues.apache.org/jira/browse/REEF-1622>
- [REEF-1723] Fix TestFailMapperEvaluatorOnWaitingForEvaluatorAndExecution failures in AppVeyor <https://issues.apache.org/jira/browse/REEF-1723>

I would like to help resolve these issues, but I'm not familiar with the .NET side, so it would probably take me a long time to work through the code.
I think it would be best for .NET experts, or whoever wrote the tests (or the related code), to take a look at these issues.

Are there any volunteers who can handle the .NET side issues? 

Thanks!
Taegeon

Re: Handling CI failures on .NET side

Posted by Taegeon Um <ta...@gmail.com>.
Thanks Julia!

I agree that transient failures are really hard to debug and that finding the
root cause is challenging.
However, for the 0.16 release, we need to pay attention to these issues.

I'm really happy to hear that you are going to look at them again.

Thanks,
Taegeon

RE: Handling CI failures on .NET side

Posted by "Julia Wang (QIUHE)" <Qi...@microsoft.com.INVALID>.
Hi Taegeon, 

Thanks for the follow-up! It is true that you should not spend too much time getting familiar with the code base just to resolve those test issues.

I am very familiar with the code in this area and know all of those test cases. The issue is that none of the test failures can be reproduced locally. They fail only in AppVeyor, and only intermittently. The failures look timing-related, but they are hard to debug. That's the challenging part. I believe we looked at those test failures before, but none of them is straightforward.

When I get time, I will look at it again. 

Thanks,
Julia
