Posted to notifications@yetus.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2018/12/21 19:47:00 UTC

[jira] [Updated] (YETUS-744) Report broken ASF nodes

     [ https://issues.apache.org/jira/browse/YETUS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated YETUS-744:
-----------------------------------
    Description: 
The ASF build infrastructure is barely monitored and most of the jobs are pretty terrible. This means it isn't unusual for things such as process slots to drop to zero and cause problems.  For example, it isn't unusual for the relatively tiny Yetus project jobs to fail.  But they fail in such a way that Yetus doesn't really report the problem correctly.  Digging into the console log will show:

{code}
/testptch/patchprocess/precommit/core.d/00-yetuslib.sh: fork: retry: No child processes
/testptch/patchprocess/precommit/core.d/00-yetuslib.sh: fork: retry: No child processes
/testptch/patchprocess/precommit/core.d/00-yetuslib.sh: fork: retry: No child processes
/testptch/patchprocess/precommit/core.d/00-yetuslib.sh: fork: retry: No child processes
/testptch/patchprocess/precommit/core.d/00-yetuslib.sh: fork: Resource temporarily unavailable
{code}
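
Both "fork: retry: No child processes" and "fork: Resource temporarily unavailable" come straight from bash failing to fork, i.e. the node has exhausted its process slots.  As a rough sketch only (the function name and log location below are made up, not existing Yetus code), detecting the condition could be as simple as grepping the captured console output:

{code}
#!/usr/bin/env bash
# Hypothetical helper: scan a captured console log for the fork-failure
# messages shown above and fail fast when they are present.
detect_fork_exhaustion() {
  local logfile=$1

  # Either message means bash could not fork: the node is out of
  # process slots and nothing else in the run can be trusted.
  if grep -Eq 'fork: (retry: No child processes|Resource temporarily unavailable)' "${logfile}"; then
    echo "ERROR: build node appears to have no free process slots" 1>&2
    return 1
  fi
  return 0
}
{code}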

test-patch should:
* specifically look for this condition
* bail out early rather than trying to continue on
* report exactly which node is broken, especially if it can be done prior to or after launching docker (a rough sketch follows below)
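
For the last point, a pre-flight probe run on the node itself (before or after the docker launch) might look something like this.  Again only a sketch: NODE_NAME is assumed to be supplied by Jenkins, and none of this is existing precommit code:

{code}
#!/usr/bin/env bash
# Hypothetical pre-flight check: verify the node can still fork and,
# if it cannot, name the broken node and bail out immediately.
node_health_check() {
  # NODE_NAME is assumed to come from Jenkins; fall back to the bash
  # HOSTNAME variable so we don't need to fork just to get a name.
  local node=${NODE_NAME:-${HOSTNAME}}

  # A trivial subshell is enough to exercise fork(); if it fails, the
  # node has run out of process slots and continuing is pointless.
  if ! ( true ); then
    echo "ERROR: node ${node} cannot fork new processes; aborting run" 1>&2
    exit 1
  fi
}

node_health_check
{code}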


> Report broken ASF nodes
> -----------------------
>
>                 Key: YETUS-744
>                 URL: https://issues.apache.org/jira/browse/YETUS-744
>             Project: Yetus
>          Issue Type: New Feature
>            Reporter: Allen Wittenauer
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)