Posted to dev@nifi.apache.org by Joe Witt <jo...@gmail.com> on 2021/02/01 21:41:56 UTC

Reliability of the builds on Github CI....feeling like Travis CI again...

Team,

For those who have been watching, the builds on GitHub Actions have
become substantially less stable.  Initially it was just the Java 11 builds
that became problematic, and that happened as soon as the latest Azul JDK 11
became the default.  David H fixed the tests affected by the now-different
API behavior (different exceptions thrown).  Then we started seeing other
tests failing due to timeouts.  These issues and the brittleness are
starting to remind me a lot of what we faced with Travis CI, as that
infrastructure clearly struggled.  I don't know what changed in GitHub CI
recently but...it isn't the same.

Now, all that said, it is still our problem to figure out.  I'll relax the
timing of the tests again, since during all this I tried to utilize more of
the available core(s), which might well add to the timing issues.  But we
must also be far more careful about the reliability, repeatability, and
assumptions of these tests.  They do an awful lot of socket creation, timing
checks, etc. that are probably better as integration tests.  We don't run
integration tests nearly as often, though.  We also really need to split the
many NARs out from the core framework.  Build times are pretty brutal and
the convenience build is constantly at the limit that the ASF will allow.
So we have some strategery to sort through soon.

Thanks

Re: Reliability of the builds on Github CI....feeling like Travis CI again...

Posted by Joe Witt <jo...@gmail.com>.

Team

Thanks to a pointer from Otto Fowler, it is clear that our challenges with
GitHub CI are not uncommon at the ASF these days.  There is a list of
projects struggling with it and we're not even on it, which means our usage
is so low compared to theirs that we're not even showing up in the graphs
where this is being watched.  Other projects are effectively consuming so
much of the build resources that we're being impacted.  I will do further
review and learning here, but ultimately this is likely to push us further
toward separating extensions from the core framework soon.

Thanks
