Posted to dev@spark.apache.org by Josh Rosen <ro...@gmail.com> on 2015/01/01 02:07:56 UTC

Today's Jenkins failures in the Spark Maven builds

If you've been following AMPLab Jenkins today, you'll have noticed a huge
number of Spark test failures in the maintenance branches and Maven builds.

My best guess as to the cause is that I pushed a backport to all
maintenance branches at a moment when Jenkins was otherwise idle, so many
builds kicked off at almost exactly the same time and eventually failed
due to port contention in the SparkSubmit tests (we didn't disable the web
UI when making external calls to ./spark-submit).  I pushed a hotfix to
address this.  When the first wave of Jenkins builds failed, the next wave
kicked off more or less in lockstep, since there's only ever one active
build for the master builds, and hit the problem again, this time failing
a DriverSuite test (which has a port contention problem that needs a
separate fix; I'll hotfix this soon).
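
To illustrate the failure mode, here is a minimal Python sketch of port
contention. It is not Spark code: try_bind is a hypothetical helper, and
the loopback address and ephemeral port are assumptions made so the
example can run on its own. It shows that once one process is listening
on a port, a second attempt to bind the same port fails with "address
already in use", which is roughly what builds launched at the same moment
on one machine can run into.

```python
import errno
import socket


def try_bind(port, host="127.0.0.1"):
    """Return a listening socket bound to (host, port), or None if the port is taken.

    Hypothetical helper for illustration only; passing port=0 lets the OS
    choose a free ephemeral port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            # Another socket already owns this port.
            return None
        raise


if __name__ == "__main__":
    first = try_bind(0)             # the OS picks a free port
    port = first.getsockname()[1]   # find out which port was chosen
    second = try_bind(port)         # try to take the same port again
    print("second bind succeeded:", second is not None)
    first.close()
```

The second try_bind call is expected to return None while the first socket
is still open.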

I believe that this flakiness is due to the lockstep synchronization of the
first wave of builds (e.g. a bunch of builds that ran DriverSuite and
SparkSubmitSuite within a minute or two of each other) rather than to
changes in recent patches.  If the problem persists after further hotfixes
that disable the web UI, then I'll investigate the recent changes in more
detail.
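
As a rough sketch of what disabling the web UI for an external
spark-submit call can look like, the Python helper below builds a command
line that passes spark.ui.enabled=false through --conf. The helper name
spark_submit_command, the default launcher path, and the example class and
jar names are assumptions for illustration; this is not the actual hotfix.

```python
def spark_submit_command(app_args, launcher="./bin/spark-submit", ui_enabled=False):
    """Build an argument list for an external spark-submit call.

    Hypothetical helper: when ui_enabled is False, the web UI is switched
    off via the spark.ui.enabled setting so the launched application does
    not try to bind a UI port.
    """
    cmd = [launcher]
    if not ui_enabled:
        # spark-submit options go before the application jar and its arguments.
        cmd += ["--conf", "spark.ui.enabled=false"]
    return cmd + list(app_args)


if __name__ == "__main__":
    # The class and jar names here are placeholders, not real test artifacts.
    print(spark_submit_command(["--class", "example.Main", "app.jar"]))
```

The resulting list can be handed to subprocess.run or a similar process
launcher in a test harness.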

Thanks,
Josh