Posted to dev@spark.apache.org by Josh Rosen <jo...@databricks.com> on 2015/12/30 21:52:29 UTC

New processes / tools for changing dependencies in Spark

I just merged https://github.com/apache/spark/pull/10461, a PR that adds
new automated tooling to help us reason about dependency changes in Spark.
Here's a summary of the changes:

   - The dev/run-tests script (used in the SBT Jenkins builds and for
   testing Spark pull requests) now generates a file containing Spark's
   resolved runtime classpath for each Hadoop profile, then compares each
   generated file to a copy checked into the repository. These dependency
   lists are found at https://github.com/apache/spark/tree/master/dev/deps;
   there is a separate list for each Hadoop profile.

   - If a pull request changes dependencies without updating these manifest
   files, our test script will fail the build
   <https://github.com/apache/spark/pull/10461#issuecomment-168066328> and
   the build console output will list the dependency diff
   <https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48505/consoleFull>.

   - If you are intentionally changing dependencies, run
   ./dev/test-dependencies.sh --replace-manifest to re-generate these
   dependency manifests, then commit the changed files and include them
   with your pull request (see the sketch just after this list).
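
For the common case of an intentional change, the end-to-end workflow
looks roughly like this (an illustrative sketch: the git commands are my
own suggestion and are not performed by the script itself):

    # Re-generate the checked-in dependency manifests:
    ./dev/test-dependencies.sh --replace-manifest

    # Review what actually changed on the resolved runtime classpath:
    git diff dev/deps/

    # Commit the updated manifests together with your build change:
    git add dev/deps/
    git commit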

The goal of this change is to make it simpler to reason about build
changes: it should now be much easier to verify whether dependency
exclusions worked properly or determine whether transitive dependencies
changed in a way that affects the final classpath.
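
For example, to verify that a newly added exclusion really removed a
transitive jar from the classpath, you can re-generate the manifests and
search them for the jar's name. A hedged sketch: the manifest file name
below is just one of the per-profile lists under dev/deps, and
example-artifact-1.0.jar is a hypothetical jar name:

    ./dev/test-dependencies.sh --replace-manifest
    grep example-artifact dev/deps/spark-deps-hadoop-2.6 \
      || echo "exclusion worked: jar no longer on the classpath"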

Let me know if you have any questions about this change and, as always,
feel free to submit pull requests if you would like to make any
enhancements to this script.

Thanks,
Josh