Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/08/31 12:36:00 UTC

[jira] [Commented] (FLINK-8819) Rework travis script to use build stages

    [ https://issues.apache.org/jira/browse/FLINK-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598680#comment-16598680 ] 

ASF GitHub Bot commented on FLINK-8819:
---------------------------------------

zentol opened a new pull request #6642: [FLINK-8819][travis] Rework travis script to use stages
URL: https://github.com/apache/flink/pull/6642
 
 
   ## What is the purpose of the change
   
   This PR reworks the travis scripts to use stages. Stages allow jobs to be organized in sequential steps, in contrast to the current approach of all jobs running in parallel. This allows jobs to depend on each other, with the obvious use-case of separating code compilation and test execution.
   A subsequent stage is only executed if the previous stage has completed successfully, i.e. if all jobs in that stage have passed. In other words, if checkstyle fails, no tests are executed, so be mindful of that.
   
   The real benefit is that we no longer compile (parts of) Flink in each profile; part of the compilation overhead moves into a separate profile instead. We don't decrease the total runtime due to the added overhead (upload/download of the cache), but the individual builds are faster and more manageable in the long term.
   
   An example build can be seen here: https://travis-ci.org/zentol/flink/builds/422925766
   
   ## High-level overview
   
   The new scripts define 3 stages: Compile, Test and Cleanup.
   
   In the compile stage we compile Flink and run QA checks like checkstyle. The compiled Flink project is placed into the travis cache to make it accessible to subsequent builds.
   
   The test stage consists of 5 jobs based on our existing test splitting (core, libs, connectors, tests, misc). These builds retrieve the compiled Flink version from the cache, install it into the local repository and subsequently run the tests.
   
   The cleanup job deletes the compiled Flink artifact from the cache. This step isn't strictly necessary, but still nice to have.
   
   Some additional small refactorings have been made to separate `travis_mvn_watchdog.sh` into individual parts, which we can build on in the future.
   
   ## Low-level details
   
   ### Caching
   
   The downside of stages is that there is no easy-to-use way to pass build artifacts from one stage to the next. The caching approach _works_ but has the caveat that builds have to share the same cache. The travis cache is only shared between builds if the build configurations are identical; most notably they can't call different scripts nor have different environment variables.
   
   As a workaround we map the `TRAVIS_JOB_NUMBER` to a specific stage. (If you look at the build linked in the PR, `4583.1` would be the value I'm talking about). The order of jobs is deterministic, so for example we always know that `1-2` belong to the compile stage, with `2` always being configured for the legacy codebase.
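   To make the mapping concrete, here is a minimal sketch of how the job index can be derived and translated into a stage (the index ranges and variable names are illustrative assumptions, not necessarily the exact values used by the scripts in this PR):
   
   ```bash
   #!/usr/bin/env bash
   # Sketch: derive the per-build job index from TRAVIS_JOB_NUMBER
   # (e.g. "4583.1" -> "1") and map it to a stage. The concrete index
   # ranges below are assumptions for illustration only.
   JOB_INDEX="${TRAVIS_JOB_NUMBER##*.}"
   
   if   [ "$JOB_INDEX" -le 2 ];  then STAGE="compile"
   elif [ "$JOB_INDEX" -le 12 ]; then STAGE="test"
   else                               STAGE="cleanup"
   fi
   
   echo "job ${JOB_INDEX} -> stage ${STAGE}"
   ```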
   
   ### travis_controller
   All stage-related logic is handled by the `travis_controller` script.
   In short (a rough sketch of the controller follows the list below):
   * it determines where we are in the build process based on `TRAVIS_JOB_NUMBER`
   * if in compile step
     * remove existing cached flink versions (fail-safe cleanup to prevent cache from growing larger over time)
     * compile Flink and do QA checks (shading, dependency convergence, checkstyle etc.)
     * copy flink to cache location
     * drop unnecessary files (like original jars) from compiled version
   * if in test step
     * fetch flink from cache
     * update all timestamps to prevent compiler plugins from recompiling classes
     * execute `travis_mvn_watchdog.sh`
   * if in cleanup step
     * well, cleanup stuff
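   
   A rough sketch of the controller's overall shape (the cache location, the maven flags and the script path are assumptions for illustration; the actual script in this PR differs in detail):
   
   ```bash
   #!/usr/bin/env bash
   # Illustrative controller sketch; CACHE_DIR and the maven flags are assumptions.
   CACHE_DIR="$HOME/flink_cache"
   FLINK_DIR="$TRAVIS_BUILD_DIR"
   STAGE="${1:-compile}"   # in the real script this is derived from TRAVIS_JOB_NUMBER (see above)
   
   case "$STAGE" in
     compile)
       rm -rf "${CACHE_DIR:?}"/*                 # fail-safe cleanup of stale cache entries
       mvn -B clean install -DskipTests          # compile + QA checks (checkstyle, shading, ...)
       mkdir -p "$CACHE_DIR"
       cp -r "$FLINK_DIR" "$CACHE_DIR/flink"     # place compiled Flink into the cache
       find "$CACHE_DIR/flink" -name 'original-*.jar' -delete   # drop unnecessary files
       ;;
     test)
       cp -r "$CACHE_DIR/flink/." "$FLINK_DIR"   # fetch compiled Flink from the cache
       find "$FLINK_DIR" -exec touch {} +        # bump timestamps so nothing gets recompiled
       ./tools/travis_mvn_watchdog.sh
       ;;
     cleanup)
       rm -rf "$CACHE_DIR"                       # remove the compiled artifact from the cache
       ;;
   esac
   ```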
   
   ### travis_mvn_watchdog
   
   Despite the above changes `travis_mvn_watchdog.sh` works pretty much like it did before. It first `install`s Flink (except now without `clean` as this would remove already compiled classes) and then runs `mvn verify`.
   This has the downside that we still package jars twice, which actually takes a while. We could skip this in theory by directly invoking the `surefire` plugin, but various issues in our build/tests prevent this from working at the moment. And I don't want to delay this change further.
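   
   Stripped of the watchdog/logging machinery, the maven calls boil down to something like this (a sketch; flags are abbreviated and the per-split module selection is omitted):
   
   ```bash
   # Sketch of the essential maven calls; the real script passes more options
   # and restricts `verify` to the modules of the current test split.
   # Note: no `clean`, so the classes produced in the compile stage are kept.
   mvn -B install -DskipTests
   mvn -B verify
   ```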
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Rework travis script to use build stages
> ----------------------------------------
>
>                 Key: FLINK-8819
>                 URL: https://issues.apache.org/jira/browse/FLINK-8819
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Build System, Travis
>            Reporter: Chesnay Schepler
>            Assignee: Chesnay Schepler
>            Priority: Trivial
>              Labels: pull-request-available
>
> This issue is for tracking efforts to rework our Travis scripts to use [stages|https://docs.travis-ci.com/user/build-stages/].
> This feature allows us to define a sequence of jobs that are run one after another. This implies that we can define dependencies between jobs, in contrast to our existing jobs that have to be self-contained.
> As an example, we could have a compile stage, and a test stage with multiple jobs.
> The main benefit here is that we no longer have to compile modules multiple times, which would reduce our build times.
> The major issue here, however, is that there is no _proper_ support for passing build artifacts from one stage to the next. According to this [issue|https://github.com/travis-ci/beta-features/issues/28] it is on their to-do list.
> In the meantime we could manually transfer the artifacts between stages, either using the Travis cache or some other external storage. The cache solution would work by setting up a cached directory (just like the mvn cache) and creating build-scoped directories within it that contain the artifacts (I have a prototype that works like this).
> The major concern here is that of cleaning up the cache/storage.
>  We can clean things up if
>  * our script fails
>  * the last stage succeeds.
> We can *not* clean things up if
>  * the build is canceled
>  * travis fails the build due to a timeout or similar
> as apparently there is [no way to run a script at the end of a build|https://github.com/travis-ci/travis-ci/issues/4221].
> Thus we would either have to periodically clear the cache, or encode more information into the cached files that would allow _other_ builds to clean up stale data (for example the build number or date).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)