Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/11/12 18:21:58 UTC

[GitHub] [pulsar] lhotari opened a new pull request #8485: [CI] Refactor Github Workflows to reduce CI build times

lhotari opened a new pull request #8485:
URL: https://github.com/apache/pulsar/pull/8485


   ### Motivation
   
   Reduce CI build times and improve observability of the build by using [Gradle Enterprise Maven Extension](https://docs.gradle.com/enterprise/maven-extension).
   
   ### Modifications
   
   - refactor workflows to reduce duplication
     - replace duplicated workflows with matrix jobs
     - move logic for clean-disk and tune-os to
       composite GitHub Actions that are stored in the .github/actions
       directory
   - enable Gradle Build Scans for Maven, https://scans.gradle.com/#maven
     - increases visibility into the build
     - example of a Build scan for a Pulsar build: https://scans.gradle.com/s/rd4arezx5qsuk/tests
   - build Pulsar artifacts once and share .m2/repository/org/apache/pulsar
     across the remaining build jobs
   - build Pulsar Docker images once and share across the
     integration tests


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari closed pull request #8485: [CI] Refactor Github Workflows to reduce CI build times

Posted by GitBox <gi...@apache.org>.
lhotari closed pull request #8485:
URL: https://github.com/apache/pulsar/pull/8485


   





[GitHub] [pulsar] lhotari commented on pull request #8485: [CI] Refactor Github Workflows to reduce CI build times

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #8485:
URL: https://github.com/apache/pulsar/pull/8485#issuecomment-724474645


   I'm currently experimenting with a solution where the build is split into multiple phases:
   
   1. license check, build Pulsar artifacts
   2. run unit tests
   3. build docker images
   4. run integration tests
   
   Each phase is a "job" in a GitHub Actions workflow. The unit test and integration test jobs run parallel sub-jobs by using the workflow matrix feature.
   
   The challenge is the large size of the Pulsar artifacts. Currently, the ~/.m2/repository/org/apache/pulsar files installed with "mvn install" are about 2.5 GB in size.
   Breakdown of directory sizes in MB:
   https://gist.github.com/lhotari/3da3b220edd5684e54a005f358f3d045
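
A breakdown like the one linked above can be produced with standard tools. The following is a minimal sketch, assuming a POSIX-like shell with `du` and `sort` available; the function name `dir_sizes_mb` is hypothetical and is not part of the Pulsar build.

```shell
# dir_sizes_mb DIR: print the size in MB of each immediate subdirectory
# of DIR, largest first. (Hypothetical helper, for illustration only.)
dir_sizes_mb() {
  local root="$1"
  # du -s summarizes each argument, -m reports sizes in 1 MiB blocks;
  # sort -rn orders the lines numerically, largest first.
  du -sm "$root"/*/ 2>/dev/null | sort -rn
}
```

For example, `dir_sizes_mb ~/.m2/repository/org/apache/pulsar` would list the per-artifact directories after a local `mvn install`.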
   
   The large size of the artifacts seems to be caused by shaded and bundled dependencies.
   The bundled dependencies seem to come from the pulsar-io modules built with [nifi-nar-maven-plugin](https://github.com/apache/nifi-maven). This results in excessive I/O during builds.
   
   The solution seems to be to create yet another Maven profile that builds just the essentials for running unit tests. Unit tests should be able to run without building the shaded jars, the distribution, or the nar modules with the embedded dependencies.
   
   Perhaps there's also a way to share the dependencies across the Pulsar IO nar modules. It seems like a waste to duplicate most of the same dependencies in each nar file.





[GitHub] [pulsar] lhotari commented on pull request #8485: [CI] Refactor Github Workflows to reduce CI build times

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #8485:
URL: https://github.com/apache/pulsar/pull/8485#issuecomment-741778710


   I'm closing this PR and possibly creating a new one for the Github Workflows refactoring once I have it resolved. I'll be running the build in my own fork until I have resolved the open issues.





[GitHub] [pulsar] lhotari commented on pull request #8485: [CI] Enable "Gradle Enterprise for Maven" for CI builds

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #8485:
URL: https://github.com/apache/pulsar/pull/8485#issuecomment-723970551


   Unfortunately, "Gradle Enterprise for Maven" doesn't allow using a remote build cache server without a properly licensed Gradle Enterprise server. This is explained at https://stackoverflow.com/a/58644090





[GitHub] [pulsar] lhotari closed pull request #8485: [CI] Refactor Github Workflows to reduce CI build times

Posted by GitBox <gi...@apache.org>.
lhotari closed pull request #8485:
URL: https://github.com/apache/pulsar/pull/8485


   





[GitHub] [pulsar] lhotari edited a comment on pull request #8485: [CI] Refactor Github Workflows to reduce CI build times

Posted by GitBox <gi...@apache.org>.
lhotari edited a comment on pull request #8485:
URL: https://github.com/apache/pulsar/pull/8485#issuecomment-724474645


   I'm currently experimenting with a solution where the build is split into multiple phases:
   
   1. license check, build Pulsar artifacts
   2. run unit tests
   3. build docker images
   4. run integration tests
   
   Each phase is a "job" in a GitHub Actions workflow. The unit test and integration test jobs run parallel sub-jobs by using the workflow matrix feature.
   
   The challenge is the large size of the Pulsar artifacts. Currently, the ~/.m2/repository/org/apache/pulsar files installed with "mvn install" are about 2.5 GB in size.
   Breakdown of directory sizes in MB:
   https://gist.github.com/lhotari/3da3b220edd5684e54a005f358f3d045
   
   The large size of the artifacts seems to be caused by shaded and bundled dependencies.
   The bundled dependencies seem to come from the pulsar-io modules built with [nifi-nar-maven-plugin](https://github.com/apache/nifi-maven). This results in excessive I/O during builds.
   
   The solution seems to be to create yet another Maven profile that builds just the essentials for running unit tests. Unit tests should be able to run without building the shaded jars, the distribution, or the nar modules with the embedded dependencies.
   
   Perhaps there's also a way to share the dependencies across the Pulsar IO nar modules. It seems like a waste to duplicate most of the same dependencies in each nar file.
   
   ---
   
   The output of the core-modules Maven profile is already much smaller than the total size of the Pulsar artifacts.
   `mvn install -Pcore-modules -DskipTests` produces about 377MB in ~/.m2/repository/org/apache/pulsar:
   https://gist.github.com/lhotari/b6f51edc935787b530055a20bc685394
   This could be reduced by 269MB, from 377MB to 108MB, by removing the "distribution/server" module from the core-modules profile.
   
   Since the artifact cache of GitHub Actions is limited to 5GB in total per repository, a smaller artifact set would help a lot in using the cache to share the artifacts between phase 1 and the later unit test and integration test phases of the build.
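
The effect of the size reduction on the cache budget can be checked with a little arithmetic. This is only a rough sketch: it assumes 1 GB = 1024 MB, treats 2.5 GB as 2560 MB, and assumes each cached entry holds one full copy of the artifacts. The function names are hypothetical.

```shell
# Sizes in MB are taken from the comment above; 5 GB is treated as 5 * 1024 MB.
CACHE_LIMIT_MB=$((5 * 1024))

# copies_in_cache SIZE_MB: how many artifact sets of SIZE_MB fit within the limit.
copies_in_cache() {
  echo $(( CACHE_LIMIT_MB / $1 ))
}

# size_after_removal TOTAL_MB REMOVED_MB: size left after dropping a module.
size_after_removal() {
  echo $(( $1 - $2 ))
}

copies_in_cache 2560                             # full "mvn install" output
copies_in_cache 377                              # core-modules profile
copies_in_cache "$(size_after_removal 377 269)"  # without distribution/server
```

Under these assumptions the three calls print 2, 13 and 47, so the reduced artifact set leaves far more room in the shared cache for concurrent builds.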

