Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/11/10 06:20:31 UTC

[GitHub] [pulsar] lhotari edited a comment on pull request #8485: [CI] Refactor Github Workflows to reduce CI build times

lhotari edited a comment on pull request #8485:
URL: https://github.com/apache/pulsar/pull/8485#issuecomment-724474645


   I'm currently experimenting with a solution where the build is split into multiple phases:
   
   1. license check, build Pulsar artifacts
   2. run unit tests
   3. build docker images
   4. run integration tests
   
   Each phase is a "job" in the GitHub Actions workflow. The unit test and integration test jobs are split into parallel sub-jobs using the matrix feature of GitHub Actions.
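To illustrate how the phases and matrix sub-jobs could be scheduled, here is a small Python model (3.9+) of job dependencies. It is a sketch, not the actual workflow file. The job names and matrix entries are hypothetical, and it assumes the Docker image job only needs the build job, so it can run alongside the unit tests.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical job layout mirroring the four phases above. "needs" plays the
# role of the workflow `needs:` key; "matrix" lists the parallel sub-jobs.
JOBS = {
    "build": {"needs": [], "matrix": []},
    "unit-tests": {"needs": ["build"], "matrix": ["group-1", "group-2", "group-3"]},
    "docker-images": {"needs": ["build"], "matrix": []},
    "integration-tests": {"needs": ["docker-images"], "matrix": ["suite-a", "suite-b"]},
}


def schedule(jobs):
    """Group jobs into waves; everything in one wave can run in parallel."""
    sorter = TopologicalSorter({name: spec["needs"] for name, spec in jobs.items()})
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        wave = []
        for name in ready:
            entries = jobs[name]["matrix"]
            if entries:
                # A matrix job fans out into one sub-job per entry.
                wave.extend(f"{name} ({entry})" for entry in entries)
            else:
                wave.append(name)
        waves.append(wave)
        # Mark the whole wave as finished so dependent jobs become ready.
        sorter.done(*ready)
    return waves


for number, wave in enumerate(schedule(JOBS), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

With this layout the build runs alone in the first wave, the Docker image job and the three unit test groups share the second wave, and the two integration test suites run in the third. Every later wave depends on the artifacts produced in the first one, which is why sharing them efficiently matters.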
   
   The challenge is the large size of the Pulsar artifacts. Currently the files that `mvn install` puts in ~/.m2/repository/org/apache/pulsar take up about 2.5 GB.
   Breakdown of directory sizes in MB:
   https://gist.github.com/lhotari/3da3b220edd5684e54a005f358f3d045
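The gist does not say how the breakdown was produced. A rough equivalent, sketched here under the assumption that apparent file sizes are good enough, sums the files under each module directory and sorts the results. The numbers can differ slightly from `du -m`, which reports disk usage in blocks, and the function names are hypothetical.

```python
import os
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Sum the apparent sizes of all regular files below `path`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def breakdown_mb(base: Path) -> list[tuple[str, float]]:
    """Return (module directory, size in MB) pairs, largest first."""
    sizes = [
        (child.name, dir_size_bytes(child) / (1024 * 1024))
        for child in base.iterdir()
        if child.is_dir()
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    base = Path.home() / ".m2" / "repository" / "org" / "apache" / "pulsar"
    if base.is_dir():
        for name, size_mb in breakdown_mb(base):
            print(f"{size_mb:10.1f}  {name}")
```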
   
   The large size of the artifacts seems to be caused by shaded and bundled dependencies. 
   The bundled dependencies seem to come mainly from the pulsar-io modules, which are built with [nifi-nar-maven-plugin](https://github.com/apache/nifi-maven). This results in excessive IO during builds.
   
   The solution seems to be to create yet another Maven profile that builds just the essentials needed for running unit tests. Unit tests should be able to run without building the shaded jars, the distribution, or the nar modules with their embedded dependencies.
   
   Perhaps there's also a way to share the dependencies across the Pulsar IO nar modules. It seems like a waste to duplicate most of the same dependencies in each nar file.
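One way to measure how much is duplicated is to list the bundled jars that appear in more than one nar file. A minimal sketch, assuming the NiFi NAR layout in which bundled jars live under `META-INF/bundled-dependencies/`; the function names are hypothetical, and sizes are the uncompressed sizes recorded in each archive.

```python
import zipfile
from collections import defaultdict
from pathlib import Path

# Assumption: NAR archives keep their bundled jars under this prefix (NiFi NAR layout).
BUNDLED_PREFIX = "META-INF/bundled-dependencies/"


def bundled_jars(nar_path):
    """Map each bundled jar name in one .nar file to its uncompressed size in bytes."""
    with zipfile.ZipFile(nar_path) as nar:
        return {
            info.filename[len(BUNDLED_PREFIX):]: info.file_size
            for info in nar.infolist()
            if info.filename.startswith(BUNDLED_PREFIX) and info.filename.endswith(".jar")
        }


def duplicated_jars(nar_paths):
    """Return {jar name: (number of nar files containing it, bytes in the extra copies)}."""
    seen = defaultdict(list)
    for nar_path in nar_paths:
        for jar, size in bundled_jars(nar_path).items():
            seen[jar].append(size)
    return {
        jar: (len(sizes), sum(sizes) - sizes[0])
        for jar, sizes in seen.items()
        if len(sizes) > 1
    }


if __name__ == "__main__":
    # Hypothetical location of the built nar files; adjust to the actual build layout.
    nars = sorted(Path("pulsar-io").glob("*/target/*.nar"))
    for jar, (count, wasted) in sorted(duplicated_jars(nars).items(), key=lambda kv: -kv[1][1]):
        print(f"{wasted / (1024 * 1024):8.1f} MB wasted  {count:3d} copies  {jar}")
```

Jars at the top of this list would be the natural candidates for a shared parent nar or a shared library directory.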
   
   ---
   
   The output of the core-modules Maven profile is already much smaller than the total size of the Pulsar artifacts. 
   `mvn install -Pcore-modules -DskipTests` produces about 377MB in ~/.m2/repository/org/apache/pulsar:
   https://gist.github.com/lhotari/b6f51edc935787b530055a20bc685394
   This could be reduced by 269MB, from 377MB to 108MB, by removing the "distribution/server" module from the core-modules profile.
   
   Since the GitHub Actions cache is limited to 5GB in total per repository, smaller artifacts would make it much more practical to use the cache for sharing artifacts between phase 1 and the later unit test and integration test phases of the build.
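A quick back-of-the-envelope check of the numbers above against the cache limit. This sketch assumes 1 GB = 1024 MB and treats each artifact set as one cache entry; the names are illustrative only.

```python
CACHE_LIMIT_MB = 5 * 1024  # 5GB per repository, assuming 1 GB = 1024 MB

# Sizes quoted above for ~/.m2/repository/org/apache/pulsar, in MB.
ARTIFACT_SETS_MB = {
    "full mvn install": 2.5 * 1024,
    "core-modules": 377,
    "core-modules without distribution/server": 377 - 269,
}


def cache_usage(size_mb, limit_mb=CACHE_LIMIT_MB):
    """Return (percent of the cache used by one entry, number of such entries that fit)."""
    return 100 * size_mb / limit_mb, int(limit_mb // size_mb)


if __name__ == "__main__":
    for name, size_mb in ARTIFACT_SETS_MB.items():
        share, fits = cache_usage(size_mb)
        print(f"{name:42s} {size_mb:6.0f} MB  {share:5.1f}% of cache  {fits:3d} entries fit")
```

A full install uses half of the cache, so only two such entries fit at once, while the trimmed core-modules set (108MB) leaves room for dozens of entries, for example one per concurrently running build.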

