Posted to dev@tinkerpop.apache.org by Stephen Mallette <sp...@gmail.com> on 2016/10/28 18:43:02 UTC

[DISCUSS] Speeding up the build - spark-gremlin

I've done a few runs of our coverage reports this morning with different
configurations trying to analyze the effect spark-gremlin has on our test
coverage. Why spark-gremlin? Because it takes about 50% of our build time
right now and, as such, seemed like a good place to start thinking about
how to make improvements to test performance. I did not run "integration
tests" in any of my analysis, as I'm mostly interested in figuring out how
to make "unit tests" faster without losing basic coverage there.

Some things I noted:

+ spark-gremlin unit tests offer good coverage of spark-gremlin itself at
about 72% line coverage and 76% branch coverage
+ the tests on spark-gremlin only add about 3% more branch coverage and 2%
more line coverage to gremlin-core. I'm not sure what I expected to find,
but that seems pretty small to me.
+ Removing the groovy or java "process suite" makes almost no difference in
coverage - less than 1%
+ The longest-running tests are the spark "io" tests,
SparkGraphComputerProcessTest and SparkGraphComputerGroovyProcessTest.

Marko, I think you are best placed to speak to the implications of this
suggestion, but if we turn these longer-running tests in spark-gremlin into
integration tests, we cut the build time by roughly 40%. If we do that, we
sacrifice the minimal coverage we get in gremlin-core, and our overall
coverage of spark-gremlin drops by only 3% on lines of code and about 7%
on branch coverage, so there is not a lot of loss there.

If a change goes into gremlin-core, then it would be smart to run: "mvn
clean install && mvn verify -pl spark-gremlin
-DskipIntegrationTests=false". We would look to catch any missed runs in
pull requests during code review.
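
As a rough sketch of that workflow, the two commands could be wrapped in a
small shell helper. The function names and the DRY_RUN variable here are
hypothetical and not part of the build; the module name is passed as an
argument and should match the module's directory name in the repository.
By default the helper only prints the commands, so it is safe to try.

```shell
# Sketch only: run_cmd, verify_after_core_change and DRY_RUN are
# hypothetical names for illustration, not part of the project build.

# Print a command; execute it only when DRY_RUN=0.
run_cmd() {
  printf '%s\n' "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Full default build first, then the integration tests of one module.
# $1 is the module directory name as it appears in the repository.
verify_after_core_change() {
  vacc_module="$1"
  run_cmd mvn clean install &&
    run_cmd mvn verify -pl "$vacc_module" -DskipIntegrationTests=false
}
```

Calling "verify_after_core_change spark-gremlin" prints the two commands;
prefixing the call with DRY_RUN=0 actually runs them, stopping if the first
build fails.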

If we go with this approach, I could do some further analysis on
spark-gremlin and determine whether some new unit tests (which I presume
will require mocking) could win back those few percentage points of lost
coverage (and maybe more).

Anyway, I'll assume lazy consensus on this in 72 hours (Monday, October 31,
2016, 2:30pm) and move forward if there are no objections.

Re: [DISCUSS] Speeding up the build - spark-gremlin

Posted by Stephen Mallette <sp...@gmail.com>.
Just wanted to point out that this change has been implemented and simple
"mvn clean install" runs are much faster for it. I'll keep looking for
additional changes we can make, but the build time is semi-reasonable now
on my system. It would be nice to be solidly in that 10-15 minute spot
without appreciable loss of coverage, so I'll see what else might be done
to get us there.
