Posted to dev@lucene.apache.org by Tom Burton-West <tb...@umich.edu> on 2013/08/17 00:20:16 UTC

Luceneutil high variability between runs

Hello,

I'm trying to benchmark a change to BM25Similarity (LUCENE-5175) using
luceneutil.

I'm running this on a lightly loaded machine with a load average (top) of
about 0.01 when the benchmark is not running.

I made the following changes (see the sketch just after this list):
1) In localrun.py, changed Competition(debug=True) to Competition(debug=False)
2) Made the following changes to localconstants.py, per Robert Muir's
suggestion:
JAVA_COMMAND = 'java -server -Xms4g -Xmx4g'
SEARCH_NUM_THREADS = 1
3) For the BM25 tests, set SIMILARITY_DEFAULT='BM25Similarity'
4) For the BM25 tests, uncommented the following line in searchBench.py:
#verifyScores = False
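
For reference, here is roughly what the changed pieces look like after steps
1-3.  The values are the ones listed above; the comments, the "comp" variable
name, and any other contents of these files in a given luceneutil checkout are
just illustrative, so treat this as a sketch only:

    # localconstants.py -- local overrides (assuming the usual luceneutil
    # setup where localconstants.py values override constants.py)
    JAVA_COMMAND = 'java -server -Xms4g -Xmx4g'
    SEARCH_NUM_THREADS = 1
    # Only set for the BM25 runs; left at "DefaultSimilarity" otherwise
    SIMILARITY_DEFAULT = 'BM25Similarity'

    # localrun.py -- only the changed call is shown
    comp = Competition(debug=False)   # was Competition(debug=True)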

Attached is output from iter 19 of several runs.

The first 4 runs consistently show that the modified version is somewhere
between 6% and 8% slower on the tasks with the largest difference between
trunk and the patch.
However, if you look at the baseline TaskQPS for HighTerm, for example,
run 3 is about 55 and run 1 is about 88.  So for this task the difference
between different runs of the benchmark is much larger than the difference
between trunk and the modified/patched version within a run.
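
To put a rough number on that spread (back-of-the-envelope arithmetic on the
two QPS values quoted above, nothing luceneutil-specific):

    import statistics

    # Baseline HighTerm QPS quoted above: run 1 ~88, run 3 ~55
    highterm_base_qps = [88.0, 55.0]
    mean = statistics.mean(highterm_base_qps)       # ~71.5
    spread = statistics.pstdev(highterm_base_qps)   # ~16.5
    print(f"run-to-run spread: {spread / mean:.0%} of the mean")   # ~23%
    # ...versus the 6-8% trunk-vs-patch difference seen *within* a run.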

Is this to be expected?  Is there a reason I should believe the
differences shown within a run reflect the true differences?

Seeing this variability, I then switched SIMILARITY_DEFAULT back to
"DefaultSimilarity".  In this case trunk and my_modified should be
exercising exactly the same code, since the only changes in the patch are
the addition of a test case for BM25Similarity and a change to
BM25Similarity.

In this case the "modified" version varies from -6.2% to +4.4% difference
from the base for LowTerm.
Comparing QPS for the base case for HighTerm between different runs, we can
see it varies from about 21 for run 1 to 76 for run 3.

Is this kind of variation between runs of the benchmark to be expected?

Any suggestions about where to look to reduce the variations between runs?

Tom

Re: Luceneutil high variability between runs

Posted by Robert Muir <rc...@gmail.com>.
I think the raw values don't matter so much because there is some
randomization involved? And the same random seed is used...

Your DefaultSimilarity runs look pretty stable: it's between 0.0% and
1.5% variation, which is about as good as it gets for HighTerm....

LowTerm, I am guessing, is always noisy because those queries are so fast. A
few of these measures at least are; IntNRQ in particular, I know :)
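
A toy illustration of the seed point (plain Python, not how luceneutil is
actually structured): if both sides of a comparison shuffle with the same
seed, they see the same task order within a run, so the relative numbers stay
apples-to-apples even when the absolute QPS drifts from run to run.

    import random

    tasks = ['HighTerm', 'LowTerm', 'IntNRQ', 'Fuzzy1']

    def task_order(seed):
        # Same seed -> same shuffle, no matter when or where it runs.
        rng = random.Random(seed)
        order = list(tasks)
        rng.shuffle(order)
        return order

    assert task_order(42) == task_order(42)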

On Fri, Aug 16, 2013 at 6:20 PM, Tom Burton-West <tb...@umich.edu> wrote:
> Hello,
>
> I'm trying to benchmark a change to BM25Similarity (LUCENE-5175) using
> luceneutil.
>
> I'm running this on a lightly loaded machine with a load average (top) of
> about 0.01 when the benchmark is not running.
>
> I made the following changes:
> 1) In localrun.py, changed Competition(debug=True) to Competition(debug=False)
> 2) Made the following changes to localconstants.py, per Robert Muir's
> suggestion:
> JAVA_COMMAND = 'java -server -Xms4g -Xmx4g'
> SEARCH_NUM_THREADS = 1
> 3) For the BM25 tests, set SIMILARITY_DEFAULT='BM25Similarity'
> 4) For the BM25 tests, uncommented the following line in searchBench.py:
> #verifyScores = False
>
> Attached is output from iter 19 of several runs.
>
> The first 4 runs consistently show that the modified version is somewhere
> between 6% and 8% slower on the tasks with the largest difference between
> trunk and the patch.
> However, if you look at the baseline TaskQPS for HighTerm, for example,
> run 3 is about 55 and run 1 is about 88.  So for this task the difference
> between different runs of the benchmark is much larger than the difference
> between trunk and the modified/patched version within a run.
>
> Is this to be expected?  Is there a reason I should believe the
> differences shown within a run reflect the true differences?
>
> Seeing this variability, I then switched SIMILARITY_DEFAULT back to
> "DefaultSimilarity".  In this case trunk and my_modified should be
> exercising exactly the same code, since the only changes in the patch are
> the addition of a test case for BM25Similarity and a change to
> BM25Similarity.
>
> In this case the "modified" version varies from -6.2% to +4.4% difference
> from the base for LowTerm.
> Comparing QPS for the base case for HighTerm between different runs, we can
> see it varies from about 21 for run 1 to 76 for run 3.
>
> Is this kind of variation between runs of the benchmark to be expected?
>
> Any suggestions about where to look to reduce the variations between runs?
>
> Tom
>
