You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucene.apache.org by Dawid Weiss <da...@gmail.com> on 2012/04/20 00:45:58 UTC

Long jenkins builds with clover/ code cache warning.

So, for the past three or so hours I've been looking at this:

https://builds.apache.org/job/Lucene-Trunk/1899/consoleText

You'll notice that certain tests in there have super-long execution
times, like this one:

   [junit4] Suite: org.apache.lucene.search.TestNumericRangeQuery64
   [junit4] Completed in 4251.28s, 34 tests

This test, if run separately, passes in under 60 seconds on my machine
(running with clover, the same seed so identical startup parameters
etc.).

I'll spare you the details but what seems to be happening is that at
some point the code cache is saturated and the JVM emits this:

   [junit4] OpenJDK 64-Bit Server VM warning: CodeCache is full.
Compiler has been disabled.
   [junit4] OpenJDK 64-Bit Server VM warning: Try increasing the code
cache size using -XX:ReservedCodeCacheSize=

Note the "Compiler has been disabled." bit. I believe this effectively
means the JVM falls back to interpreted mode (!) and this creates a
situation in which:

1) there is a shitty random pick of components for the test (see
attached log dump),
2) the code is instrumented with clover, which adds an extra layer of
super-overhead method calls and complexity,
3) the JVM falls back to interpreted mode.

TestNumericRangeQuery64 is the first test when this hits, but every
test from that moment on is just sloooow. Like this one:

  [junit4] Suite: org.apache.lucene.index.TestDocTermOrds
  [junit4] Completed in 91.39s, 3 tests

ran separately (same seed, clover):

   [junit4] Suite: org.apache.lucene.index.TestDocTermOrds
   [junit4] Completed in 2.30s, 3 tests

I don't think there are simple solutions to this problem other than
setting a larger code cache (which I did, we will see).

Dawid

Re: Long jenkins builds with clover/ code cache warning.

Posted by Dawid Weiss <da...@gmail.com>.

Just wanted to confirm my above suspicion about the code cache --
those long build times are fully repeatable on my machine (including
the time when code cache explodes). Timing from my machine for that
long test (when run in complete mode):

Suite: org.apache.lucene.search.TestNumericRangeQuery64
Completed in 3305.33s, 34 tests

The most recent build passed (with increased code cache threshold).

Dawid

On Fri, Apr 20, 2012 at 1:05 AM, Dawid Weiss <da...@gmail.com> wrote:
> Oh, I just digested a spoonful of hotspot source code. It seems there
> is a magic option to attempt to clean the code cache before shutting
> down the compiler...
>
>  product(bool, UseCodeCacheFlushing, false,                                \
>          "Attempt to clean the code cache before shutting off compiler")   \
>
> Unfortunately it's false by default...
>
> Dawid
>
> On Fri, Apr 20, 2012 at 12:45 AM, Dawid Weiss <da...@gmail.com> wrote:
>> So, for the past three or so hours I've been looking at this:
>>
>> https://builds.apache.org/job/Lucene-Trunk/1899/consoleText
>>
>> You'll notice that certain tests in there have super-long execution
>> times, like this one:
>>
>>   [junit4] Suite: org.apache.lucene.search.TestNumericRangeQuery64
>>   [junit4] Completed in 4251.28s, 34 tests
>>
>> This test, if run separately, passes in under 60 seconds on my machine
>> (running with clover, the same seed so identical startup parameters
>> etc.).
>>
>> I'll spare you the details but what seems to be happening is that at
>> some point the code cache is saturated and the JVM emits this:
>>
>>   [junit4] OpenJDK 64-Bit Server VM warning: CodeCache is full.
>> Compiler has been disabled.
>>   [junit4] OpenJDK 64-Bit Server VM warning: Try increasing the code
>> cache size using -XX:ReservedCodeCacheSize=
>>
>> Note the "Compiler has been disabled." bit. I believe this effectively
>> means the JVM falls back to interpreted mode (!) and this creates a
>> situation in which:
>>
>> 1) there is a shitty random pick of components for the test (see
>> attached log dump),
>> 2) the code is instrumented with clover, which adds an extra layer of
>> super-overhead method calls and complexity,
>> 3) the JVM falls back to interpreted mode.
>>
>> TestNumericRangeQuery64 is the first test when this hits, but every
>> test from that moment on is just sloooow. Like this one:
>>
>>  [junit4] Suite: org.apache.lucene.index.TestDocTermOrds
>>  [junit4] Completed in 91.39s, 3 tests
>>
>> ran separately (same seed, clover):
>>
>>   [junit4] Suite: org.apache.lucene.index.TestDocTermOrds
>>   [junit4] Completed in 2.30s, 3 tests
>>
>> I don't think there are simple solutions to this problem other than
>> setting a larger code cache (which I did, we will see).
>>
>> Dawid

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Long jenkins builds with clover/ code cache warning.

Posted by Dawid Weiss <da...@gmail.com>.

Oh, I just digested a spoonful of hotspot source code. It seems there
is a magic option to attempt to clean the code cache before shutting
down the compiler...

  product(bool, UseCodeCacheFlushing, false,                                \
          "Attempt to clean the code cache before shutting off compiler")   \

Unfortunately it's false by default...

Dawid

On Fri, Apr 20, 2012 at 12:45 AM, Dawid Weiss <da...@gmail.com> wrote:
> So, for the past three or so hours I've been looking at this:
>
> https://builds.apache.org/job/Lucene-Trunk/1899/consoleText
>
> You'll notice that certain tests in there have super-long execution
> times, like this one:
>
>   [junit4] Suite: org.apache.lucene.search.TestNumericRangeQuery64
>   [junit4] Completed in 4251.28s, 34 tests
>
> This test, if run separately, passes in under 60 seconds on my machine
> (running with clover, the same seed so identical startup parameters
> etc.).
>
> I'll spare you the details but what seems to be happening is that at
> some point the code cache is saturated and the JVM emits this:
>
>   [junit4] OpenJDK 64-Bit Server VM warning: CodeCache is full.
> Compiler has been disabled.
>   [junit4] OpenJDK 64-Bit Server VM warning: Try increasing the code
> cache size using -XX:ReservedCodeCacheSize=
>
> Note the "Compiler has been disabled." bit. I believe this effectively
> means the JVM falls back to interpreted mode (!) and this creates a
> situation in which:
>
> 1) there is a shitty random pick of components for the test (see
> attached log dump),
> 2) the code is instrumented with clover, which adds an extra layer of
> super-overhead method calls and complexity,
> 3) the JVM falls back to interpreted mode.
>
> TestNumericRangeQuery64 is the first test when this hits, but every
> test from that moment on is just sloooow. Like this one:
>
>  [junit4] Suite: org.apache.lucene.index.TestDocTermOrds
>  [junit4] Completed in 91.39s, 3 tests
>
> ran separately (same seed, clover):
>
>   [junit4] Suite: org.apache.lucene.index.TestDocTermOrds
>   [junit4] Completed in 2.30s, 3 tests
>
> I don't think there are simple solutions to this problem other than
> setting a larger code cache (which I did, we will see).
>
> Dawid

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org