You are viewing a plain text version of this content. The canonical link for it is here.

Posted to solr-user@lucene.apache.org by Karl Stoney <Ka...@autotrader.co.uk.INVALID> on 2020/02/01 16:13:12 UTC

G1GC Pauses (Young Gen)

Hey all, me again.

I'm still investigating the pauses that I get when a soft commit happens.  I'm now convinced they're coming from G1GC pauses that happen when the soft commit happens and wondering if anyone can see what's up.  Caveat: I'm no JVM expert.

I've uploaded a small time window to gceasy and the latency spikes we see correlate to young gen completely emptying (which is I presume as a result of the new searcher loading?)

https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjAvMDIvMS8tLWdjLXNvbHItMy0yMDIwXzAyXzAxLTE1XzA2LmxvZy0tMTUtNTItNDA=&channel=WEB

Here's one of the longer pauses.  The only thing that stands out to me is that it's getting rid of a lot more Eden region than the other pauses

[1294.080s][info ][gc,start      ] GC(369) Pause Young (Normal) (G1 Evacuation Pause)
[1294.080s][info ][gc,task       ] GC(369) Using 8 workers of 8 for evacuation
[1294.080s][debug][gc,age        ] GC(369) Desired survivor size 785173704 bytes, new threshold 1 (max threshold 1)
[1294.194s][trace][gc,age        ] GC(369) Age table with threshold 1 (max threshold 1)
[1294.194s][trace][gc,age        ] GC(369) - age   1:  128787840 bytes,  128787840 total
[1294.194s][info ][gc,mmu        ] GC(369) MMU target violated: 51.0ms (50.0ms/51.0ms)
[1294.194s][info ][gc,phases     ] GC(369)   Pre Evacuate Collection Set: 0.2ms
[1294.194s][info ][gc,phases     ] GC(369)   Evacuate Collection Set: 110.7ms
[1294.194s][info ][gc,phases     ] GC(369)   Post Evacuate Collection Set: 3.4ms
[1294.194s][info ][gc,phases     ] GC(369)   Other: 0.5ms
[1294.194s][info ][gc,heap       ] GC(369) Eden regions: 397->0(217)
[1294.194s][info ][gc,heap       ] GC(369) Survivor regions: 19->8(52)
[1294.194s][info ][gc,heap       ] GC(369) Old regions: 1340->1359
[1294.194s][info ][gc,heap       ] GC(369) Humongous regions: 2->2
[1294.194s][info ][gc,metaspace  ] GC(369) Metaspace: 56772K->56772K(1101824K)
[1294.194s][info ][gc            ] GC(369) Pause Young (Normal) (G1 Evacuation Pause) 28120M->21883M(31744M) 114.912ms
[1294.194s][info ][gc,cpu        ] GC(369) User=0.67s Sys=0.23s Real=0.11s
[1294.195s][info ][gc,stringdedup] Concurrent String Deduplication (1294.195s)
[1294.196s][info ][gc,stringdedup] Concurrent String Deduplication 106.2K->15696.0B(93032.0B) avg 84.1% (1294.195s, 1294.196s) 1.397ms

We're using the following settings:

The machine has 64gb RAM, 31gb heap.

The core we're constantly querying and updating is 40GB.  It's got a soft auto commit every 10 minutes.

  export GC_TUNE="-XshowSettings:vm \
    -XX:+UnlockExperimentalVMOptions \
    -XX:+ExitOnOutOfMemoryError \
    -XX:+UseG1GC \
    -XX:+PerfDisableSharedMem \
    -XX:+ParallelRefProcEnabled \
    -XX:G1MaxNewSizePercent=30 \
    -XX:G1NewSizePercent=6 \
    -XX:G1HeapRegionSize=16M \
    -XX:G1HeapWastePercent=10 \
    -XX:G1MixedGCCountTarget=16 \
    -XX:InitiatingHeapOccupancyPercent=70 \
    -XX:MaxGCPauseMillis=50 \
    -XX:-ResizePLAB \
    -XX:MaxTenuringThreshold=1 \
    -XX:ParallelGCThreads=8 \
    -XX:ConcGCThreads=2 \
    -XX:TargetSurvivorRatio=90 \
    -XX:+UseStringDeduplication"

Any advice would be greatly appreciated!
This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 1 Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 9439967). This email and any files transmitted with it are confidential and may be legally privileged, and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender. This email message has been swept for the presence of computer viruses.

Re: G1GC Pauses (Young Gen)

Posted by Walter Underwood <wu...@wunderwood.org>.

Indexing shouldn’t require a massive heap. The memory used is proportional to the size of the update, not the size of the index.

Our shards are 30 B and we run with an 8 GB heap. Never had a problem.

You only need a lot of heap if you are running faceting (or maybe sorting) on a big index.

Does your system have 70+ GB of RAM? If not, a smaller heap means you can keep more of the index in file buffers. That will make things faster.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 2, 2020, at 1:01 AM, Karl Stoney <Ka...@autotrader.co.uk.INVALID> wrote:
> 
> So interesting fact, setting XX:MaxGCPauseMillis causes g1gc to dynamically adjust the size of your young space, and setting it too low makes it nose dive as tiny as possible during the memory allocations that happen during soft commits.
> 
> Setting XX:MaxGCPauseMillis much high has actually improved my gc experience.
> 
> I'm still getting pretty large pauses during the soft commits though seemingly because of the amount of memory being allocated.  I'm not sure there's much I can do about it other than more memory, however 31gb heap seems pretty reasonable given the size of my index?
> ________________________________
> From: Karl Stoney <Ka...@autotrader.co.uk.INVALID>
> Sent: 01 February 2020 16:13
> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
> Subject: G1GC Pauses (Young Gen)
> 
> Hey all, me again.
> 
> I'm still investigating the pauses that I get when a soft commit happens.  I'm now convinced they're coming from G1GC pauses that happen when the soft commit happens and wondering if anyone can see what's up.  Caveat: I'm no JVM expert.
> 
> I've uploaded a small time window to gceasy and the latency spikes we see correlate to young gen completely emptying (which is I presume as a result of the new searcher loading?)
> 
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgceasy.io%2Fmy-gc-report.jsp%3Fp%3Dc2hhcmVkLzIwMjAvMDIvMS8tLWdjLXNvbHItMy0yMDIwXzAyXzAxLTE1XzA2LmxvZy0tMTUtNTItNDA%3D%26channel%3DWEB&amp;data=02%7C01%7Ckarl.stoney%40autotrader.co.uk%7C50583bc3d3b8447709da08d7a731ad66%7C926f3743f3d24b8a816818cfcbe776fe%7C0%7C0%7C637161704107629087&amp;sdata=w9QDb0v1qfezc%2Bj9M4a0ZG0Yw6pDRcnzp0qpHCJzMGk%3D&amp;reserved=0
> 
> Here's one of the longer pauses.  The only thing that stands out to me is that it's getting rid of a lot more Eden region than the other pauses
> 
> [1294.080s][info ][gc,start      ] GC(369) Pause Young (Normal) (G1 Evacuation Pause)
> [1294.080s][info ][gc,task       ] GC(369) Using 8 workers of 8 for evacuation
> [1294.080s][debug][gc,age        ] GC(369) Desired survivor size 785173704 bytes, new threshold 1 (max threshold 1)
> [1294.194s][trace][gc,age        ] GC(369) Age table with threshold 1 (max threshold 1)
> [1294.194s][trace][gc,age        ] GC(369) - age   1:  128787840 bytes,  128787840 total
> [1294.194s][info ][gc,mmu        ] GC(369) MMU target violated: 51.0ms (50.0ms/51.0ms)
> [1294.194s][info ][gc,phases     ] GC(369)   Pre Evacuate Collection Set: 0.2ms
> [1294.194s][info ][gc,phases     ] GC(369)   Evacuate Collection Set: 110.7ms
> [1294.194s][info ][gc,phases     ] GC(369)   Post Evacuate Collection Set: 3.4ms
> [1294.194s][info ][gc,phases     ] GC(369)   Other: 0.5ms
> [1294.194s][info ][gc,heap       ] GC(369) Eden regions: 397->0(217)
> [1294.194s][info ][gc,heap       ] GC(369) Survivor regions: 19->8(52)
> [1294.194s][info ][gc,heap       ] GC(369) Old regions: 1340->1359
> [1294.194s][info ][gc,heap       ] GC(369) Humongous regions: 2->2
> [1294.194s][info ][gc,metaspace  ] GC(369) Metaspace: 56772K->56772K(1101824K)
> [1294.194s][info ][gc            ] GC(369) Pause Young (Normal) (G1 Evacuation Pause) 28120M->21883M(31744M) 114.912ms
> [1294.194s][info ][gc,cpu        ] GC(369) User=0.67s Sys=0.23s Real=0.11s
> [1294.195s][info ][gc,stringdedup] Concurrent String Deduplication (1294.195s)
> [1294.196s][info ][gc,stringdedup] Concurrent String Deduplication 106.2K->15696.0B(93032.0B) avg 84.1% (1294.195s, 1294.196s) 1.397ms
> 
> We're using the following settings:
> 
> The machine has 64gb RAM, 31gb heap.
> 
> The core we're constantly querying and updating is 40GB.  It's got a soft auto commit every 10 minutes.
> 
>  export GC_TUNE="-XshowSettings:vm \
>    -XX:+UnlockExperimentalVMOptions \
>    -XX:+ExitOnOutOfMemoryError \
>    -XX:+UseG1GC \
>    -XX:+PerfDisableSharedMem \
>    -XX:+ParallelRefProcEnabled \
>    -XX:G1MaxNewSizePercent=30 \
>    -XX:G1NewSizePercent=6 \
>    -XX:G1HeapRegionSize=16M \
>    -XX:G1HeapWastePercent=10 \
>    -XX:G1MixedGCCountTarget=16 \
>    -XX:InitiatingHeapOccupancyPercent=70 \
>    -XX:MaxGCPauseMillis=50 \
>    -XX:-ResizePLAB \
>    -XX:MaxTenuringThreshold=1 \
>    -XX:ParallelGCThreads=8 \
>    -XX:ConcGCThreads=2 \
>    -XX:TargetSurvivorRatio=90 \
>    -XX:+UseStringDeduplication"
> 
> Any advice would be greatly appreciated!
> This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 1 Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 9439967). This email and any files transmitted with it are confidential and may be legally privileged, and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender. This email message has been swept for the presence of computer viruses.
> This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 1 Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 9439967). This email and any files transmitted with it are confidential and may be legally privileged, and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender. This email message has been swept for the presence of computer viruses.

Re: G1GC Pauses (Young Gen)

Posted by Karl Stoney <Ka...@autotrader.co.uk.INVALID>.

So interesting fact, setting XX:MaxGCPauseMillis causes g1gc to dynamically adjust the size of your young space, and setting it too low makes it nose dive as tiny as possible during the memory allocations that happen during soft commits.

Setting XX:MaxGCPauseMillis much high has actually improved my gc experience.

I'm still getting pretty large pauses during the soft commits though seemingly because of the amount of memory being allocated.  I'm not sure there's much I can do about it other than more memory, however 31gb heap seems pretty reasonable given the size of my index?
________________________________
From: Karl Stoney <Ka...@autotrader.co.uk.INVALID>
Sent: 01 February 2020 16:13
To: solr-user@lucene.apache.org <so...@lucene.apache.org>
Subject: G1GC Pauses (Young Gen)

Hey all, me again.

I'm still investigating the pauses that I get when a soft commit happens.  I'm now convinced they're coming from G1GC pauses that happen when the soft commit happens and wondering if anyone can see what's up.  Caveat: I'm no JVM expert.

I've uploaded a small time window to gceasy and the latency spikes we see correlate to young gen completely emptying (which is I presume as a result of the new searcher loading?)

https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgceasy.io%2Fmy-gc-report.jsp%3Fp%3Dc2hhcmVkLzIwMjAvMDIvMS8tLWdjLXNvbHItMy0yMDIwXzAyXzAxLTE1XzA2LmxvZy0tMTUtNTItNDA%3D%26channel%3DWEB&amp;data=02%7C01%7Ckarl.stoney%40autotrader.co.uk%7C50583bc3d3b8447709da08d7a731ad66%7C926f3743f3d24b8a816818cfcbe776fe%7C0%7C0%7C637161704107629087&amp;sdata=w9QDb0v1qfezc%2Bj9M4a0ZG0Yw6pDRcnzp0qpHCJzMGk%3D&amp;reserved=0

Here's one of the longer pauses.  The only thing that stands out to me is that it's getting rid of a lot more Eden region than the other pauses

[1294.080s][info ][gc,start      ] GC(369) Pause Young (Normal) (G1 Evacuation Pause)
[1294.080s][info ][gc,task       ] GC(369) Using 8 workers of 8 for evacuation
[1294.080s][debug][gc,age        ] GC(369) Desired survivor size 785173704 bytes, new threshold 1 (max threshold 1)
[1294.194s][trace][gc,age        ] GC(369) Age table with threshold 1 (max threshold 1)
[1294.194s][trace][gc,age        ] GC(369) - age   1:  128787840 bytes,  128787840 total
[1294.194s][info ][gc,mmu        ] GC(369) MMU target violated: 51.0ms (50.0ms/51.0ms)
[1294.194s][info ][gc,phases     ] GC(369)   Pre Evacuate Collection Set: 0.2ms
[1294.194s][info ][gc,phases     ] GC(369)   Evacuate Collection Set: 110.7ms
[1294.194s][info ][gc,phases     ] GC(369)   Post Evacuate Collection Set: 3.4ms
[1294.194s][info ][gc,phases     ] GC(369)   Other: 0.5ms
[1294.194s][info ][gc,heap       ] GC(369) Eden regions: 397->0(217)
[1294.194s][info ][gc,heap       ] GC(369) Survivor regions: 19->8(52)
[1294.194s][info ][gc,heap       ] GC(369) Old regions: 1340->1359
[1294.194s][info ][gc,heap       ] GC(369) Humongous regions: 2->2
[1294.194s][info ][gc,metaspace  ] GC(369) Metaspace: 56772K->56772K(1101824K)
[1294.194s][info ][gc            ] GC(369) Pause Young (Normal) (G1 Evacuation Pause) 28120M->21883M(31744M) 114.912ms
[1294.194s][info ][gc,cpu        ] GC(369) User=0.67s Sys=0.23s Real=0.11s
[1294.195s][info ][gc,stringdedup] Concurrent String Deduplication (1294.195s)
[1294.196s][info ][gc,stringdedup] Concurrent String Deduplication 106.2K->15696.0B(93032.0B) avg 84.1% (1294.195s, 1294.196s) 1.397ms

We're using the following settings:

The machine has 64gb RAM, 31gb heap.

The core we're constantly querying and updating is 40GB.  It's got a soft auto commit every 10 minutes.

  export GC_TUNE="-XshowSettings:vm \
    -XX:+UnlockExperimentalVMOptions \
    -XX:+ExitOnOutOfMemoryError \
    -XX:+UseG1GC \
    -XX:+PerfDisableSharedMem \
    -XX:+ParallelRefProcEnabled \
    -XX:G1MaxNewSizePercent=30 \
    -XX:G1NewSizePercent=6 \
    -XX:G1HeapRegionSize=16M \
    -XX:G1HeapWastePercent=10 \
    -XX:G1MixedGCCountTarget=16 \
    -XX:InitiatingHeapOccupancyPercent=70 \
    -XX:MaxGCPauseMillis=50 \
    -XX:-ResizePLAB \
    -XX:MaxTenuringThreshold=1 \
    -XX:ParallelGCThreads=8 \
    -XX:ConcGCThreads=2 \
    -XX:TargetSurvivorRatio=90 \
    -XX:+UseStringDeduplication"

Any advice would be greatly appreciated!
This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 1 Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 9439967). This email and any files transmitted with it are confidential and may be legally privileged, and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender. This email message has been swept for the presence of computer viruses.
This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 1 Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 9439967). This email and any files transmitted with it are confidential and may be legally privileged, and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender. This email message has been swept for the presence of computer viruses.