Posted to solr-user@lucene.apache.org by Joe Obernberger <jo...@gmail.com> on 2017/08/18 16:37:26 UTC

Solr 6.6.0 - High CPU during indexing

Indexing about 15 million documents per day across 100 shards on 45 
servers.  Up until about 350 million documents, each of the solr 
instances was taking up about 1 core (100% CPU).  Recently, they all 
jumped to 700%.  Is this normal?  Anything that I can check for?

I don't see anything unusual in the solr logs.  Sample from the GC logs:

---------------

2017-08-18 11:53:15 GC log file created /opt/solr6/server/logs/solr_gc.log.2
OpenJDK 64-Bit Server VM (25.141-b16) for linux-amd64 JRE 
(1.8.0_141-b16), built on Jul 20 2017 11:14:57 by "mockbuild" with gcc 
4.4.7 20120313 (Red Hat 4.4.7-18)
Memory: 4k page, physical 99016188k(796940k free), swap 
33554428k(32614048k free)
CommandLine flags: -XX:+AggressiveOpts -XX:CICompilerCount=12 
-XX:ConcGCThreads=4 -XX:G1HeapRegionSize=16777216 
-XX:GCLogFileSize=20971520 -XX:InitialHeapSize=17179869184 
-XX:InitiatingHeapOccupancyPercent=75 -XX:MarkStackSize=4194304 
-XX:MaxDirectMemorySize=3221225472 -XX:MaxGCPauseMillis=300 
-XX:MaxHeapSize=30064771072 -XX:MaxNewSize=18035507200 
-XX:MinHeapDeltaBytes=16777216 -XX:NumberOfGCLogFiles=9 
-XX:OnOutOfMemoryError=/opt/solr6/bin/oom_solr.sh 9100 
/opt/solr6/server/logs -XX:ParallelGCThreads=16 
-XX:+ParallelRefProcEnabled -XX:+PerfDisableSharedMem -XX:+PrintGC 
-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps 
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC 
-XX:+PrintTenuringDistribution -XX:-ResizePLAB -XX:ThreadStackSize=256 
-XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC 
-XX:+UseGCLogFileRotation -XX:+UseLargePages
{Heap before GC invocations=559440 (full 0):
  garbage-first heap   total 29360128K, used 24944705K 
[0x00000000c0000000, 0x00000000c1003800, 0x00000007c0000000)
   region size 16384K, 1075 young (17612800K), 13 survivors (212992K)
  Metaspace       used 95460K, capacity 97248K, committed 97744K, 
reserved 1134592K
   class space    used 11616K, capacity 12104K, committed 12240K, 
reserved 1048576K
2017-08-18T11:53:15.985-0400: 522594.835: [GC pause (G1 Evacuation 
Pause) (young)
Desired survivor size 1132462080 bytes, new threshold 15 (max 15)
- age   1:   23419920 bytes,   23419920 total
- age   2:    9355296 bytes,   32775216 total
- age   3:    2455384 bytes,   35230600 total
- age   4:   38246704 bytes,   73477304 total
- age   5:   47064408 bytes,  120541712 total
- age   6:   13228864 bytes,  133770576 total
- age   7:   23990800 bytes,  157761376 total
- age   8:    1031416 bytes,  158792792 total
- age   9:   17011128 bytes,  175803920 total
- age  10:    7371888 bytes,  183175808 total
- age  11:    6226576 bytes,  189402384 total
- age  12:     637184 bytes,  190039568 total
- age  13:   11577864 bytes,  201617432 total
- age  14:    9519224 bytes,  211136656 total
- age  15:     672304 bytes,  211808960 total
, 0.0391210 secs]
    [Parallel Time: 32.1 ms, GC Workers: 16]
       [GC Worker Start (ms): Min: 522594835.0, Avg: 522594835.1, Max: 
522594835.2, Diff: 0.2]
       [Ext Root Scanning (ms): Min: 0.5, Avg: 0.8, Max: 2.2, Diff: 1.7, 
Sum: 12.2]
       [Update RS (ms): Min: 0.9, Avg: 2.3, Max: 3.2, Diff: 2.2, Sum: 36.6]
          [Processed Buffers: Min: 3, Avg: 4.7, Max: 8, Diff: 5, Sum: 75]
       [Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.4, Diff: 0.3, Sum: 3.0]
       [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 
0.0, Sum: 0.1]
       [Object Copy (ms): Min: 27.7, Avg: 28.3, Max: 28.6, Diff: 0.8, 
Sum: 453.5]
       [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
          [Termination Attempts: Min: 1, Avg: 1.3, Max: 2, Diff: 1, Sum: 21]
       [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, 
Sum: 2.4]
       [GC Worker Total (ms): Min: 31.6, Avg: 31.7, Max: 32.0, Diff: 
0.4, Sum: 507.9]
       [GC Worker End (ms): Min: 522594866.7, Avg: 522594866.8, Max: 
522594867.0, Diff: 0.2]
    [Code Root Fixup: 0.1 ms]
    [Code Root Purge: 0.0 ms]
    [Clear CT: 1.7 ms]
    [Other: 5.2 ms]
       [Choose CSet: 0.0 ms]
       [Ref Proc: 2.9 ms]
       [Ref Enq: 0.1 ms]
       [Redirty Cards: 0.2 ms]
       [Humongous Register: 0.1 ms]
       [Humongous Reclaim: 0.0 ms]
       [Free CSet: 1.6 ms]
    [Eden: 16.6G(16.6G)->0.0B(16.6G) Survivors: 208.0M->208.0M Heap: 
23.8G(28.0G)->7371.4M(28.0G)]
Heap after GC invocations=559441 (full 0):
  garbage-first heap   total 29360128K, used 7548353K 
[0x00000000c0000000, 0x00000000c1003800, 0x00000007c0000000)
   region size 16384K, 13 young (212992K), 13 survivors (212992K)
  Metaspace       used 95460K, capacity 97248K, committed 97744K, 
reserved 1134592K
   class space    used 11616K, capacity 12104K, committed 12240K, 
reserved 1048576K
}
  [Times: user=0.54 sys=0.00, real=0.04 secs]
2017-08-18T11:53:16.024-0400: 522594.874: Total time for which 
application threads were stopped: 0.0471187 seconds, Stopping threads 
took: 0.0001739 seconds
2017-08-18T11:53:16.811-0400: 522595.661: Total time for which 
application threads were stopped: 0.0019163 seconds, Stopping threads 
took: 0.0001631 seconds
2017-08-18T11:53:17.210-0400: 522596.060: Total time for which 
application threads were stopped: 0.0010934 seconds, Stopping threads 
took: 0.0001659 seconds

----------------

-Joe


Re: Solr 6.6.0 - High CPU during indexing

Posted by Joe Obernberger <jo...@gmail.com>.
I was able to attach to one server by changing the startup and adding:

-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.local.only=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.port=18983 \
-Dcom.sun.management.jmxremote.rmi.port=18983
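
With those flags in place, the hot threads can also be pulled over JMX 
without a GUI.  A rough sketch (the host name "vesta" and the class name 
are just placeholders; the port matches the jmxremote.port setting above) 
that prints cumulative CPU time per thread in the remote Solr JVM:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RemoteThreadCpu {
    public static void main(String[] args) throws Exception {
        // Placeholder host - point this at whichever Solr node is hot.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://vesta:18983/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
            // Cumulative CPU time per live thread; requires thread CPU time
            // measurement to be supported and enabled in the target JVM.
            for (long id : threads.getAllThreadIds()) {
                ThreadInfo info = threads.getThreadInfo(id);
                long cpuNanos = threads.getThreadCpuTime(id);
                if (info != null && cpuNanos > 0) {
                    System.out.printf("%-60s %10d ms%n",
                            info.getThreadName(), cpuNanos / 1_000_000);
                }
            }
        } finally {
            connector.close();
        }
    }
}

Running it twice a few minutes apart and comparing the numbers should make 
it obvious which threads are worth sampling in jvisualvm.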

I will let you know the results.

On 8/18/2017 12:43 PM, Michael Braun wrote:
> Have you attached JVisualVM or a similar application to the process to
> sample where the time is being spent? It can be very helpful for debugging
> this sort of problem.
>
> On Fri, Aug 18, 2017 at 12:37 PM, Joe Obernberger <
> joseph.obernberger@gmail.com> wrote:
>
>> Indexing about 15 million documents per day across 100 shards on 45
>> servers.  Up until about 350 million documents, each of the solr instances
>> was taking up about 1 core (100% CPU).  Recently, they all jumped to 700%.
>> Is this normal?  Anything that I can check for?


Re: Solr 6.6.0 - High CPU during indexing

Posted by Joe Obernberger <jo...@gmail.com>.
Thank you Michael.  Oddly when I start jstatd on one of the servers, I 
see all the JVM processes in jvisualvm except the solr one!  Any idea why?


On 8/18/2017 12:43 PM, Michael Braun wrote:
> Have you attached JVisualVM or a similar application to the process to
> sample where the time is being spent? It can be very helpful for debugging
> this sort of problem.
>
> On Fri, Aug 18, 2017 at 12:37 PM, Joe Obernberger <
> joseph.obernberger@gmail.com> wrote:
>
>> Indexing about 15 million documents per day across 100 shards on 45
>> servers.  Up until about 350 million documents, each of the solr instances
>> was taking up about 1 core (100% CPU).  Recently, they all jumped to 700%.
>> Is this normal?  Anything that I can check for?


Re: Solr 6.6.0 - High CPU during indexing

Posted by Michael Braun <n3...@gmail.com>.
Have you attached JVisualVM or a similar application to the process to
sample where the time is being spent? It can be very helpful for debugging
this sort of problem.

On Fri, Aug 18, 2017 at 12:37 PM, Joe Obernberger <
joseph.obernberger@gmail.com> wrote:

> Indexing about 15 million documents per day across 100 shards on 45
> servers.  Up until about 350 million documents, each of the solr instances
> was taking up about 1 core (100% CPU).  Recently, they all jumped to 700%.
> Is this normal?  Anything that I can check for?

Re: Solr 6.6.0 - High CPU during indexing

Posted by Joe Obernberger <jo...@gmail.com>.
Ah!  Yes - that makes much more sense:

CPU: http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_CPU.jpg
Mem: http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_Mem.jpg

-Joe


On 8/18/2017 3:35 PM, Michael Braun wrote:
> When I recommended JVisualVM, specifically the "Sampling" portion of 
> the app - just using it to see the monitoring isn't half as useful.
>
>     On Fri, Aug 18, 2017 at 3:31 PM, Joe Obernberger
>     <joseph.obernberger@gmail.com> wrote:
>
>     Hi Walter - I see what you are saying, but the machine is not
>     actively swapping (that would be the concern - right?) It's the
>     CPU usage that I'm trying to figure out.  Htop reports that there
>     is about 20G of disk cache in use, and about 76G of RAM in use by
>     programs.  VIRT memory is what was requested to be allocated, not
>     what was actually allocated; that is RES.  Unless my understanding
>     of top is wrong.
>
>     http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_htop.jpg
>
>     atop:
>     http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_atop.jpg
>
>     -Joe
>
>
>     On 8/18/2017 3:12 PM, Walter Underwood wrote:
>
>         I see a server with 100Gb of memory and processes (java and
>         jsvc) using 203Gb of virtual memory. Hmm.
>
>         wunder
>         Walter Underwood
>         wunder@wunderwood.org
>         http://observer.wunderwood.org/ (my blog)
>
>
>             On Aug 18, 2017, at 12:05 PM, Joe Obernberger
>             <joseph.obernberger@gmail.com> wrote:
>
>             Thank you Shawn.  Please see:
>             http://www.lovehorsepower.com/Vesta
>             for screen shots of top
>             (http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_top.jpg)
>             and several screen shots over various times of jvisualvm.
>
>             There is also the GC log and the regular solr.log for one
>             server (named Vesta).  Please note that we are using HDFS
>             for storage.  I love top, but also use htop and atop as
>             they show additional information.  In general we are RAM
>             limited and therefore do not have much cache for OS/disk
>             as we would like, but this issue is CPU related.  After
>             restarting the one node, the CPU usage stayed low for a
>             while, but then eventually comes up to ~800% where it will
>             stay.
>
>             Please let me know if there is other information that I
>             can provide, or what I should be looking for in the GC
>             logs.  Thanks!
>
>             -Joe
>
>
>             On 8/18/2017 2:25 PM, Shawn Heisey wrote:
>
>                 On 8/18/2017 10:37 AM, Joe Obernberger wrote:
>
>                     Indexing about 15 million documents per day across
>                     100 shards on 45
>                     servers.  Up until about 350 million documents,
>                     each of the solr
>                     instances was taking up about 1 core (100% CPU). 
>                     Recently, they all
>                     jumped to 700%.  Is this normal?  Anything that I
>                     can check for?
>
>                     I don't see anything unusual in the solr logs.
>                     Sample from the GC logs:
>
>                 A sample from GC logs won't reveal anything.  We would
>                 need the entire
>                 GC log.  To share something like that, you need a file
>                 sharing site,
>                 something like dropbox.  With the full log, we can
>                 analyze it for
>                 indications of GC problems.
>
>                 There are many things that can cause a sudden massive
>                 increase in CPU
>                 usage.  In this case, it is likely due to increased
>                 requirements because
>                 indexing 15 million documents per day has made the
>                 index larger, and now
>                 it probably needs additional resources on each server
>                 that are not
>                 available.
>
>                 The most common need for additional resources is
>                 unallocated system
>                 memory for the operating system to cache the index. 
>                 Something else that
>                 sometimes happens is that the index outgrows the max
>                 heap size, which we
>                 would be able to learn from the full GC log.
>
>                 These problem are discussed here:
>
>                 https://wiki.apache.org/solr/SolrPerformanceProblems
>
>                 Another useful piece of information is obtained by
>                 running the "top"
>                 utility on the commandline, pressing shift-M to sort
>                 by memory, and
>                 taking a screenshot of that display.  Then you would
>                 need a file-sharing
>                 website to share the image.
>
>                 Thanks,
>                 Shawn
>


Re: Solr 6.6.0 - High CPU during indexing

Posted by Joe Obernberger <jo...@gmail.com>.
Hi Walter - I see what you are saying, but the machine is not actively 
swapping (that would be the concern, right?).  It's the CPU usage that 
I'm trying to figure out.  Htop reports that there is about 20G of disk 
cache in use, and about 76G of RAM in use by programs.  VIRT memory is 
what was requested to be allocated, not what was actually allocated; 
that is RES.  Unless my understanding of top is wrong.

http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_htop.jpg

atop:
http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_atop.jpg

-Joe

On 8/18/2017 3:12 PM, Walter Underwood wrote:
> I see a server with 100Gb of memory and processes (java and jsvc) using 203Gb of virtual memory. Hmm.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Aug 18, 2017, at 12:05 PM, Joe Obernberger <jo...@gmail.com> wrote:
>>
>> Thank you Shawn.  Please see:
>> http://www.lovehorsepower.com/Vesta
>> for screen shots of top (http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_top.jpg) and several screen shots over various times of jvisualvm.
>>
>> There is also the GC log and the regular solr.log for one server (named Vesta).  Please note that we are using HDFS for storage.  I love top, but also use htop and atop as they show additional information.  In general we are RAM limited and therefore do not have much cache for OS/disk as we would like, but this issue is CPU related.  After restarting the one node, the CPU usage stayed low for a while, but then eventually comes up to ~800% where it will stay.
>>
>> Please let me know if there is other information that I can provide, or what I should be looking for in the GC logs.  Thanks!
>>
>> -Joe
>>
>>
>> On 8/18/2017 2:25 PM, Shawn Heisey wrote:
>>> On 8/18/2017 10:37 AM, Joe Obernberger wrote:
>>>> Indexing about 15 million documents per day across 100 shards on 45
>>>> servers.  Up until about 350 million documents, each of the solr
>>>> instances was taking up about 1 core (100% CPU).  Recently, they all
>>>> jumped to 700%.  Is this normal?  Anything that I can check for?
>>>>
>>>> I don't see anything unusual in the solr logs.  Sample from the GC logs:
>>> A sample from GC logs won't reveal anything.  We would need the entire
>>> GC log.  To share something like that, you need a file sharing site,
>>> something like dropbox.  With the full log, we can analyze it for
>>> indications of GC problems.
>>>
>>> There are many things that can cause a sudden massive increase in CPU
>>> usage.  In this case, it is likely due to increased requirements because
>>> indexing 15 million documents per day has made the index larger, and now
>>> it probably needs additional resources on each server that are not
>>> available.
>>>
>>> The most common need for additional resources is unallocated system
>>> memory for the operating system to cache the index.  Something else that
>>> sometimes happens is that the index outgrows the max heap size, which we
>>> would be able to learn from the full GC log.
>>>
>>> These problem are discussed here:
>>>
>>> https://wiki.apache.org/solr/SolrPerformanceProblems
>>>
>>> Another useful piece of information is obtained by running the "top"
>>> utility on the commandline, pressing shift-M to sort by memory, and
>>> taking a screenshot of that display.  Then you would need a file-sharing
>>> website to share the image.
>>>
>>> Thanks,
>>> Shawn
>>>
>


Re: Solr 6.6.0 - High CPU during indexing

Posted by Walter Underwood <wu...@wunderwood.org>.
I see a server with 100GB of memory and processes (java and jsvc) using 203GB of virtual memory. Hmm.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Aug 18, 2017, at 12:05 PM, Joe Obernberger <jo...@gmail.com> wrote:
> 
> Thank you Shawn.  Please see:
> http://www.lovehorsepower.com/Vesta
> for screen shots of top (http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_top.jpg) and several screen shots over various times of jvisualvm.
> 
> There is also the GC log and the regular solr.log for one server (named Vesta).  Please note that we are using HDFS for storage.  I love top, but also use htop and atop as they show additional information.  In general we are RAM limited and therefore do not have much cache for OS/disk as we would like, but this issue is CPU related.  After restarting the one node, the CPU usage stayed low for a while, but then eventually comes up to ~800% where it will stay.
> 
> Please let me know if there is other information that I can provide, or what I should be looking for in the GC logs.  Thanks!
> 
> -Joe
> 
> 
> On 8/18/2017 2:25 PM, Shawn Heisey wrote:
>> On 8/18/2017 10:37 AM, Joe Obernberger wrote:
>>> Indexing about 15 million documents per day across 100 shards on 45
>>> servers.  Up until about 350 million documents, each of the solr
>>> instances was taking up about 1 core (100% CPU).  Recently, they all
>>> jumped to 700%.  Is this normal?  Anything that I can check for?
>>> 
>>> I don't see anything unusual in the solr logs.  Sample from the GC logs:
>> A sample from GC logs won't reveal anything.  We would need the entire
>> GC log.  To share something like that, you need a file sharing site,
>> something like dropbox.  With the full log, we can analyze it for
>> indications of GC problems.
>> 
>> There are many things that can cause a sudden massive increase in CPU
>> usage.  In this case, it is likely due to increased requirements because
>> indexing 15 million documents per day has made the index larger, and now
>> it probably needs additional resources on each server that are not
>> available.
>> 
>> The most common need for additional resources is unallocated system
>> memory for the operating system to cache the index.  Something else that
>> sometimes happens is that the index outgrows the max heap size, which we
>> would be able to learn from the full GC log.
>> 
>> These problem are discussed here:
>> 
>> https://wiki.apache.org/solr/SolrPerformanceProblems
>> 
>> Another useful piece of information is obtained by running the "top"
>> utility on the commandline, pressing shift-M to sort by memory, and
>> taking a screenshot of that display.  Then you would need a file-sharing
>> website to share the image.
>> 
>> Thanks,
>> Shawn
>> 
> 


Re: Solr 6.6.0 - High CPU during indexing

Posted by Joe Obernberger <jo...@gmail.com>.
Thank you for the detailed response Shawn!  I've read it several times.  
Yes, that particular machine has 12 cores that are hyper-threaded.  Does 
Solr do something special when not running in HDFS to allocate memory 
that would result in VIRT showing memory required for index data size?

In my experience the VIRT shows (for java anyway) what the JVM wanted to 
allocate.  If I specify -Xms75G, VIRT will show 75G, but RES may show 
much less if the program doesn't do anything.
For example, I wrote a program that sleeps and then exits.  If I run it 
with java -Xms75G -jar blah.jar, top reports a VIRT of ~80G (notice PID 
29566):

top - 17:09:05 up 50 days,  4:24,  2 users,  load average: 9.82, 11.31, 
12.41
Tasks: 410 total,   1 running, 409 sleeping,   0 stopped,   0 zombie
Cpu(s): 39.8%us,  0.7%sy, 20.1%ni, 39.3%id,  0.0%wa, 0.0%hi,  0.2%si,  
0.0%st
Mem:  82505408k total, 76160560k used,  6344848k free, 356212k buffers
Swap: 33554428k total,   115756k used, 33438672k free, 14011992k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM TIME+  COMMAND
14260 solr      20   0 41.6g  33g  19m S 629.4 42.3 57412:59 java
29566 joeo      20   0 80.2g 275m  12m S  0.3  0.3 0:00.93 java

Note that the OS didn't actually give PID 29566 80G of memory; it 
actually gave it 275m.  Right?  Thanks again!
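
A minimal sketch of that kind of do-nothing test (the class name is made 
up; the only point is that a large -Xms shows up in VIRT while RES stays 
small):

// SleepTest.java - reserve a large initial heap, touch almost none of it.
// Launched with something like:  java -Xms75G SleepTest
public class SleepTest {
    public static void main(String[] args) throws InterruptedException {
        // Sleep long enough to look at the process in top / htop, then exit.
        Thread.sleep(5 * 60 * 1000);
    }
}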

-Joe

On 8/18/2017 4:15 PM, Shawn Heisey wrote:
> On 8/18/2017 1:05 PM, Joe Obernberger wrote:
>> Thank you Shawn.  Please see:
>> http://www.lovehorsepower.com/Vesta
>> for screen shots of top
>> (http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_top.jpg) and
>> several screen shots over various times of jvisualvm.
>>
>> There is also the GC log and the regular solr.log for one server
>> (named Vesta).  Please note that we are using HDFS for storage.  I
>> love top, but also use htop and atop as they show additional
>> information.  In general we are RAM limited and therefore do not have
>> much cache for OS/disk as we would like, but this issue is CPU
>> related.  After restarting the one node, the CPU usage stayed low for
>> a while, but then eventually comes up to ~800% where it will stay.
> Your GC log does not show any evidence of extreme GC activity.  The
> longest pause in the whole thing is 1.4 seconds, and the average pause
> is only seven milliseconds.  Looking at percentile statistics, GC
> performance is amazing, especially given the rather large heap size.
>
> Problems with insufficient disk caching memory do frequently manifest as
> high CPU usage, because that situation will require waiting on I/O.
> When the CPU spends a lot of time in iowait, total CPU usage tends to be
> very high.  The iowait CPU percentage on the top output when that
> screenshot was taken was 8.5.  This sounds like a small number, but in
> fact it is quite high.  Very healthy Solr installs will have an
> extremely low iowait percentage -- possibly zero -- because they will
> rarely read off the disk.  I can see that on the atop screenshot, iowait
> percentage is 172.
>
> The load average on the system is well above 11. The atop output shows
> 24 CPU cores (which might actually be 12 if the CPUs have
> hypherthreading).  Even with all those CPUs, that load average is high
> enough to be concerned.
>
> I can see that the system has about 70GB of memory directly allocated to
> various Java processes, leaving about 30GB for disk caching purposes.
> Walter has noted that those same java processes have allocated over
> 200GB of virtual memory.  If we subtract the 70GB of allocated heap,
> this would tend to indicate that those processes, one of which is Solr,
> are accessing about 130GB of data.
>
> I have no idea how the memory situation works with HDFS, or how this
> screenshot should look on a healthy system.  Having 30GB of memory to
> cache the 130GB of data opened by these Java processes might be enough,
> or it might not.  If this were a system NOT running HDFS, then I would
> say that there isn't enough memory.  Putting HDFS into this mix makes it
> difficult for me to say anything useful, simply because I do not know
> much about it.  You should consult with an HDFS expert and ask them how
> to make sure that actual disk accesses are rare -- you want as much of
> the index data sitting in RAM on the Solr server as you can possibly get.
>
> Addressing a message later in the thread: The concern with high virtual
> memory is actually NOT swapping.  It's effective use of disk caching
> memory.  Let's examine a hypothetical situation with a machine running
> nothing but Solr, using a standard filesystem for data storage.
>
> The "top" output in this hypothetical situation indicates that total
> system memory is 128GB and there is no swap usage.  The Solr process has
> a RES memory size of 25GB, a SHR size of a few megabytes, and a VIRT
> size of 1000GB.  This tells me that their heap is approximately 25 GB,
> and that Solr is accessing 975GB of index data.  At that point, I know
> that they have about 103GB of memory to cache nearly a terabyte of index
> data.  This is a situation where there is nowhere near enough memory for
> good performance.
>
> Thanks,
> Shawn
>
>


Re: Solr 6.6.0 - High CPU during indexing

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/18/2017 1:05 PM, Joe Obernberger wrote:
> Thank you Shawn.  Please see:
> http://www.lovehorsepower.com/Vesta
> for screen shots of top
> (http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_top.jpg) and
> several screen shots over various times of jvisualvm.
>
> There is also the GC log and the regular solr.log for one server
> (named Vesta).  Please note that we are using HDFS for storage.  I
> love top, but also use htop and atop as they show additional
> information.  In general we are RAM limited and therefore do not have
> much cache for OS/disk as we would like, but this issue is CPU
> related.  After restarting the one node, the CPU usage stayed low for
> a while, but then eventually comes up to ~800% where it will stay. 

Your GC log does not show any evidence of extreme GC activity.  The
longest pause in the whole thing is 1.4 seconds, and the average pause
is only seven milliseconds.  Looking at percentile statistics, GC
performance is amazing, especially given the rather large heap size.

Problems with insufficient disk caching memory do frequently manifest as
high CPU usage, because that situation will require waiting on I/O. 
When the CPU spends a lot of time in iowait, total CPU usage tends to be
very high.  The iowait CPU percentage on the top output when that
screenshot was taken was 8.5.  This sounds like a small number, but in
fact it is quite high.  Very healthy Solr installs will have an
extremely low iowait percentage -- possibly zero -- because they will
rarely read off the disk.  I can see that on the atop screenshot, iowait
percentage is 172.

The load average on the system is well above 11. The atop output shows
24 CPU cores (which might actually be 12 if the CPUs have
hyperthreading).  Even with all those CPUs, that load average is high
enough to be a concern.

I can see that the system has about 70GB of memory directly allocated to
various Java processes, leaving about 30GB for disk caching purposes. 
Walter has noted that those same java processes have allocated over
200GB of virtual memory.  If we subtract the 70GB of allocated heap,
this would tend to indicate that those processes, one of which is Solr,
are accessing about 130GB of data.

I have no idea how the memory situation works with HDFS, or how this
screenshot should look on a healthy system.  Having 30GB of memory to
cache the 130GB of data opened by these Java processes might be enough,
or it might not.  If this were a system NOT running HDFS, then I would
say that there isn't enough memory.  Putting HDFS into this mix makes it
difficult for me to say anything useful, simply because I do not know
much about it.  You should consult with an HDFS expert and ask them how
to make sure that actual disk accesses are rare -- you want as much of
the index data sitting in RAM on the Solr server as you can possibly get.

Addressing a message later in the thread: The concern with high virtual
memory is actually NOT swapping.  It's effective use of disk caching
memory.  Let's examine a hypothetical situation with a machine running
nothing but Solr, using a standard filesystem for data storage.

The "top" output in this hypothetical situation indicates that total
system memory is 128GB and there is no swap usage.  The Solr process has
a RES memory size of 25GB, a SHR size of a few megabytes, and a VIRT
size of 1000GB.  This tells me that their heap is approximately 25 GB,
and that Solr is accessing 975GB of index data.  At that point, I know
that they have about 103GB of memory to cache nearly a terabyte of index
data.  This is a situation where there is nowhere near enough memory for
good performance.
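
Written out as a quick sketch (the figures are the hypothetical ones 
above, not measurements from this cluster), the arithmetic is simply:

public class MemoryHeadroom {
    public static void main(String[] args) {
        long totalRamGb = 128;  // physical memory in the hypothetical server
        long solrResGb  = 25;   // Solr's resident size, roughly the heap
        long solrVirtGb = 1000; // Solr's virtual size reported by top

        long pageCacheGb = totalRamGb - solrResGb;  // ~103 GB left to cache index data
        long indexDataGb = solrVirtGb - solrResGb;  // ~975 GB of index data being accessed

        System.out.printf("~%d GB of cache for ~%d GB of index data%n",
                pageCacheGb, indexDataGb);
    }
}

Plugging in this cluster's real numbers (about 100GB of RAM, roughly 70GB 
of Java heap, and over 200GB of virtual memory) gives the roughly 30GB of 
cache for 130GB of data described above.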

Thanks,
Shawn


Re: Solr 6.6.0 - High CPU during indexing

Posted by Joe Obernberger <jo...@gmail.com>.
Thank you Shawn.  Please see:
http://www.lovehorsepower.com/Vesta
for screen shots of top 
(http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_top.jpg) and several 
screen shots of jvisualvm taken at various times.

There is also the GC log and the regular solr.log for one server (named 
Vesta).  Please note that we are using HDFS for storage.  I love top, 
but also use htop and atop as they show additional information.  In 
general we are RAM limited and therefore do not have as much OS/disk 
cache as we would like, but this issue is CPU related.  After 
restarting the one node, the CPU usage stayed low for a while but 
eventually came back up to ~800%, where it has stayed.

Please let me know if there is other information that I can provide, or 
what I should be looking for in the GC logs.  Thanks!

-Joe


On 8/18/2017 2:25 PM, Shawn Heisey wrote:
> On 8/18/2017 10:37 AM, Joe Obernberger wrote:
>> Indexing about 15 million documents per day across 100 shards on 45
>> servers.  Up until about 350 million documents, each of the solr
>> instances was taking up about 1 core (100% CPU).  Recently, they all
>> jumped to 700%.  Is this normal?  Anything that I can check for?
>>
>> I don't see anything unusual in the solr logs.  Sample from the GC logs:
> A sample from GC logs won't reveal anything.  We would need the entire
> GC log.  To share something like that, you need a file sharing site,
> something like dropbox.  With the full log, we can analyze it for
> indications of GC problems.
>
> There are many things that can cause a sudden massive increase in CPU
> usage.  In this case, it is likely due to increased requirements because
> indexing 15 million documents per day has made the index larger, and now
> it probably needs additional resources on each server that are not
> available.
>
> The most common need for additional resources is unallocated system
> memory for the operating system to cache the index.  Something else that
> sometimes happens is that the index outgrows the max heap size, which we
> would be able to learn from the full GC log.
>
> These problem are discussed here:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> Another useful piece of information is obtained by running the "top"
> utility on the commandline, pressing shift-M to sort by memory, and
> taking a screenshot of that display.  Then you would need a file-sharing
> website to share the image.
>
> Thanks,
> Shawn
>


Re: Solr 6.6.0 - High CPU during indexing

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/18/2017 10:37 AM, Joe Obernberger wrote:
> Indexing about 15 million documents per day across 100 shards on 45
> servers.  Up until about 350 million documents, each of the solr
> instances was taking up about 1 core (100% CPU).  Recently, they all
> jumped to 700%.  Is this normal?  Anything that I can check for?
> 
> I don't see anything unusual in the solr logs.  Sample from the GC logs:

A sample from GC logs won't reveal anything.  We would need the entire
GC log.  To share something like that, you need a file sharing site,
something like dropbox.  With the full log, we can analyze it for
indications of GC problems.

There are many things that can cause a sudden massive increase in CPU
usage.  In this case, it is likely due to increased requirements because
indexing 15 million documents per day has made the index larger, and now
it probably needs additional resources on each server that are not
available.

The most common need for additional resources is unallocated system
memory for the operating system to cache the index.  Something else that
sometimes happens is that the index outgrows the max heap size, which we
would be able to learn from the full GC log.

These problems are discussed here:

https://wiki.apache.org/solr/SolrPerformanceProblems

Another useful piece of information is obtained by running the "top"
utility on the commandline, pressing shift-M to sort by memory, and
taking a screenshot of that display.  Then you would need a file-sharing
website to share the image.

Thanks,
Shawn