Posted to user@ignite.apache.org by Naveen Kumar <na...@gmail.com> on 2021/10/12 12:50:55 UTC

Re: apache ignite 2.10.0 heap starvation

On the same subject, we have made the changes as suggested.

The nodes are running on VMs with 8 cores and 128 GB of memory. I've added
the following JVM parameters:

-XX:ParallelGCThreads=4
-XX:ConcGCThreads=2
-XX:MaxGCPauseMillis=200
-XX:InitiatingHeapOccupancyPercent=40
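
To double-check that these flags are really in effect (and not overridden
somewhere else in the start script), something like the following could be
run inside the node's JVM. This is only a sketch: the class name GcFlagCheck
is made up for illustration, and it assumes a HotSpot-based JVM where
HotSpotDiagnosticMXBean is available. From outside the process,
"jcmd <pid> VM.flags" should give similar information.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Ignite): reads the effective values of the
// GC flags listed above from the running HotSpot JVM.
final class GcFlagCheck {
    static final String[] FLAGS = {
        "ParallelGCThreads",
        "ConcGCThreads",
        "MaxGCPauseMillis",
        "InitiatingHeapOccupancyPercent"
    };

    /** Returns flag name -> effective value, or "n/a" if this JVM does not know the flag. */
    public static Map<String, String> effectiveValues() {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        Map<String, String> values = new LinkedHashMap<>();
        for (String flag : FLAGS) {
            try {
                values.put(flag, bean.getVMOption(flag).getValue());
            } catch (IllegalArgumentException e) {
                // Thrown when the flag does not exist in this JVM build.
                values.put(flag, "n/a");
            }
        }
        return values;
    }

    public static void main(String[] args) {
        effectiveValues().forEach((flag, value) -> System.out.println(flag + " = " + value));
    }
}
```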

I have not set any of the properties below; all of them use their default
value, which is 8 (the same as the number of cores):

        <property name="systemThreadPoolSize" value="8"/>
        <property name="publicThreadPoolSize" value="8"/>
        <property name="queryThreadPoolSize" value="8"/>
        <property name="serviceThreadPoolSize" value="8"/>
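
For reference, my understanding is that Ignite 2.x does not default these
pools to the raw core count but to max(8, number of cores), so on an 8-core
VM the result is 8 either way. The sketch below only reproduces that
arithmetic; IgnitePoolDefaults and defaultPoolSize are illustrative names,
and the formula is an assumption based on the IgniteConfiguration defaults,
not something confirmed in this thread.

```java
// Illustrative only: mirrors the assumed Ignite 2.x default of max(8, cores)
// for the public, system, query and service thread pools.
final class IgnitePoolDefaults {
    /** Assumed default pool size for a machine with the given number of cores. */
    public static int defaultPoolSize(int cores) {
        return Math.max(8, cores);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores
            + ", assumed default pool size=" + defaultPoolSize(cores));
    }
}
```

Under this assumption an 8-core VM gets 8 threads per pool and a 12-core
server gets 12, so leaving the properties unset on 8 cores is equivalent to
setting them to 8 explicitly.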

I can still see our heap increasing, but at least I can see a pattern now
(unlike earlier, when the growth was almost exponential).

I'm attaching screenshots of heap, CPU and GC, plus the start script with
all the JVM arguments used.
What do you think I should change to use the heap effectively?



On Wed, Sep 29, 2021 at 2:35 PM Ibrahim Altun <ib...@segmentify.com>
wrote:

> After many configuration changes and optimizations, I think I've solved
> the heap problem.
>
> Here are the changes that I applied to the system.
> JVM changes: this article helped a lot:
> https://medium.com/@hoan.nguyen.it/how-did-g1gc-tuning-flags-affect-our-back-end-web-app-c121d38dfe56
>
> The nodes are running on servers with 12 cores and 64 GB of memory. I've
> added the following JVM parameters:
>
> -XX:ParallelGCThreads=6
> -XX:ConcGCThreads=2
> -XX:MaxGCPauseMillis=200
> -XX:InitiatingHeapOccupancyPercent=40
>
> In the Ignite configuration I've changed all thread pool sizes, most of
> which were much larger than these before:
>         <property name="systemThreadPoolSize" value="12"/>
>         <property name="publicThreadPoolSize" value="12"/>
>         <property name="queryThreadPoolSize" value="12"/>
>         <property name="serviceThreadPoolSize" value="12"/>
>         <property name="stripedPoolSize" value="12"/>
>         <property name="dataStreamerThreadPoolSize" value="12"/>
>         <property name="rebalanceThreadPoolSize" value="12"/>
>
> Here is the GC report covering 16 hours:
>
> https://gceasy.io/diamondgc-report.jsp?p=c2hhcmVkLzIwMjEvMDkvMjkvLS1nYy5sb2cuMC5jdXJyZW50LS04LTU4LTMx&channel=WEB
>
>
>
> On 2021/09/27 17:11:21, Ilya Korol <ll...@gmail.com> wrote:
> > Actually, the Query interface doesn't define a close() method, but
> > QueryCursor does.
> > In your snippets you use a try-with-resources construct for SELECT
> > queries, which is good. But when you run a MERGE INTO query you also
> > get a QueryCursor back as the result of
> >
> > igniteCacheService.getCache(ID, IgniteCacheType.LABEL).query(insertQuery);
> >
> > so maybe these QueryCursor objects still hold some resources/memory.
> > The Javadoc for QueryCursor states that you should always close cursors.
> >
> > To simplify cursor closing, there is a cursor.getAll() method that will
> > do this for you under the hood.
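
A small sketch of the point above. So that it runs without Ignite on the
classpath, it uses a made-up FakeCursor class as a stand-in for QueryCursor
and a made-up query() method as a stand-in for cache.query(...); the OPEN
counter is also hypothetical and only models "resources still held". The
real API may differ in detail.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in model, NOT the Ignite API: FakeCursor plays the role of QueryCursor.
final class CursorClosingSketch {
    /** Number of cursors created but not yet closed (models resources still held). */
    public static final AtomicInteger OPEN = new AtomicInteger();

    static final String MERGE_SQL =
        "MERGE INTO \"ProductLabel\" (\"productId\", \"label\", \"language\") VALUES (?, ?, ?)";

    static final class FakeCursor implements Iterable<List<?>>, AutoCloseable {
        private final List<List<?>> rows;
        private boolean closed;

        FakeCursor(List<List<?>> rows) {
            this.rows = rows;
            OPEN.incrementAndGet();
        }

        @Override
        public Iterator<List<?>> iterator() {
            return rows.iterator();
        }

        /** Copies all rows, then closes the cursor, as getAll() is described above. */
        public List<List<?>> getAll() {
            try {
                return new ArrayList<>(rows);
            } finally {
                close();
            }
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                OPEN.decrementAndGet();
            }
        }
    }

    /** Stand-in for cache.query(...): SELECT and MERGE alike return a cursor. */
    static FakeCursor query(String sql) {
        List<?> row = Collections.singletonList(1L); // e.g. an update count
        return new FakeCursor(Collections.<List<?>>singletonList(row));
    }

    /** Pattern from the thread: the cursor returned by the MERGE is dropped unclosed. */
    public static void mergeIgnoringCursor() {
        query(MERGE_SQL);
    }

    /** Option 1: close the cursor with try-with-resources. */
    public static void mergeWithTryWithResources() {
        try (FakeCursor ignored = query(MERGE_SQL)) {
            // Nothing to read; the cursor only needs to be closed.
        }
    }

    /** Option 2: drain the cursor with getAll(), which closes it afterwards. */
    public static void mergeWithGetAll() {
        query(MERGE_SQL).getAll();
    }

    public static void main(String[] args) {
        mergeIgnoringCursor();
        System.out.println("open after ignoring cursor: " + OPEN.get());
        mergeWithTryWithResources();
        System.out.println("open after try-with-resources: " + OPEN.get());
        mergeWithGetAll();
        System.out.println("open after getAll(): " + OPEN.get());
    }
}
```

In this model only the first variant leaves a cursor open; the other two
release it whether or not the caller needs the result.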
> >
> >
> > On 2021/09/13 06:17:21, Ibrahim Altun <i....@segmentify.com> wrote:
> >  > Hi Ilya,
> >  >
> >  > Since this is a production environment, I could not risk taking a
> >  > heap dump for now, but I will try to convince my superiors to let me
> >  > take one and analyze it.
> >  >
> >  > Queries are heavily used in our system, but aren't they AutoCloseable
> >  > objects? Do we have to close them anyway?
> >  >
> >  > Here are some usage examples from our system.
> >  >
> >  > The insert query looks like this:
> >  > MERGE INTO "ProductLabel" ("productId", "label", "language") VALUES (?, ?, ?)
> >  > igniteCacheService.getCache(ID, IgniteCacheType.LABEL).query(insertQuery);
> >  >
> >  > Another usage example, with an SqlFieldsQuery:
> >  > String sql = "SELECT _val FROM \"UserRecord\" WHERE \"email\" IN (?)";
> >  > SqlFieldsQuery sqlFieldsQuery = new SqlFieldsQuery(sql);
> >  > sqlFieldsQuery.setLazy(true);
> >  > sqlFieldsQuery.setArgs(emails.toArray());
> >  >
> >  > try (QueryCursor<List<?>> ignored = igniteCacheService.getCache(ID, IgniteCacheType.USER).query(sqlFieldsQuery)) {...}
> >  >
> >  >
> >  >
> >  > On 2021/09/12 20:28:09, Shishkov Ilya <sh...@gmail.com> wrote:
> >  > > Hi, Ibrahim!
> >  > > Have you analyzed a heap dump of the server node JVMs?
> >  > > If your application executes queries, are their cursors closed?
> >  > >
> >  > > On Fri, 10 Sep 2021 at 11:54, Ibrahim Altun <ib...@segmentify.com> wrote:
> >  > > >
> >  > > > Igniters, any comment on this issue? We are facing huge GC
> >  > > > problems in our production environment. Please advise.
> >  > > >
> >  > > > On 2021/09/07 14:11:09, Ibrahim Altun <ib...@segmentify.com> wrote:
> >  > > > > Hi,
> >  > > > >
> >  > > > > 400-600K reads/writes/updates in total
> >  > > > > 12 cores
> >  > > > > 64 GB RAM
> >  > > > > no iowait
> >  > > > > 10 nodes
> >  > > > >
> >  > > > > On 2021/09/07 12:51:28, Piotr Jagielski <pj...@touk.pl> wrote:
> >  > > > > > Hi,
> >  > > > > > Can you provide some information on how you use the cluster?
> >  > > > > > How many reads/writes/updates per second? Also, what is the
> >  > > > > > CPU/RAM spec of the cluster nodes?
> >  > > > > >
> >  > > > > > We observed full GC, high CPU load and the OOM killer when
> >  > > > > > loading a large amount of data (15 million records, data
> >  > > > > > streamer + allowOverwrite=true). We've seen 200-400k updates
> >  > > > > > per second in JMX metrics, but with load up to 10 on the nodes
> >  > > > > > and iowait up to 30%. Our cluster is 3 x 4 CPU, 16 GB RAM
> >  > > > > > (already upgrading to 8 CPU, 32 GB RAM), running Ignite 2.10.
> >  > > > > >
> >  > > > > > Regards,
> >  > > > > > Piotr
> >  > > > > >
> >  > > > > > On 2021/09/02 08:36:07, Ibrahim Altun <ib...@segmentify.com> wrote:
> >  > > > > > > After upgrading from version 2.7.1 to 2.10.0, our Ignite
> >  > > > > > > nodes are facing huge full GC operations 24-36 hours after
> >  > > > > > > node start.
> >  > > > > > >>
> >  > > > > > > We tried increasing the heap size, but with no luck. Here is
> >  > > > > > > the start configuration for the nodes:
> >  > > > > > >
> >  > > > > > > JVM_OPTS="$JVM_OPTS -Xms12g -Xmx12g -server
> >  > > > > > > -javaagent:/etc/prometheus/jmx_prometheus_javaagent-0.14.0.jar=8090:/etc/prometheus/jmx.yml
> >  > > > > > > -Dcom.sun.management.jmxremote
> >  > > > > > > -Dcom.sun.management.jmxremote.authenticate=false
> >  > > > > > > -Dcom.sun.management.jmxremote.port=49165
> >  > > > > > > -Dcom.sun.management.jmxremote.host=localhost
> >  > > > > > > -XX:MaxMetaspaceSize=256m -XX:MaxDirectMemorySize=1g
> >  > > > > > > -DIGNITE_SKIP_CONFIGURATION_CONSISTENCY_CHECK=true
> >  > > > > > > -DIGNITE_WAL_MMAP=true -DIGNITE_BPLUS_TREE_LOCK_RETRIES=100000
> >  > > > > > > -Djava.net.preferIPv4Stack=true"
> >  > > > > > >
> >  > > > > > > JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch -XX:+UseG1GC
> >  > > > > > > -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC
> >  > > > > > > -XX:+UseStringDeduplication -Xloggc:/var/log/apache-ignite/gc.log
> >  > > > > > > -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> >  > > > > > > -XX:+PrintTenuringDistribution -XX:+PrintGCCause
> >  > > > > > > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> >  > > > > > > -XX:GCLogFileSize=100M"
> >  > > > > > >
> >  > > > > > > Here is the GC analysis report covering 80 hours:
> >  > > > > > >
> >  > > > > > > https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjEvMDgvMzEvLS1nYy5sb2cuMC5jdXJyZW50LnppcC0tNS01MS0yOQ==&channel=WEB
> >  > > > > > >
> >  > > > > > > Do we need a larger heap, or is there a bug that we need to
> >  > > > > > > be aware of?
> >  > > > > > >>
> >  > > > > > > Here is the node configuration:
> >  > > > > > >
> >  > > > > > > <?xml version="1.0" encoding="UTF-8"?>
> >  > > > > > > <beans xmlns="http://www.springframework.org/schema/beans"
> >  > > > > > >        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >  > > > > > >        xsi:schemaLocation="
> >  > > > > > >          http://www.springframework.org/schema/beans
> >  > > > > > >          http://www.springframework.org/schema/beans/spring-beans.xsd">
> >  > > > > > >   <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
> >  > > > > > >     <property name="gridLogger">
> >  > > > > > >       <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
> >  > > > > > >         <constructor-arg type="java.lang.String" value="/etc/apache-ignite/ignite-log4j2.xml"/>
> >  > > > > > >       </bean>
> >  > > > > > >     </property>
> >  > > > > > >     <property name="communicationSpi">
> >  > > > > > >       <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
> >  > > > > > >         <property name="usePairedConnections" value="true"/>
> >  > > > > > >       </bean>
> >  > > > > > >     </property>
> >  > > > > > >     <property name="failureDetectionTimeout" value="60000"/>
> >  > > > > > >     <property name="systemThreadPoolSize" value="128"/>
> >  > > > > > >     <property name="publicThreadPoolSize" value="128"/>
> >  > > > > > >     <property name="queryThreadPoolSize" value="128"/>
> >  > > > > > >     <property name="serviceThreadPoolSize" value="128"/>
> >  > > > > > >     <property name="stripedPoolSize" value="128"/>
> >  > > > > > >     <property name="dataStreamerThreadPoolSize" value="4"/>
> >  > > > > > >     <property name="rebalanceThreadPoolSize" value="16"/>
> >  > > > > > >
> >  > > > > > >     <!-- Explicitly enable peer class loading. -->
> >  > > > > > >     <property name="peerClassLoadingEnabled" value="true"/>
> >  > > > > > >
> >  > > > > > >     <!-- Enable deploymentSpi; the /usr/share/apache-ignite/libs/segmentify
> >  > > > > > >          directory will be checked every 5 seconds for changed files. -->
> >  > > > > > >     <property name="deploymentSpi">
> >  > > > > > >       <bean class="org.apache.ignite.spi.deployment.uri.UriDeploymentSpi">
> >  > > > > > >         <property name="temporaryDirectoryPath" value="/tmp/temp_ignite_libs"/>
> >  > > > > > >         <property name="uriList">
> >  > > > > > >           <list>
> >  > > > > > >             <va...@localhost/usr/share/apache-ignite/libs/segmentify/</value>
> >  > > > > > >           </list>
> >  > > > > > >         </property>
> >  > > > > > >       </bean>
> >  > > > > > >     </property>
> >  > > > > > >
> >  > > > > > >     <property name="cacheConfiguration">
> >  > > > > > >       <list>
> >  > > > > > >         <!-- Partitioned cache example configuration (Atomic mode). -->
> >  > > > > > >         <bean class="org.apache.ignite.configuration.CacheConfiguration">
> >  > > > > > >           <property name="name" value="default"/>
> >  > > > > > >           <property name="atomicityMode" value="ATOMIC"/>
> >  > > > > > >           <property name="backups" value="1"/>
> >  > > > > > >         </bean>
> >  > > > > > >       </list>
> >  > > > > > >     </property>
> >  > > > > > >
> >  > > > > > >     <!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
> >  > > > > > >     <property name="discoverySpi">
> >  > > > > > >       <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
> >  > > > > > >         <property name="networkTimeout" value="60000"/>
> >  > > > > > >         <property name="ipFinder">
> >  > > > > > >           <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
> >  > > > > > >             <property name="addresses">
> >  > > > > > >               <list>
> >  > > > > > >                 <!-- THERE ARE 10 NODES -->
> >  > > > > > >               </list>
> >  > > > > > >             </property>
> >  > > > > > >           </bean>
> >  > > > > > >         </property>
> >  > > > > > >       </bean>
> >  > > > > > >     </property>
> >  > > > > > >
> >  > > > > > >     <!-- Enabling Apache Ignite native persistence. -->
> >  > > > > > >     <property name="dataStorageConfiguration">
> >  > > > > > >       <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
> >  > > > > > >         <property name="defaultDataRegionConfiguration">
> >  > > > > > >           <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
> >  > > > > > >             <property name="persistenceEnabled" value="true"/>
> >  > > > > > >             <property name="checkpointPageBufferSize" value="#{ 2L * 1024 * 1024 * 1024 }"/>
> >  > > > > > >             <property name="maxSize" value="#{ 40L * 1024 * 1024 * 1024 }"/>
> >  > > > > > >           </bean>
> >  > > > > > >         </property>
> >  > > > > > >         <property name="storagePath" value="/srv/ignite/persist"/>
> >  > > > > > >         <property name="walPath" value="/srv/ignite/wal"/>
> >  > > > > > >         <property name="walArchivePath" value="/srv/ignite/wal"/>
> >  > > > > > >         <property name="walMode" value="LOG_ONLY"/>
> >  > > > > > >         <property name="walSegmentSize" value="#{ 256L * 1024 * 1024 }"/>
> >  > > > > > >         <property name="walFlushFrequency" value="5000"/>
> >  > > > > > >         <property name="maxWalArchiveSize" value="#{ 512L * 1024 * 1024 }"/>
> >  > > > > > >         <property name="writeThrottlingEnabled" value="true"/>
> >  > > > > > >         <property name="checkpointFrequency" value="300000"/>
> >  > > > > > >         <property name="checkpointWriteOrder" value="SEQUENTIAL"/>
> >  > > > > > >       </bean>
> >  > > > > > >     </property>
> >  > > > > > >   </bean>
> >  > > > > > > </beans>
> >  > > > > > >
> >  > > > > > > --
> >  > > > > > > İbrahim Halil Altun
> >  > > > > > > Senior Software Engineer
> >  > > > > > > +90 536 3327510 • segmentify.com <https://www.segmentify.com/>
> >  > > > > > > UK • Germany • Turkey
> >  > > > > > > <https://www.segmentify.com/ecommerce-growth-show>
> >  > > > > > > <https://www.g2.com/products/segmentify/reviews>
> >
>


-- 
Thanks & Regards,
Naveen Bandaru

Re[4]: apache ignite 2.10.0 heap starvation

Posted by Zhenya Stanilovsky <ar...@mail.ru>.
The node is going down because taking the heap dump triggers a full GC. You can avoid that with [1], but the dump will then contain ALL objects, not only live ones.
Additionally, you can try attaching async-profiler [2] or possibly VisualVM.
 
[1]  https://stackoverflow.com/questions/23393480/can-heap-dump-be-created-for-analyzing-memory-leak-without-garbage-collection
[2]  https://github.com/jvm-profiling-tools/async-profiler
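
As a rough sketch of what [1] boils down to: on a HotSpot JVM the dump can be requested with live=false, which writes all objects, reachable or not, and so does not need the full GC that a live-only dump runs first. HeapDumpWithoutGc and dumpAllObjects are names made up for this example; the code has to run inside the node's JVM (or the same MXBean operation can be invoked over JMX), and the JVM is still paused while the dump is written. From the command line, jmap -dump:format=b,file=<file> <pid> without the live option should be the equivalent.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper: dumps the heap of the JVM it runs in, including
// unreachable objects, so no full GC is requested before the dump.
final class HeapDumpWithoutGc {
    /** Writes heap-<timestamp>.hprof into dir and returns the created file. */
    public static File dumpAllObjects(File dir) {
        // The target file must not exist yet and must end with .hprof.
        File file = new File(dir, "heap-" + System.nanoTime() + ".hprof");
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            // live=false: dump every object instead of only the live ones.
            bean.dumpHeap(file.getAbsolutePath(), false);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        System.out.println("Heap dump written to " + dumpAllObjects(dir));
    }
}
```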
 
>Heap dump generation does not seem to be working.
>Whenever I try to generate a heap dump, the node goes down, which is a bit strange.
>What else could we analyze?
>On Tue, Oct 12, 2021 at 7:35 PM Zhenya Stanilovsky < arzamas123@mail.ru > wrote:
>>Hi, the problem is highly likely in your code: CPU usage grows in sync with the heap increase between 00:00 and 12:00.
>>You need to analyze a heap dump; no additional settings will help here.
>>   
> 
>  --
>Thanks & Regards,
>Naveen Bandaru 
 
 
 
 

Re: Re[2]: apache ignite 2.10.0 heap starvation

Posted by Naveen Kumar <na...@gmail.com>.
Heap dump generation does not seem to be working.
Whenever I try to generate a heap dump, the node goes down, which is a bit
strange. What else could we analyze?

On Tue, Oct 12, 2021 at 7:35 PM Zhenya Stanilovsky <ar...@mail.ru>
wrote:

> Hi, the problem is highly likely in your code: CPU usage grows in sync
> with the heap increase between 00:00 and 12:00.
> You need to analyze a heap dump; no additional settings will help here.
>
>
> On the same subject, we have made the changes as suggested.
>
> The nodes are running on VMs with 8 cores and 128 GB of memory. I've added
> the following JVM parameters:
>
> -XX:ParallelGCThreads=4
> -XX:ConcGCThreads=2
> -XX:MaxGCPauseMillis=200
> -XX:InitiatingHeapOccupancyPercent=40
>
> I have not set any of the properties below; all of them use their default
> value, which is 8 (the same as the number of cores):
>
>         <property name="systemThreadPoolSize" value="8"/>
>         <property name="publicThreadPoolSize" value="8"/>
>         <property name="queryThreadPoolSize" value="8"/>
>         <property name="serviceThreadPoolSize" value="8"/>
>
> I can still see our heap increasing, but at least I can see a pattern now
> (unlike earlier, when the growth was almost exponential).
>
> I'm attaching screenshots of heap, CPU and GC, plus the start script with
> all the JVM arguments used.
> What do you think I should change to use the heap effectively?
>
>
>
> On Wed, Sep 29, 2021 at 2:35 PM Ibrahim Altun <ibrahim.altun@segmentify.com> wrote:
>
> After many configuration changes and optimizations, I think I've solved
> the heap problem.
>
> Here are the changes that I applied to the system.
> JVM changes: this article helped a lot:
> https://medium.com/@hoan.nguyen.it/how-did-g1gc-tuning-flags-affect-our-back-end-web-app-c121d38dfe56
>
> The nodes are running on servers with 12 cores and 64 GB of memory. I've
> added the following JVM parameters:
>
> -XX:ParallelGCThreads=6
> -XX:ConcGCThreads=2
> -XX:MaxGCPauseMillis=200
> -XX:InitiatingHeapOccupancyPercent=40
>
> In the Ignite configuration I've changed all thread pool sizes, most of
> which were much larger than these before:
>         <property name="systemThreadPoolSize" value="12"/>
>         <property name="publicThreadPoolSize" value="12"/>
>         <property name="queryThreadPoolSize" value="12"/>
>         <property name="serviceThreadPoolSize" value="12"/>
>         <property name="stripedPoolSize" value="12"/>
>         <property name="dataStreamerThreadPoolSize" value="12"/>
>         <property name="rebalanceThreadPoolSize" value="12"/>
>
> Here is the 16 hours of GC report;
>
> https://gceasy.io/diamondgc-report.jsp?p=c2hhcmVkLzIwMjEvMDkvMjkvLS1nYy5sb2cuMC5jdXJyZW50LS04LTU4LTMx&channel=WEB
>
>
>
> On 2021/09/27 17:11:21, Ilya Korol <llivezking@gmail.com
> <//...@gmail.com>> wrote:
> > Actually Query interface doesn't define close() method, but QueryCursor
> > does.
> > In your snippets you're using try-with-resource construction for SELECT
> > queries which is good, but when you run MERGE INTO query you would also
> > get an QueryCursor as a result of
> >
> > igniteCacheService.getCache(ID,
> IgniteCacheType.LABEL).query(insertQuery);
> >
> > so maybe this QueryCursor objects still hold some resources/memory.
> > Javadoc for QueryCursor states that you should always close cursors.
> >
> > To simplify cursor closing there is a cursor.getAll() method that will
> > do this for you under the hood.
> >
> >
> > On 2021/09/13 06:17:21, Ibrahim Altun <i...@segmentify.com
> <//...@segmentify.com>> wrote:
> >  > Hi Ilya,>
> >  >
> >  > since this is production environment i could not risk to take heap
> > dump for now, but i will try to convince my superiors to get one and
> > analyze it.>
> >  >
> >  > Queries are heavily used in our system but aren't they autoclosable
> > objects? do we have to close them anyway?>
> >  >
> >  > here are some usage examples on our system;>
> >  > --insert query is like this; MERGE INTO "ProductLabel" ("productId",
> > "label", "language") VALUES (?, ?, ?)>
> >  > igniteCacheService.getCache(ID,
> > IgniteCacheType.LABEL).query(insertQuery);>
> >  >
> >  > another usage example;>
> >  > --sqlFieldsQuery is like this; >
> >  > String sql = "SELECT _val FROM \"UserRecord\" WHERE \"email\" IN
> (?)";>
> >  > SqlFieldsQuery sqlFieldsQuery = new SqlFieldsQuery(sql);>
> >  > sqlFieldsQuery.setLazy(true);>
> >  > sqlFieldsQuery.setArgs(emails.toArray());>
> >  >
> >  > try (QueryCursor<List<?>> ignored = igniteCacheService.getCache(ID,
> > IgniteCacheType.USER).query(sqlFieldsQuery)) {...}>
> >  >
> >  >
> >  >
> >  > On 2021/09/12 20:28:09, Shishkov Ilya <sh...@gmail.com
> <//...@gmail.com>> wrote: >
> >  > > Hi, Ibrahim!>
> >  > > Have you analyzed the heap dump of the server node JVMs?>
> >  > > In case your application executes queries are their cursors closed?>
> >  > > >
> >  > > пт, 10 сент. 2021 г. в 11:54, Ibrahim Altun <ib...@segmentify.com
> <//...@segmentify.com>>:>
> >  > > >
> >  > > > Igniters any comment on this issue, we are facing huge GC
> > problems on>
> >  > > > production environment, please advise.>
> >  > > >>
> >  > > > On 2021/09/07 14:11:09, Ibrahim Altun <ib...@segmentify.com
> <//...@segmentify.com>>>
> >  > > > wrote:>
> >  > > > > Hi,>
> >  > > > >>
> >  > > > > totally 400 - 600K reads/writes/updates>
> >  > > > > 12core>
> >  > > > > 64GB RAM>
> >  > > > > no iowait>
> >  > > > > 10 nodes>
> >  > > > >>
> >  > > > > On 2021/09/07 12:51:28, Piotr Jagielski <pj...@touk.pl
> <//...@touk.pl>> wrote:>
> >  > > > > > Hi,>
> >  > > > > > Can you provide some information on how you use the cluster?
> > How many>
> >  > > > reads/writes/updates per second? Also CPU / RAM spec of cluster
> > nodes?>
> >  > > > > >>
> >  > > > > > We observed full GC / CPU load / OOM killer when loading big
> > amount of>
> >  > > > data (15 mln records, data streamer + allowOverwrite=true). We've
> > seen>
> >  > > > 200-400k updates per sec on JMX metrics, but load up to 10 on
> > nodes, iowait>
> >  > > > to 30%. Our cluster is 3 x 4CPU, 16GB RAM (already upgradingto
> > 8CPU, 32GB>
> >  > > > RAM). Ignite 2.10>
> >  > > > > >>
> >  > > > > > Regards,>
> >  > > > > > Piotr>
> >  > > > > >>
> >  > > > > > On 2021/09/02 08:36:07, Ibrahim Altun <ib...@segmentify.com
> <//...@segmentify.com>>>
> >  > > > wrote:>
> >  > > > > > > After upgrading from 2.7.1 version to 2.10.0 version ignite
> > nodes>
> >  > > > facing>
> >  > > > > > > huge full GC operations after 24-36 hours after node start.>
> >  > > > > > >>
> >  > > > > > > We try to increase heap size but no luck, here is the start>
> >  > > > configuration>
> >  > > > > > > for nodes;>
> >  > > > > > >>
> >  > > > > > > JVM_OPTS="$JVM_OPTS -Xms12g -Xmx12g -server>
> >  > > > > > >>
> >  > > >
> >
> -javaagent:/etc/prometheus/jmx_prometheus_javaagent-0.14.0.jar=8090:/etc/prometheus/jmx.yml>
> >
> >  > > > > > > -Dcom.sun.management.jmxremote>
> >  > > > > > > -Dcom.sun.management.jmxremote.authenticate=false>
> >  > > > > > > -Dcom.sun.management.jmxremote.port=49165>
> >  > > > > > > -Dcom.sun.management.jmxremote.host=localhost>
> >  > > > > > > -XX:MaxMetaspaceSize=256m -XX:MaxDirectMemorySize=1g>
> >  > > > > > > -DIGNITE_SKIP_CONFIGURATION_CONSISTENCY_CHECK=true>
> >  > > > > > > -DIGNITE_WAL_MMAP=true
> > -DIGNITE_BPLUS_TREE_LOCK_RETRIES=100000>
> >  > > > > > > -Djava.net.preferIPv4Stack=true">
> >  > > > > > >>
> >  > > > > > > JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch -XX:+UseG1GC>
> >  > > > > > > -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC>
> >  > > > > > > -XX:+UseStringDeduplication
> > -Xloggc:/var/log/apache-ignite/gc.log>
> >  > > > > > > -XX:+PrintGCDetails -XX:+PrintGCDateStamps>
> >  > > > > > > -XX:+PrintTenuringDistribution -XX:+PrintGCCause>
> >  > > > > > > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10>
> >  > > > > > > -XX:GCLogFileSize=100M">
> >  > > > > > >>
> >  > > > > > > here is the 80 hours of GC analyize report:>
> >  > > > > > >>
> >  > > >
> >
> https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjEvMDgvMzEvLS1nYy5sb2cuMC5jdXJyZW50LnppcC0tNS01MS0yOQ==&channel=WEB
> >
> >
> >  > > > > > >>
> >  > > > > > > do we need more heap size or is there a BUG that we need to
> > be aware?>
> >  > > > > > >>
> >  > > > > > > here is the node configuration:>
> >  > > > > > >>
> >  > > > > > > <?xml version="1.0" encoding="UTF-8"?>>
> >  > > > > > > <beans xmlns="http://www.springframework.org/schema/beans">
> >  > > > > > > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
> >  > > > > > > xsi:schemaLocation=">
> >  > > > > > > http://www.springframework.org/schema/beans>
> >  > > > > > >
> > http://www.springframework.org/schema/beans/spring-beans.xsd">>
> >  > > > > > > <bean id="ignite.cfg">
> >  > > > > > >
> class="org.apache.ignite.configuration.IgniteConfiguration">>
> >  > > > > > > <property name="gridLogger">>
> >  > > > > > > <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">>
> >  > > > > > > <constructor-arg type="java.lang.String">
> >  > > > > > > value="/etc/apache-ignite/ignite-log4j2.xml"/>>
> >  > > > > > > </bean>>
> >  > > > > > > </property>>
> >  > > > > > > <property name="communicationSpi">>
> >  > > > > > > <bean>
> >  > > >
> > class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">>
> >  > > > > > > <property name="usePairedConnections" value="true"/>>
> >  > > > > > > </bean>>
> >  > > > > > > </property>>
> >  > > > > > > <property name="failureDetectionTimeout" value="60000"/>>
> >  > > > > > > <property name="systemThreadPoolSize" value="128"/>>
> >  > > > > > > <property name="publicThreadPoolSize" value="128"/>>
> >  > > > > > > <property name="queryThreadPoolSize" value="128"/>>
> >  > > > > > > <property name="serviceThreadPoolSize" value="128"/>>
> >  > > > > > > <property name="stripedPoolSize" value="128"/>>
> >  > > > > > > <property name="dataStreamerThreadPoolSize" value="4"/>>
> >  > > > > > > <property name="rebalanceThreadPoolSize" value="16"/>>
> >  > > > > > >>
> >  > > > > > > <!-- Explicitly enable peer class loading. -->>
> >  > > > > > > <property name="peerClassLoadingEnabled" value="true"/>>
> >  > > > > > >>
> >  > > > > > > <!-- Enable deploymentSpi,>
> >  > > > > > > /usr/share/apache-ignite/libs/segmentify directory will be
> > checked>
> >  > > > > > > every 5 seconds for changed files-->>
> >  > > > > > > <property name="deploymentSpi">>
> >  > > > > > > <bean>
> >  > > > class="org.apache.ignite.spi.deployment.uri.UriDeploymentSpi">>
> >  > > > > > > <property name="temporaryDirectoryPath">
> >  > > > > > > value="/tmp/temp_ignite_libs"/>>
> >  > > > > > > <property name="uriList">>
> >  > > > > > > <list>>
> >  > > > > > >>
> >  > > > > > > <va...@localhost>
> >  > > > /usr/share/apache-ignite/libs/segmentify/</value>>
> >  > > > > > > </list>>
> >  > > > > > > </property>>
> >  > > > > > > </bean>>
> >  > > > > > > </property>>
> >  > > > > > >>
> >  > > > > > > <property name="cacheConfiguration">>
> >  > > > > > > <list>>
> >  > > > > > > <!-- Partitioned cache example configuration (Atomic>
> >  > > > mode). -->>
> >  > > > > > > <bean>
> >  > > > class="org.apache.ignite.configuration.CacheConfiguration">>
> >  > > > > > > <property name="name" value="default"/>>
> >  > > > > > > <property name="atomicityMode" value="ATOMIC"/>>
> >  > > > > > > <property name="backups" value="1"/>>
> >  > > > > > > </bean>>
> >  > > > > > > </list>>
> >  > > > > > > </property>>
> >  > > > > > >>
> >  > > > > > > <!-- Explicitly configure TCP discovery SPI to provide list
> > of>
> >  > > > > > > initial nodes. -->>
> >  > > > > > > <property name="discoverySpi">>
> >  > > > > > > <bean>
> >  > > > class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">>
> >  > > > > > > <property name="networkTimeout" value="60000"/>>
> >  > > > > > > <property name="ipFinder">>
> >  > > > > > > <bean>
> >  > > > > > >>
> >  > > >
> >
> class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">>
> >
> >  > > > > > > <property name="addresses">>
> >  > > > > > > <list>>
> >  > > > > > > <!-- THERE ARE 10 NODES -->>
> >  > > > > > > </list>>
> >  > > > > > > </property>>
> >  > > > > > > </bean>>
> >  > > > > > > </property>>
> >  > > > > > > </bean>>
> >  > > > > > > </property>>
> >  > > > > > >>
> >  > > > > > > <!-- Enabling Apache Ignite native persistence. -->>
> >  > > > > > > <property name="dataStorageConfiguration">>
> >  > > > > > > <bean>
> >  > > > class="org.apache.ignite.configuration.DataStorageConfiguration">>
> >  > > > > > > <property name="defaultDataRegionConfiguration">>
> >  > > > > > > <bean>
> >  > > > > > >
> > class="org.apache.ignite.configuration.DataRegionConfiguration">>
> >  > > > > > > <property name="persistenceEnabled">
> >  > > > value="true"/>>
> >  > > > > > > <property name="checkpointPageBufferSize">
> >  > > > > > > value="#{ 2L * 1024 * 1024 * 1024}"/>>
> >  > > > > > > <property name="maxSize" value="#{ 40L * 1024 *>
> >  > > > > > > 1024 * 1024 }"/>>
> >  > > > > > > </bean>>
> >  > > > > > > </property>>
> >  > > > > > > <property name="storagePath">
> >  > > > value="/srv/ignite/persist"/>>
> >  > > > > > > <property name="walPath" value="/srv/ignite/wal"/>>
> >  > > > > > > <property name="walArchivePath" value="/srv/ignite/wal"/>>
> >  > > > > > > <property name="walMode" value="LOG_ONLY"/>>
> >  > > > > > > <property name="walSegmentSize" value="#{ 256L * 1024 *>
> >  > > > 1024 }"/>>
> >  > > > > > > <property name="walFlushFrequency" value="5000"/>>
> >  > > > > > > <property name="maxWalArchiveSize" value="#{ 512L * 1024>
> >  > > > * 1024 }"/>>
> >  > > > > > > <property name="writeThrottlingEnabled" value="true"/>>
> >  > > > > > > <property name="checkpointFrequency" value="300000"/>>
> >  > > > > > > <property name="checkpointWriteOrder" value="SEQUENTIAL">
> >  > > > />>
> >  > > > > > > </bean>>
> >  > > > > > > </property>>
> >  > > > > > > </bean>>
> >  > > > > > >>
> >  > > > > > >>
> >  > > > > > > -->
> >  > > > > > > <https://www.segmentify.com/>İbrahim Halil AltunSenior
> > Software>
> >  > > > Engineer+90>
> >  > > > > > > 536 3327510 • segmentify.com →
> > <https://www.segmentify.com/>UK •>
> >  > > > Germany •>
> >  > > > > > > Turkey <https://www.segmentify.com/ecommerce-growth-show>>
> >  > > > > > > <https://www.g2.com/products/segmentify/reviews>>
> >  > > > > > >>
> >  > > > > >>
> >  > > > >>
> >  > > >>
> >  > > >
> >  >
> >
>
>
>
> --
> Thanks & Regards,
> Naveen Bandaru
>
>
>
>
>
>


-- 
Thanks & Regards,
Naveen Bandaru

Re[2]: apache ignite 2.10.0 heap starvation

Posted by Zhenya Stanilovsky <ar...@mail.ru>.
Hi, the problem is most likely in your code: CPU usage grows in step with the heap between 00:00 and 12:00.
You need to analyze a heap dump; no additional settings will help here.
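
For anyone who has not taken a heap dump before, here is a minimal sketch of one way to produce one from inside a JVM, using the HotSpotDiagnosticMXBean that ships with HotSpot-based JDKs. The class name HeapDumper and the file name node.hprof are made up for illustration. Nothing here is Ignite-specific, and it dumps the JVM it runs in, so on a server node it would have to be called from code running inside that node.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumper {
    /**
     * Writes an .hprof heap dump of the current JVM to the given file.
     * The target file must not exist yet. With liveOnly=true only
     * reachable objects are written.
     */
    public static Path dump(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), liveOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output location; pick a disk with enough free space.
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dump(dir.resolve("node.hprof"), true);
        System.out.println("Heap dump written to " + file
            + " (" + Files.size(file) + " bytes)");
    }
}
```

From outside the process, `jcmd <pid> GC.heap_dump <file>` or `jmap -dump:live,format=b,file=<file> <pid>` produce the same kind of .hprof file. A live-objects dump forces a full GC and pauses the JVM while the file is written, so on production it is probably safer to do this on one node at a time. The file can then be opened in a tool such as Eclipse MAT or VisualVM to see which classes retain the most memory.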
 
>On the same subject, we have made the changes as suggested 
> 
>Nodes are running on VMs with 8 cores and 128 GB of memory; I've added the following JVM parameters
>
>-XX:ParallelGCThreads=4
>-XX:ConcGCThreads=2
>-XX:MaxGCPauseMillis=200
>-XX:InitiatingHeapOccupancyPercent=40
> 
>I have not set any of the properties below; they all use the default value, which is 8 (the number of cores)
> 
>        <property name="systemThreadPoolSize" value="8"/>
>        <property name="publicThreadPoolSize" value="8"/>
>        <property name="queryThreadPoolSize" value="8"/>
>        <property name="serviceThreadPoolSize" value="8"/>
> 
>I can still see our heap increasing, but at least there is a pattern now (unlike earlier, when the growth was almost exponential).
> 
>Attaching screenshots of heap, CPU, and GC, plus the start script with all the JVM arguments used.
>What do you think I should change to use the heap effectively?
> 
>   
> 
>  --
>Thanks & Regards,
>Naveen Bandaru
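
One code-level suspect raised earlier in this thread is cursors that are never closed: query() hands back a QueryCursor even for a MERGE INTO statement, and the result was being ignored. The sketch below illustrates that pattern only. It does not use Ignite at all: StubCursor and query() are hypothetical stand-ins so the example runs on its own, and the counter simply shows what stays open when the returned cursor is dropped.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CursorClosingSketch {
    /** Number of cursors opened but not yet closed. */
    public static final AtomicInteger OPEN = new AtomicInteger();

    /** Hypothetical stand-in for a query cursor: holds a resource until close(). */
    public static final class StubCursor implements AutoCloseable {
        StubCursor() { OPEN.incrementAndGet(); }

        public List<List<?>> getAll() { return Collections.emptyList(); }

        @Override public void close() { OPEN.decrementAndGet(); }
    }

    /** Stand-in for cache.query(...): every call returns a new cursor, DML included. */
    public static StubCursor query(String sql) { return new StubCursor(); }

    /** Mirrors the MERGE usage from the thread: the returned cursor is ignored. */
    public static void leaky(String mergeSql) {
        query(mergeSql);
    }

    /** Same call, but the cursor is released by try-with-resources. */
    public static void closed(String mergeSql) {
        try (StubCursor cur = query(mergeSql)) {
            cur.getAll();
        }
    }

    public static void main(String[] args) {
        String sql = "MERGE INTO \"ProductLabel\" (\"productId\", \"label\", \"language\") VALUES (?, ?, ?)";
        closed(sql);
        System.out.println("open after closed(): " + OPEN.get());
        leaky(sql);
        System.out.println("open after leaky(): " + OPEN.get());
    }
}
```

In real code the same try-with-resources shape would wrap the cursor returned by the cache's query(insertQuery) call. Whether unclosed cursors are what fills the heap here is still an assumption; the heap dump should confirm or rule it out.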