You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@ignite.apache.org by Muhammed Favas <fa...@expeedsoftware.com> on 2019/10/03 09:27:24 UTC

RE: Ignite bulk data load issue

Hi all,

Can some one help me here. I still facing the serious performance issue.

Regards,
Favas

From: Muhammed Favas <fa...@expeedsoftware.com>
Sent: Tuesday, September 24, 2019 2:15 PM
To: user@ignite.apache.org
Subject: RE: Ignite bulk data load issue

Thanks Igor for the suggestions.

But, In the log JVM pause is not showing too much of time consumption(around 500 milliseconds) and I assume it is not the only problem of time taking. There are other reasons I expect for this strange behavior.
Even I tried with WAL disabled, still the time is going high after few files import.

Is there any way to figure out why so many THREAD WAIT are happening. Because of it there are SEVERE errors logging in the ignite nodes. Below are some sample errors from the log file I shared. Please help me here.


[05:02:45,413][SEVERE][tcp-disco-msg-worker-#2][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-0, blockedFor=34s]
[05:02:45,414][WARNING][tcp-disco-msg-worker-#2][G] Thread [name="data-streamer-stripe-0-#9", id=22, state=RUNNABLE, blockCnt=0, waitCnt=4375]

[05:02:45,415][SEVERE][tcp-disco-msg-worker-#2][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-0, igniteInstanceName=null, finished=false, heartbeatTs=1569214931358]]]
class org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-0, igniteInstanceName=null, finished=false, heartbeatTs=1569214931358]
                at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
                at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
                at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
                at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
                at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663)
                at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181)
                at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700)
                at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
                at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119)
                at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)


Thread [name="sys-#367", id=468, state=TIMED_WAITING, blockCnt=0, waitCnt=1]
    Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7a4b4b27, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Thread [name="sys-#366", id=467, state=TIMED_WAITING, blockCnt=0, waitCnt=1]
    Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7a4b4b27, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748).


Regards,
Favas

From: Igor Belyakov <ig...@gmail.com>>
Sent: Monday, September 23, 2019 7:37 PM
To: user@ignite.apache.org<ma...@ignite.apache.org>
Subject: Re: Ignite bulk data load issue

Hi,

According to the logs, I see a lot of "Possible too long JVM pause" messages, seems like you have a problem with GC activity. You should check GC logs and try to tune GC following this manual:
https://apacheignite.readme.io/docs/jvm-and-system-tuning

Also, it's suggested to disable WAL during initial data loading, to reduce amount of disk IO operations. Check "Performance tips" in the bottom of the next page for more details:
https://apacheignite.readme.io/docs/data-loading

Regards,
Igor

On Mon, Sep 23, 2019 at 1:07 PM Muhammed Favas <fa...@expeedsoftware.com>> wrote:
Hi,

I need help to figure out the issues identified during the bulk load of data into ignite cluster.
My cluster consist of 5 nodes, each with 8 core CPU, 32 GB RAM and 30GB Disk. Also ignite native persistence is enabled for the table.
I am trying to load data into my ignite SQL table from csv file using COPY command. Each file consist of 50 Million record and I have numerous file of same size. During the initial time of loading, it was quite fast, but after some time the data load become very slow and now it is taking hours to load even a single file. Below is my observations


  *   When I trigger the load first time after a pause, the CPU usage shows in promising level and that time data load is in higher rate.
  *   After loading 2-3 file, the CPU starts dropping down to less than 1 % and it continue in that state for ever.
  *   Then I stop the loading processes for some time and re-start, it again perform well and after some time the same situation happens.

When I checked the log file, I saw certain THREAD WAIT are happening, I believe due to this waits, the CPU is dropping down. I have attached the entire log file.
Can some one help me to figure out why it so in ignite? Or is it something I have made wrong in my configuration. Below is my configuration file content

<bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="failureDetectionTimeout" value="30000"/>
        <!-- Redefining maximum memory size for the cluster node usage. -->
        <property name="dataStorageConfiguration">
            <bean class="org.apache.ignite.configuration.DataStorageConfiguration">

                <property name="CheckpointReadLockTimeout" value="0" />
                <!-- Set the page size to 4 KB -->
                <property name="pageSize" value="#{4 * 1024}"/>
                <!-- Incraese WAL segment size to 1 GB - Default was 64 MB -->
                <property name="walSegmentSize" value="#{1L * 1024 * 1024 * 1024}"/>
                <!-- Set the wal segment to 5. Default was 10 -->
                <property name="walSegments" value="5"/>
                <!-- Set the wal segment history size to 5. Default was 20 -->
                <property name="WalHistorySize" value="5"/>
                <property name="walCompactionEnabled" value="true" />
                <property name="walCompactionLevel" value="6" />
                <!-- Enable write throttling. -->
                <property name="writeThrottlingEnabled" value="true"/>
            <!-- Redefining the default region's settings -->
            <property name="defaultDataRegionConfiguration">
                <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                        <property name="persistenceEnabled" value="true"/>
                        <property name="name" value="Default_Region"/>
                        <!-- Setting the size of the default region to 24GB. -->
                        <property name="maxSize" value="#{24L * 1024 * 1024 * 1024}"/>
                        <!-- Increasing the check point buffer size to 2 GB. -->
                        <property name="checkpointPageBufferSize" value="#{2L * 1024 * 1024 * 1024}"/>
                </bean>
            </property>
            </bean>
        </property>

        <property name="cacheConfiguration">
            <list>
                <!-- Partitioned cache example configuration -->
                <bean class="org.apache.ignite.configuration.CacheConfiguration">
                    <property name="name" value="default*"/>
                    <property name="cacheMode" value="PARTITIONED" />
                    <property name="atomicityMode" value="TRANSACTIONAL"/>
                    <property name="queryParallelism" value="8" />
                </bean>
            </list>
        </property>


Regards,
Favas