Posted to user@ignite.apache.org by Renato Melo <re...@yahoo.com.br> on 2018/09/14 20:49:38 UTC

Huge Query order by PK causes OOMKilled

I am running three nodes on Kubernetes. 32 caches are replicated across these 3 nodes.

Currently each node is limited to 10 CPUs and 30 GB of memory. Caches are persisted. Max heap size is set to 18g.

Before running the query, with all nodes up and connected, I ran top and saw that the Ignite Java process was using 11 GB of resident memory (RES) and 18 GB of virtual memory (VIRT).

When I run a query on a table that has 1350000 records and order the data by the primary key (ORDER BY PK), the memory of one of the nodes is consumed until the node crashes.

The data is needed for CDC (change data capture), so the SELECT must return all rows ordered by PK.

During the query execution I got these logs on the node that failed:

[16:51:27,258][WARNING][client-connector-#1437][IgniteH2Indexing] Query execution is too long [time=9638 ms, sql='SELECT

Then:

[20:24:39,552][INFO][grid-timeout-worker-#199][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=ea6f72d6, uptime=03:19:00.981]
    ^-- H/N/C [hosts=3, nodes=3, CPUs=288]
    ^-- CPU [cur=0.47%, avg=0.03%, GC=0%]
    ^-- PageMemory [pages=1163734]
    ^-- Heap [used=3530MB, free=80.84%, comm=5120MB]
    ^-- Non heap [used=74MB, free=-1%, comm=75MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=6, qSize=0]
[20:24:43,859][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 1443 milliseconds.
[20:24:54,859][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 1179 milliseconds.
[20:25:01,181][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 979 milliseconds.
[20:25:03,897][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 508 milliseconds.
[20:25:17,503][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 1337 milliseconds.
[20:25:21,520][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 2661 milliseconds.
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007ff5463758e5, pid=373, tid=0x00007fe02ffff700
#
# JRE version: OpenJDK Runtime Environment (8.0_171-b11) (build 1.8.0_171-8u171-b11-1~deb9u1-b11)
# Java VM: OpenJDK 64-Bit Server VM (25.171-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J 7881 C2 org.apache.ignite.internal.binary.BinaryObjectImpl.fieldByOrder(I)Ljava/lang/Object; (766 bytes) @ 0x00007ff5463758e5 [0x00007ff546374e80+0xa65]
#
# Core dump written. Default location: /opt/ignite/core or core.373
#
# An error report file with more information is saved as:
# /opt/ignite/hs_err_pid373.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
/opt/ignite/apache-ignite-fabric-2.6.0-bin/bin/ignite.sh: line 181:   373 Aborted                 (core dumped) "$JAVA" ${JVM_OPTS} ${QUIET} "${RESTART_SUCCESS_OPT}" ${JMX_MON} -DIGNITE_HOME="${IGNITE_HOME}" -DIGNITE_PROG_NAME="$0" ${JVM_XOPTS} -cp "${CP}" ${MAIN_CLASS} "${CONFIG}"

I tried different configurations. I would appreciate any help.

Re: Huge Query order by PK causes OOMKilled

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

Have you tried setting lazy=true for this query? Ignite has a couple of ways
to run SQL, and each of them allows you to set lazy=true.
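
For illustration, here is a minimal sketch of one of those ways, the JDBC thin driver, where lazy execution is enabled with the lazy=true connection parameter. The class name, host, port, and query text are placeholders rather than values from this thread, and the Ignite JDBC driver is assumed to be on the classpath at runtime. With the native Java API, the equivalent is SqlFieldsQuery.setLazy(true).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper class; all names here are illustrative only.
class LazyQuerySketch {

    // Builds a JDBC thin driver URL with lazy result fetching enabled.
    static String lazyThinUrl(String host, int port) {
        return "jdbc:ignite:thin://" + host + ":" + port + "?lazy=true";
    }

    // Iterates over the result set row by row instead of collecting it,
    // so the client does not hold the whole result in memory either.
    // Requires the Ignite JDBC driver on the classpath at runtime.
    static long streamRows(String url, String sql) throws SQLException {
        long rows = 0;
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                rows++; // hand the current row to the CDC consumer here
            }
        }
        return rows;
    }
}
```

A call might look like streamRows(lazyThinUrl("ignite-host", 10800), "SELECT ... ORDER BY id"), with the host, the query, and the column name replaced by your own; 10800 is the thin driver's default port.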

If this does not help, please try adding an explicit SQL index on the field
that you are using in ORDER BY.
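
As a rough sketch of that second suggestion, an index can be created with a CREATE INDEX statement sent over the same kind of JDBC connection. The class, index, table, and column names below are hypothetical, since the thread does not show the actual schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper class; all names here are illustrative only.
class OrderByIndexSketch {

    // Builds the DDL for a single-column index on the ORDER BY field.
    static String createIndexDdl(String indexName, String table, String column) {
        return "CREATE INDEX IF NOT EXISTS " + indexName
            + " ON " + table + " (" + column + ")";
    }

    // Executes the DDL; requires the Ignite JDBC driver at runtime.
    static void createIndex(String url, String ddl) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(ddl);
        }
    }
}
```

For example, createIndexDdl("idx_mytable_id", "MY_TABLE", "ID") yields the statement to run once before retrying the ordered query.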

Regards,
-- 
Ilya Kasnacheev

