Posted to user@ignite.apache.org by Shane Kinsella <sh...@aspect.com> on 2016/01/18 14:04:58 UTC

Execution hangs at IgniteRDD.savePairs()

Hi All, I've created a StackOverflow post
(http://stackoverflow.com/questions/34815652/igniterdd-freezes-at-savepairs)
but thought I might share it here also.



I have a Spark cluster of three machines and am trying to use Apache Ignite
for caching data. On each Spark machine I have an Ignite node running, and I
am using the Spark REPL for testing (the problem was originally found using
spark-submit, so it is not specific to the REPL).

The problem is that my execution freezes at IgniteRDD.savePairs. Here is my
CacheConfig:

    <bean class="org.apache.ignite.configuration.CacheConfiguration">
        <property name="name" value="myRddCache"/>
        <property name="cacheMode" value="PARTITIONED"/>
    </bean>

I previously had this working but ran out of memory in Ignite so I
(temporarily) added some options for tiered storage:

    <!-- Store cache entries on-heap. -->
    <property name="memoryMode" value="ONHEAP_TIERED"/>

    <!-- Enable Off-Heap memory with max size of 10 Gigabytes (0 for unlimited). -->
    <property name="offHeapMaxMemory" value="#{10 * 1024L * 1024L * 1024L}"/>

    <!-- Configure eviction policy. -->
    <property name="evictionPolicy">
        <bean class="org.apache.ignite.cache.eviction.fifo.FifoEvictionPolicy">
            <!-- Evict to off-heap after cache size reaches maxSize. -->
            <property name="maxSize" value="100000"/>
        </bean>
    </property>
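
For reference, a sketch of how the two snippets above would combine into one cache bean. The post does not show where the tiered-storage properties were placed, so nesting them directly inside the CacheConfiguration bean is an assumption here; the property names and values are the ones quoted above.

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="myRddCache"/>
    <property name="cacheMode" value="PARTITIONED"/>

    <!-- Assumed placement: tiered-storage options as direct properties of the cache bean. -->
    <property name="memoryMode" value="ONHEAP_TIERED"/>
    <property name="offHeapMaxMemory" value="#{10 * 1024L * 1024L * 1024L}"/>

    <property name="evictionPolicy">
        <bean class="org.apache.ignite.cache.eviction.fifo.FifoEvictionPolicy">
            <!-- Entries beyond this count are evicted from the heap. -->
            <property name="maxSize" value="100000"/>
        </bean>
    </property>
</bean>
```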

I have since removed these options to try to debug the current issue. It was
after adding them that savePairs stopped working. I have not found anything
in the logs.

Has anyone come across this issue? Are there any workarounds or solutions?

I believe there could be some hidden state involved. Is there a way to
restore my cluster (delete certain directory etc.)? I have restarted the
whole cluster numerous times.

Notes: HDFS is configured as the under filesystem. When I create my
IgniteContext (in the REPL) with an Ignite node running on the same machine,
I get a warning that the IGFS/IGFS-management endpoints are already in use.
I have tested this with and without an Ignite node running on the driver
machine.

P.S. Here is the thread trace:

Frozen threads found (potential deadlock)

It seems that the following threads have not changed their stack for more
than 10 seconds.
These threads are possibly (but not necessarily!) in a deadlock or hung.

shmem-worker-#93%null% <--- Frozen for at least 21 sec
org.apache.ignite.internal.util.ipc.shmem.IpcSharedMemoryUtils.readSharedMemory(long, byte[], long, long, long) IpcSharedMemoryUtils.java (native)
org.apache.ignite.internal.util.ipc.shmem.IpcSharedMemorySpace.read(byte[], int, int, long) IpcSharedMemorySpace.java:220
org.apache.ignite.internal.util.ipc.shmem.IpcSharedMemoryInputStream.read(byte[], int, int) IpcSharedMemoryInputStream.java:62
org.apache.ignite.internal.util.ipc.IpcToNioAdapter.serve() IpcToNioAdapter.java:114
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$ShmemWorker.body() TcpCommunicationSpi.java:2943
org.apache.ignite.internal.util.worker.GridWorker.run() GridWorker.java:110
java.lang.Thread.run() Thread.java:745

I see references to Shmem; FYI the endpoint that is configured is a TCP
endpoint.

P.P.S. Forget the 21 seconds above: the thread was still hung after an hour.




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Execution-hangs-at-IgniteRDD-savePairs-tp2615.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Execution hangs at IgniteRDD.savePairs()

Posted by Shane Kinsella <sh...@aspect.com>.
Val,

That has worked. I haven’t frozen since. Cheers!

Shane




Re: Execution hangs at IgniteRDD.savePairs()

Posted by vkulichenko <va...@gmail.com>.
Hi Shane,

This shared memory call is for communication between Ignite nodes, not for
the endpoint. It can cause stability issues in some cases, so can you please
try adding the snippet below to your Ignite configuration file and check
whether it helps? This will completely disable shared memory and switch to
TCP.

<property name="communicationSpi">
    <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
        <property name="sharedMemoryPort" value="-1"/>
    </bean>
</property>
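
For placement, here is a rough sketch of where this property would sit in a complete file. It assumes a standard Spring beans file with a single IgniteConfiguration bean; the bean id "ignite.cfg" and the overall layout are illustrative assumptions, not taken from Shane's actual configuration. The cache bean is the one quoted earlier in the thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

    <!-- Hypothetical bean id; use whatever your existing file declares. -->
    <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">

        <!-- Disable shared-memory communication so nodes talk over TCP only. -->
        <property name="communicationSpi">
            <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
                <property name="sharedMemoryPort" value="-1"/>
            </bean>
        </property>

        <!-- Cache definition as posted at the start of the thread. -->
        <property name="cacheConfiguration">
            <list>
                <bean class="org.apache.ignite.configuration.CacheConfiguration">
                    <property name="name" value="myRddCache"/>
                    <property name="cacheMode" value="PARTITIONED"/>
                </bean>
            </list>
        </property>
    </bean>
</beans>
```

The same communicationSpi setting presumably needs to be present in the configuration used by every node, including the one started by IgniteContext on the driver, since the shared-memory path is used for node-to-node communication.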

-Val


