Posted to user@ignite.apache.org by atricuix <as...@gmail.com> on 2017/02/24 04:25:44 UTC

Failure to start a bigger ignite cluster.

Hi,

We have a 4-member Ignite cluster that starts up fine. But when we extend it
and add more nodes, we run into a couple of problems:

1. Adding more nodes to the cluster takes time - sometimes up to 180-210
   seconds.
2. Beyond 7 nodes we can't add more - nodeCount oscillates between 7 and 6.

The errors observed on one of the nodes are either

"Node FAILED: TcpDiscoveryNode [id=01eb4646-67ca-4305-a2e8-8f100c958cea,
addrs=[127.0.0.1, 192.168.10.21], sockAddrs=[/192.168.10.21:47500,
/127.0.0.1:47500], discPort=47500, order=218, intOrder=113,
lastExchangeTime=1487907499639"

or

"Failed to wait for partition map exchange".

We tried increasing ignite.failure.detection.timeout to 60000, but in vain.
We also played with ignite.enable.network.timeout=true / false and
ignite.network.timeout=30000.

We searched the forums and found a similar post, but it said this is fixed in
Ignite 1.8, which is the version we use (ver=1.8.0#20161205).

Attached is the log file from one of the nodes:
ignite_out.txt
<http://apache-ignite-users.70518.x6.nabble.com/file/n10852/ignite_out.txt>

Any suggestions, or pointers to anything we are missing here, would be a
great help.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Failure-to-start-a-bigger-ignite-cluster-tp10852.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Failure to start a bigger ignite cluster.

Posted by atricuix <as...@gmail.com>.
Hi Andrey,

My bad. We had some other bad commits in there, which we are resolving to
avoid the continuous restarts. We will put in these changes and let you know
how it goes.

Regards,
Aswin




Re: Failure to start a bigger ignite cluster.

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi Aswin,


What do you mean? Does the JVM fail and start again?
Would you please share the logs?

On Wed, Mar 1, 2017 at 7:24 PM, atricuix <as...@gmail.com> wrote:

> Hi Andrey,
>
> We changed the config to call setLocalPortRange(0) and disabled
> sharedMemory. The JVM is doing continuous restarts now with this change.
>
> Regards,
> Aswin
>
>
>
>



-- 
Best regards,
Andrey V. Mashenkov

Re: Failure to start a bigger ignite cluster.

Posted by atricuix <as...@gmail.com>.
Hi Andrey,

We changed the config to call setLocalPortRange(0) and disabled sharedMemory.
The JVM is doing continuous restarts now with this change.

Regards,
Aswin




Re: Failure to start a bigger ignite cluster.

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi Aswin,

As you are using one JVM per box, enabling or disabling sharedMemory should
not have any effect, as Ignite shouldn't be using it. AFAIK, shared memory can
cause problems in some cases.

Why do you need a port range >0 if each Ignite instance has a unique IP?
Ignite can scan the whole discovery port range to find other nodes.

On Wed, Mar 1, 2017 at 1:00 AM, atricuix <as...@gmail.com> wrote:

> Hi Andrey,
>
> Thanks for looking into this.
>
> A bit confused. We use only one JVM per box and each Ignite server JVM has
> a different IP address.
> So by default localPort() also uses 100 ports starting at 47100 / 47500. We
> have already opened the entire range in our ingress security rules.
>
> I did not understand the sharedMemory part. Are you suggesting to disable
> it as we use a VM?
>
> Thanks,
> Aswin
>
>
>
>



-- 
Best regards,
Andrey V. Mashenkov

Re: Failure to start a bigger ignite cluster.

Posted by atricuix <as...@gmail.com>.
Hi Andrey,

Thanks for looking into this.

A bit confused. We use only one JVM per box and each Ignite server JVM has a
different IP address.
So by default localPort() also uses 100 ports starting at 47100 / 47500. We
have already opened the entire range in our ingress security rules.
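
One way to double-check that the ingress rules really admit the whole range is
to probe it from another box. The following is only a rough sketch using plain
java.net sockets, not anything from Ignite itself; the class and method names
are made up for illustration, and the host, base ports, and count of 100 are
simply the values mentioned in this thread. A successful TCP connect only
shows that something is listening and reachable on that port, nothing more.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Ignite: probes a range of TCP ports.
public class PortRangeProbe {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: treat as not reachable.
            return false;
        }
    }

    // Probes 'count' consecutive ports starting at firstPort and returns those that accepted.
    static List<Integer> openPorts(String host, int firstPort, int count, int timeoutMs) {
        List<Integer> open = new ArrayList<>();
        for (int port = firstPort; port < firstPort + count; port++) {
            if (canConnect(host, port, timeoutMs)) {
                open.add(port);
            }
        }
        return open;
    }

    public static void main(String[] args) {
        // Address taken from the "Node FAILED" message in this thread; pass your own as args[0].
        String host = args.length > 0 ? args[0] : "192.168.10.21";
        // 47500 (discovery) and 47100 (communication) with 100 ports each, as described above.
        System.out.println("discovery:     " + openPorts(host, 47500, 100, 500));
        System.out.println("communication: " + openPorts(host, 47100, 100, 500));
    }
}
```

With one node per box you would normally expect only the first port of each
range to show up as open; an empty list for a node that is running would point
at the network rules rather than at Ignite.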

I did not understand the sharedMemory part. Are you suggesting to disable it
as we use a VM?

Thanks,
Aswin




Re: Failure to start a bigger ignite cluster.

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi Aswin,

You use different addresses for your nodes. Try setting localPortRange for
the Discovery and Communication SPIs.
You can also try disabling shared memory by setting sharedMemoryPort=-1.
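
For readers who configure Ignite in code rather than XML, the suggestion above
might look roughly like the sketch below. It assumes the Ignite 1.8 jars are on
the classpath. The class name and the port range value of 10 are illustrative
assumptions (no value was given in this thread), and the IP finder and the
rest of the poster's configuration are not reproduced here.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

// Hypothetical startup class; only the SPI settings discussed in this thread are shown.
public class NodeStartup {

    static IgniteConfiguration buildConfig() {
        TcpDiscoverySpi disco = new TcpDiscoverySpi();
        disco.setLocalPort(47500);        // discovery port seen in the logs above
        disco.setLocalPortRange(10);      // assumed value, pick what fits your firewall rules

        TcpCommunicationSpi comm = new TcpCommunicationSpi();
        comm.setLocalPort(47100);         // communication base port mentioned in this thread
        comm.setLocalPortRange(10);       // assumed value
        comm.setSharedMemoryPort(-1);     // -1 disables shared memory communication

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(disco);
        cfg.setCommunicationSpi(comm);
        // No IP finder is set here, so the default one applies; configure yours as needed.
        return cfg;
    }

    public static void main(String[] args) {
        Ignite ignite = Ignition.start(buildConfig());
        System.out.println("Nodes in topology: " + ignite.cluster().nodes().size());
    }
}
```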

On Tue, Feb 28, 2017 at 1:31 AM, atricuix <as...@gmail.com> wrote:

> Hi Andrey,
>
> Attached the logs and config as requested.
>
> Regards,
> Aswin
>
> igniteConfig.igniteConfig
> <http://apache-ignite-users.70518.x6.nabble.com/file/n10925/igniteConfig.
> igniteConfig>
> ignite.log
> <http://apache-ignite-users.70518.x6.nabble.com/file/n10925/ignite.log>
>
>
>
>



-- 
Best regards,
Andrey V. Mashenkov

Re: Failure to start a bigger ignite cluster.

Posted by atricuix <as...@gmail.com>.
Hi Andrey,

Attached the logs and config as requested.

Regards,
Aswin

igniteConfig.igniteConfig
<http://apache-ignite-users.70518.x6.nabble.com/file/n10925/igniteConfig.igniteConfig>  
ignite.log
<http://apache-ignite-users.70518.x6.nabble.com/file/n10925/ignite.log>  




Re: Failure to start a bigger ignite cluster.

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi,

Would you please attach full logs and grid configuration?

On Fri, Feb 24, 2017 at 10:54 AM, atricuix <as...@gmail.com> wrote:

> Upon enabling debug logs, we observe the following on the node that fails
> to start after the initial set of 6-7 nodes.
>
> {"@timestamp":"2017-02-24T02:47:46.913-05:00","@version":1,"message":"Caught
> exception on message read
> [sock=Socket[addr=/XXXX,port=57440,localport=47500],
> locNodeId=20fbcd3d-d502-48fd-ba4c-55a733717f03,
> rmtNodeId=69ace020-528c-4ed9-bf1c-9a98a8435e7f]","logger_name":"org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi","thread_name":"tcp-disco-sock-reader-#6%null%","level":"ERROR","level_value":40000,"stack_trace":"org.apache.ignite.IgniteCheckedException:
> Failed to deserialize object with given class loader:
> org.springframework.boot.loader.LaunchedURLClassLoader@27716f4\n\tat
> org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:128)\n\tat
> org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:94)\n\tat
> org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:9724)\n\tat
> org.apache.ignite.spi.discovery.tcp.ServerImpl$SocketReader.body(ServerImpl.java:5764)\n\tat
> org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)\nCaused
> by: java.io.EOFException: null\n\tat
> java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2335)\n\tat
> java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2804)\n\tat
> java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:802)\n\tat
> java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)\n\tat
> org.apache.ignite.marshaller.jdk.JdkMarshallerObjectInputStream.<init>(JdkMarshallerObjectInputStream.java:39)\n\tat
> org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:118)\n\t...
> 4 common frames omitted\n","HOSTNAME":"XXXX"}
>
>
>
>



-- 
Best regards,
Andrey V. Mashenkov

Re: Failure to start a bigger ignite cluster.

Posted by atricuix <as...@gmail.com>.
Upon enabling debug logs, we observe the following on the node that fails to
start after the initial set of 6-7 nodes.

{"@timestamp":"2017-02-24T02:47:46.913-05:00","@version":1,"message":"Caught
exception on message read
[sock=Socket[addr=/XXXX,port=57440,localport=47500],
locNodeId=20fbcd3d-d502-48fd-ba4c-55a733717f03,
rmtNodeId=69ace020-528c-4ed9-bf1c-9a98a8435e7f]","logger_name":"org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi","thread_name":"tcp-disco-sock-reader-#6%null%","level":"ERROR","level_value":40000,"stack_trace":"org.apache.ignite.IgniteCheckedException:
Failed to deserialize object with given class loader:
org.springframework.boot.loader.LaunchedURLClassLoader@27716f4\n\tat
org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:128)\n\tat
org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:94)\n\tat
org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:9724)\n\tat
org.apache.ignite.spi.discovery.tcp.ServerImpl$SocketReader.body(ServerImpl.java:5764)\n\tat
org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)\nCaused
by: java.io.EOFException: null\n\tat
java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2335)\n\tat
java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2804)\n\tat
java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:802)\n\tat
java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)\n\tat
org.apache.ignite.marshaller.jdk.JdkMarshallerObjectInputStream.<init>(JdkMarshallerObjectInputStream.java:39)\n\tat
org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:118)\n\t...
4 common frames omitted\n","HOSTNAME":"XXXX"}
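
A note on reading this trace: the EOFException is thrown from the
ObjectInputStream constructor (readStreamHeader), which means the stream ended
before even the 4-byte Java serialization header arrived. That would be
consistent with the remote side closing the connection before sending a
message, although the trace alone does not say why. The small stand-alone
sketch below reproduces the same exception with plain JDK classes; the class
and method names are made up for illustration and nothing in it is Ignite
code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;

// Hypothetical demo class: shows what ObjectInputStream does with a stream that ends too early.
public class EmptyStreamDemo {

    // Opens a Java serialization stream over the given bytes and returns the
    // simple name of the exception thrown by the constructor, or "none".
    static String headerReadOutcome(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return "none";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Zero bytes: what a reader sees if the peer closes the socket without sending anything.
        System.out.println(headerReadOutcome(new byte[0]));
        // A well-formed header (stream magic 0xACED, version 5): the constructor succeeds.
        System.out.println(headerReadOutcome(new byte[] {(byte) 0xAC, (byte) 0xED, 0x00, 0x05}));
        // Four bytes that are not a serialization header: a different exception type.
        System.out.println(headerReadOutcome(new byte[] {1, 2, 3, 4}));
    }
}
```

The point of the comparison is that a corrupted or foreign payload would show
up as a StreamCorruptedException, whereas the EOFException in the log suggests
the connection carried no payload at all.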


