Posted to user@ignite.apache.org by F7753 <ma...@foxmail.com> on 2016/04/05 12:51:30 UTC

Why do the client and server behave like that?

Here I launched 3 Ignite nodes using ${IGNITE_HOME}/bin/ignite.sh, but the
console output looks like below:
-------------------------------------------------------------------------------------------------------------
[18:51:09] Topology snapshot [ver=4, servers=3, clients=1, CPUs=96, heap=53.0GB]
[18:51:16] Topology snapshot [ver=5, servers=3, clients=2, CPUs=96, heap=100.0GB]
[18:51:16] Topology snapshot [ver=6, servers=3, clients=3, CPUs=96, heap=150.0GB]
[18:51:16] Topology snapshot [ver=7, servers=3, clients=4, CPUs=96, heap=200.0GB]
-------------------------------------------------------------------------------------------------------------
What do "server" and "client" mean here? I have one driver and 3 workers in my
Spark cluster, and I run ${IGNITE_HOME}/bin/ignite.sh on my worker nodes.
Then after a while, it throws:
-------------------------------------------------------------------------------------------------------------
[18:52:28,869][SEVERE][tcp-client-disco-reconnector-#8%null%][TcpDiscoverySpi] Failed to reconnect
class org.apache.ignite.IgniteCheckedException: Failed to deserialize object with given class loader: sun.misc.Launcher$AppClassLoader@26f44031
	at org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal(JdkMarshaller.java:105)
	at org.apache.ignite.spi.discovery.tcp.ClientImpl$Reconnector.body(ClientImpl.java:1213)
	at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:152)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
	at org.apache.ignite.marshaller.jdk.JdkMarshallerInputStreamWrapper.read(JdkMarshallerInputStreamWrapper.java:53)
	at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2310)
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2323)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
	at org.apache.ignite.marshaller.jdk.JdkMarshallerObjectInputStream.<init>(JdkMarshallerObjectInputStream.java:39)
	at org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal(JdkMarshaller.java:100)
	... 2 more
Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure:
Lost task 1.3 in stage 0.0 (TID 8, nobida144): class
org.apache.ignite.IgniteClientDisconnectedException: Client node
disconnected: null
	at
org.apache.ignite.internal.GridKernalGatewayImpl.readLock(GridKernalGatewayImpl.java:87)
	at org.apache.ignite.internal.IgniteKernal.guard(IgniteKernal.java:3017)
	at
org.apache.ignite.internal.IgniteKernal.getOrCreateCache(IgniteKernal.java:2467)
	at
org.apache.ignite.spark.impl.IgniteAbstractRDD.ensureCache(IgniteAbstractRDD.scala:35)
	at
org.apache.ignite.spark.IgniteRDD$$anonfun$savePairs$1.apply(IgniteRDD.scala:174)
	at
org.apache.ignite.spark.IgniteRDD$$anonfun$savePairs$1.apply(IgniteRDD.scala:170)
	at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
	at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
	at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
	at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
	at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
	at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
	at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
	at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
	at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
	at scala.Option.foreach(Option.scala:236)
	at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
	at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
	at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
	at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
	at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:920)
	at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:918)
	at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
	at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:918)
	at org.apache.ignite.spark.IgniteRDD.savePairs(IgniteRDD.scala:170)
	at main.scala.StreamingJoin$.main(StreamingJoin.scala:241)
	at main.scala.StreamingJoin.main(StreamingJoin.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
^C[18:53:01] Ignite node stopped OK [uptime=00:01:52:668]

-------------------------------------------------------------------------------------------------------------




Re: Why do the client and server behave like that?

Posted by F7753 <ma...@foxmail.com>.
Thanks a lot for letting me know. I'll take some time to read through the
Ignite docs more carefully.
I also created another topic about the GC OOM in my cluster:
http://apache-ignite-users.70518.x6.nabble.com/How-to-end-up-the-GC-overhead-problem-in-the-IgniteRDD-tc3945.html





Re: Why do the client and server behave like that?

Posted by vkulichenko <va...@gmail.com>.
Hi,

A server is a node that can store data; in your case those are the nodes you
start with the ignite.sh script. The clients are started automatically by
IgniteContext - one client per worker and a fourth one on the driver.
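
For reference, here is a rough sketch of how that usually looks with the
Ignite 1.5 Scala API (the cache name and configuration closure below are just
examples, and the exact IgniteContext constructor can differ between versions):
-------------------------------------------------------------------------------------------------------------
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.{IgniteContext, IgniteRDD}

object IgniteContextSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ignite-context-sketch"))

    // IgniteContext embeds an Ignite client node in the driver and in each
    // executor; the standalone nodes started with ignite.sh remain the servers
    // that actually hold the cache data.
    val ic = new IgniteContext[String, String](sc, () => new IgniteConfiguration())

    // Get the shared cache as an RDD and write some pairs through the clients.
    val cacheRdd: IgniteRDD[String, String] = ic.fromCache("exampleCache")
    cacheRdd.savePairs(sc.parallelize(1 to 10).map(i => (i.toString, "value-" + i)))
  }
}
-------------------------------------------------------------------------------------------------------------
With 3 Spark workers plus the driver this gives exactly the servers=3,
clients=4 topology you see in the snapshot.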

-Val




Re: Why do the client and server behave like that?

Posted by F7753 <ma...@foxmail.com>.
I found that GC was the main problem in my case; each of my nodes throws the
GC exception:
----------------------------------------------------------------------------------------------------------------
Exception in thread "shmem-worker-#175%null%" java.lang.OutOfMemoryError: GC
overhead limit exceeded
[05-Apr-2016 19:05:26][ERROR][shmem-worker-#176%null%][TcpCommunicationSpi]
Runtime error caught during grid runnable execution: ShmemWorker
[endpoint=IpcSharedMemoryClientEndpoint [inSpace=IpcSharedMemorySpace
[opSize=262144, shmemPtr=139664015134784, shmemId=2883604, semId=2392067,
closed=true, isReader=true, writerPid=11421, readerPid=11230,
tokFileName=/opt/apache-ignite-1.5.0.final-src/work/ipc/shmem/a3bcd536-31b5-47f6-b248-80f5a43e50dc-11230/gg-shmem-space-46-11421-262144,
closed=true], outSpace=IpcSharedMemorySpace [opSize=262144,
shmemPtr=139663894958144, shmemId=2916373, semId=2424836, closed=true,
isReader=false, writerPid=11230, readerPid=11421,
tokFileName=/opt/apache-ignite-1.5.0.final-src/work/ipc/shmem/a3bcd536-31b5-47f6-b248-80f5a43e50dc-11230/gg-shmem-space-47-11421-262144,
closed=true], checkIn=true, checkOut=true]]
java.lang.OutOfMemoryError: GC overhead limit exceeded
----------------------------------------------------------------------------------------------------------------


