Posted to user@ignite.apache.org by Stanislav Lukyanov <st...@gmail.com> on 2018/10/12 17:34:15 UTC

RE: Thin client vs client node performance in Spark

On your questions
> 1) How does one increase write throughput without increasing the number of clients (the server nodes are underutilized at the moment)
Actually, adding more clients is the expected way to increase throughput if the servers have spare capacity.
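To make the "more clients" point concrete, here is a toy sketch (not Ignite code; a ConcurrentHashMap stands in for the cache, and all names are made up): several writers loading disjoint key ranges in parallel scale the aggregate write rate until the "server" side saturates.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelWriters {
    // Load `clients * perClient` entries using `clients` concurrent writers
    // and return how many entries landed in the "cache".
    static int load(int clients, int perClient) throws InterruptedException {
        ConcurrentHashMap<Integer, String> cache = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        for (int c = 0; c < clients; c++) {
            final int base = c * perClient; // disjoint key range per writer
            pool.submit(() -> {
                for (int i = 0; i < perClient; i++)
                    cache.put(base + i, "v" + i);
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return cache.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // Four writers, 1000 puts each -> 4000 entries total.
        System.out.println(load(4, 1000));
    }
}
```

The same shape applies with real clients: each one gets its own slice of the data and writes independently, so adding clients adds throughput as long as the servers keep up.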

> 2) We have use cases where we many have many clients writing from different sources
That’s fine.

In general, thick clients have better performance than thin ones.
The reason is exactly what you said – thick clients use affinity-based p2p communication, and thin clients always go through a node they’re connected to.
The only exception is C++ thin client which has “best-effort affinity” – it’ll try to distribute requests according to affinity, although it may miss when topology changes.
AFAIK best-effort affinity is supposed to be added to other clients as well, eventually.
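For intuition, here is a minimal sketch of the key -> partition -> node mapping that affinity-based routing relies on (this is not Ignite's actual RendezvousAffinityFunction; the hashing and node selection below are simplified stand-ins):

```java
import java.util.List;

public class AffinitySketch {
    // Map a key to one of `parts` partitions, the way affinity functions do.
    static int partitionFor(Object key, int parts) {
        int h = key.hashCode();
        h ^= (h >>> 16);                       // mix the high bits in
        return (h & Integer.MAX_VALUE) % parts; // non-negative partition id
    }

    // Pick the node owning a partition from a stable node list.
    static String primaryNode(int partition, List<String> nodes) {
        return nodes.get(partition % nodes.size());
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-1", "node-2", "node-3");
        int p = partitionFor("someKey", 1024);
        // A thick client sends the update directly to this node;
        // a non-partition-aware thin client always sends to whichever
        // node it happens to be connected to, which then forwards.
        System.out.println("someKey -> partition " + p + " -> " + primaryNode(p, nodes));
    }
}
```

Because every thick client computes the same mapping locally, writes spread across all primaries with no extra hop; this is exactly what a plain thin client cannot do.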

The problem with thick clients is not having a lot of them, it’s restarting them.
Connecting with a thick client takes longer than with a thin one, and when a thick client starts or stops it affects the whole cluster
(because topology changes are global).
AFAIU in your case you’re restarting the clients, and that impacts the performance.

Stan

From: eugene miretsky
Sent: August 24, 2018 19:45
To: user@ignite.apache.org
Subject: Re: Thin client vs client node performance in Spark

So I decreased the number of spark executors to 2, and the problem went away. 
However, what's the general guideline about the number of nodes/clients that can write to the cluster at the same time? 
1) How does one increase write throughput without increasing the number of clients (the server nodes are underutilized at the moment)
2) We have use cases where we may have many clients writing from different sources

Cheers,
Eugene

On Fri, Aug 24, 2018 at 11:51 AM, eugene miretsky <eu...@gmail.com> wrote:
Attached is the error I get from ignitevisorcmd.sh after calling the cache command (the command just hangs). 
To me it looks like all the Spark executors (10 in my test) start a new client node, and some of those nodes get terminated and restarted as the executors die. This seems to really confuse Ignite. 

[15:45:10,741][INFO][grid-nio-worker-tcp-comm-0-#23%console%][TcpCommunicationSpi] Established outgoing communication connection [locAddr=/127.0.0.1:40984, rmtAddr=/127.0.0.1:47101]
[15:45:10,741][INFO][grid-nio-worker-tcp-comm-1-#24%console%][TcpCommunicationSpi] Established outgoing communication connection [locAddr=/127.0.0.1:49872, rmtAddr=/127.0.0.1:47100]
[15:45:10,742][INFO][grid-nio-worker-tcp-comm-3-#26%console%][TcpCommunicationSpi] Established outgoing communication connection [locAddr=/127.0.0.1:40988, rmtAddr=/127.0.0.1:47101]
[15:45:10,743][INFO][grid-nio-worker-tcp-comm-1-#24%console%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47101, rmtAddr=/127.0.0.1:40992]
[15:45:10,745][INFO][grid-nio-worker-tcp-comm-0-#23%console%][TcpCommunicationSpi] Established outgoing communication connection [locAddr=/127.0.0.1:49876, rmtAddr=/127.0.0.1:47100]
[15:45:11,725][SEVERE][grid-nio-worker-tcp-comm-2-#25%console%][TcpCommunicationSpi] Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=2, bytesRcvd=180, bytesSent=18, bytesRcvd0=18, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-2, igniteInstanceName=console, finished=false, hashCode=1827979135, interrupted=false, runner=grid-nio-worker-tcp-comm-2-#25%console%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=166400 cap=166400], readBuf=java.nio.DirectByteBuffer[pos=18 lim=18 cap=117948], inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/172.21.85.37:39942, rmtAddr=ip-172-21-85-213.ap-south-1.compute.internal/172.21.85.213:47100, createTime=1535125510724, closeTime=0, bytesSent=0, bytesRcvd=18, bytesSent0=0, bytesRcvd0=18, sndSchedTime=1535125510724, lastSndTime=1535125510724, lastRcvTime=1535125510724, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser@7ae6182a, directMode=true], GridConnectionBytesVerifyFilter], accepted=false]]]
java.lang.NullPointerException
        at org.apache.ignite.internal.util.nio.GridNioServer.cancelConnect(GridNioServer.java:885)
        at org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$SingleAddressConnectFuture.cancel(TcpCommunicationConnectionCheckFuture.java:338)
        at org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$MultipleAddressesConnectFuture.cancelFutures(TcpCommunicationConnectionCheckFuture.java:475)
        at org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$MultipleAddressesConnectFuture.receivedAddressStatus(TcpCommunicationConnectionCheckFuture.java:494)
        at org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$MultipleAddressesConnectFuture$1.onStatusReceived(TcpCommunicationConnectionCheckFuture.java:433)
        at org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$SingleAddressConnectFuture.finish(TcpCommunicationConnectionCheckFuture.java:362)
        at org.apache.ignite.spi.communication.tcp.internal.TcpCommunicationConnectionCheckFuture$SingleAddressConnectFuture.onConnected(TcpCommunicationConnectionCheckFuture.java:348)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2.onMessage(TcpCommunicationSpi.java:773)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2.onMessage(TcpCommunicationSpi.java:383)
        at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onMessageReceived(GridNioFilterChain.java:279)
        at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
        at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onMessageReceived(GridNioCodecFilter.java:117)
        at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
        at org.apache.ignite.internal.util.nio.GridConnectionBytesVerifyFilter.onMessageReceived(GridConnectionBytesVerifyFilter.java:88)
        at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
        at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onMessageReceived(GridNioServer.java:3490)


On Fri, Aug 24, 2018 at 11:18 AM, eugene miretsky <eu...@gmail.com> wrote:
 Thanks, 

So the way I understand it, the thick client will use the affinity key to send data to the right node, and hence will split the traffic between all the nodes, while the thin client will just send the data to one node, and that node will be responsible for sending it to the actual node that owns the 'shard'? 

I keep getting the following error when using the Spark driver, the driver keeps writing, but very slowly. Any idea what is causing the error, or how to fix it? 

Cheers,
Eugene

"
[15:04:58,030][SEVERE][data-streamer-stripe-10-#43%Server%][DataStreamProcessor] Failed to respond to node [nodeId=78af5d88-cbfa-4529-aaee-ff4982985cdf, res=DataStreamerResponse [reqId=192, forceLocDep=true]]
class org.apache.ignite.IgniteCheckedException: Failed to send message (node may have left the grid or TCP connection cannot be established due to firewall issues) [node=ZookeeperClusterNode [id=78af5d88-cbfa-4529-aaee-ff4982985cdf, addrs=[127.0.0.1], order=377, loc=false, client=true], topic=T1 [topic=TOPIC_DATASTREAM, id=b8d675c6561-78af5d88-cbfa-4529-aaee-ff4982985cdf], msg=DataStreamerResponse [reqId=192, forceLocDep=true], policy=9]
        at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1651)
        at org.apache.ignite.internal.managers.communication.GridIoManager.sendToCustomTopic(GridIoManager.java:1703)
        at org.apache.ignite.internal.managers.communication.GridIoManager.sendToCustomTopic(GridIoManager.java:1673)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.sendResponse(DataStreamProcessor.java:440)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.localUpdate(DataStreamProcessor.java:402)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.processRequest(DataStreamProcessor.java:305)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.access$000(DataStreamProcessor.java:60)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor$1.onMessage(DataStreamProcessor.java:90)
        at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
        at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
        at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
        at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
        at org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:511)
        at java.lang.Thread.run(Thread.java:748)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send message to remote node: ZookeeperClusterNode [id=78af5d88-cbfa-4529-aaee-ff4982985cdf, addrs=[127.0.0.1], order=377, loc=false, client=true]
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2718)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2651)
        at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1643)
        ... 13 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each ComputeTask and cache Transaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=78af5d88-cbfa-4529-aaee-ff4982985cdf, addrs=[/127.0.0.1:47101]]
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3422)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2958)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2841)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2692)
        ... 15 more
        Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address [addr=/127.0.0.1:47101, err=Connection refused]
                at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3425)
                ... 18 more
        Caused by: java.net.ConnectException: Connection refused
                at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
                at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
                at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)
                at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3262)
                ... 18 more

"

On Tue, Aug 14, 2018 at 4:39 PM, akurbanov <an...@gmail.com> wrote:
Hi,

The Spark integration was implemented before the Java thin client was released, and the thick client performs better than the thin one in general. Is your question
about the existence of benchmarks for thin vs thick clients in the Spark
integration, or just a comparison of these two options?

Thin clients' functionality is limited compared to the thick client, and a thin
client generally should be a bit slower, as it communicates not with the whole
cluster but only with a single node, and is not partition-aware. This
introduces additional network costs which may hurt performance compared to the
thick client, especially in the simplest and ideal conditions where network
transfer is the major part of the workload.

However, this performance decrease may be completely irrelevant depending on
the use case and workload, so you should always measure performance and run
benchmarks for your specific use case, and then decide which option suits
your needs better.
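To make "not partition-aware" concrete, here is a toy hop-count model (not Ignite code; the modulo affinity and all names below are made up for illustration): a thick client reaches the primary node in one hop, while a thin client pinned to one node pays a second hop whenever that node has to forward.

```java
public class HopCount {
    // Thick client: one hop per put, straight to the key's primary node.
    static int thickHops(int keys, int nodes) {
        return keys;
    }

    // Thin client pinned to node `connected`: one hop to that node, plus a
    // second hop whenever it is not the primary and must forward the update.
    static int thinHops(int keys, int nodes, int connected) {
        int hops = 0;
        for (int key = 0; key < keys; key++) {
            int primary = key % nodes; // toy affinity: key -> owning node
            hops += (primary == connected) ? 1 : 2;
        }
        return hops;
    }

    public static void main(String[] args) {
        // 1000 puts over 4 nodes: thick = 1000 hops, thin = 1750 hops.
        System.out.println(thickHops(1000, 4) + " vs " + thinHops(1000, 4, 0));
    }
}
```

With N nodes, roughly (N-1)/N of the thin client's requests need the extra hop, which is the network cost mentioned above; whether that matters is exactly what a workload-specific benchmark tells you.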



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/