Posted to user@ignite.apache.org by Grégory Jevardat de Fombelle <gr...@unige.ch> on 2019/01/11 09:25:54 UTC

java.lang.OutOfMemoryError: Java heap space on server node during cache querying on same node by multiple client nodes

Hello

We are facing this exception when multiple client nodes try to read a big cached object using a standard value = cache.get(key) call.


The cached object is a big serialized object that can reach hundreds of MB in size. The server node has 16GB of heap, which should be more than enough for this use case.

The setup to reproduce the issue is simple (a minimal sketch of the client programs follows below):

- I launch one server node with a 16GB heap,
- then one producer client node that populates the cache with this big object,
- then multiple Ignite consumer clients that are launched simultaneously and each get the cached value.
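
For reference, a minimal sketch of the two client programs (the cache name "bigCache", the integer key and the byte[] value are placeholders standing in for our real serialized object):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;

    // Producer client: populates the cache with one large value, then exits.
    public class ProducerClient {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration().setClientMode(true);
            try (Ignite ignite = Ignition.start(cfg)) {
                IgniteCache<Integer, byte[]> cache = ignite.getOrCreateCache("bigCache");
                cache.put(1, new byte[300 * 1024 * 1024]); // stand-in for a value of hundreds of MB
            }
        }
    }

    // Consumer client: each parallel instance is a separate JVM doing a plain get.
    public class ConsumerClient {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration().setClientMode(true);
            try (Ignite ignite = Ignition.start(cfg)) {
                IgniteCache<Integer, byte[]> cache = ignite.getOrCreateCache("bigCache");
                byte[] value = cache.get(1);
                System.out.println("Read " + value.length + " bytes");
            }
        }
    }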

The result: in my case I can launch 2 clients in parallel, but it fails with three.
If the clients are launched in sequence with enough idle time between them, there is no problem: the heap max size is not reached, since the network transfers do not request heap at the same time.


I correlated the heap growth with the serialisation of the cached object for network transfer. It seems that the serialisation process consumes heap memory at will, until an OOME happens when too many transfers occur in parallel.

So, simply put, it does not scale at all: I have the same issue with a large number of clients and servers. Even with 100 server nodes, at some point 2 or 3 clients will send requests to the same node, which will trigger the OOME.

What can I do to solve this issue in the very short term?

Can I configure the network transfer on the caches to limit the number of simultaneous requests, i.e. some kind of queuing of cache get requests per server node?

In the long term we'll change the architecture to avoid spawning hundreds of simultaneous clients, but in any case it would be nice to have a solution to this issue.


Thanks for your help.


Re: java.lang.OutOfMemoryError: Java heap space on server node during cache querying on same node by multiple client nodes

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

We were able to debug the underlying problem, which is that Communication will
hold references to those (large) messages once they have been sent.

The solution for such cases, where GridNioServer holds on to large
messages, is to decrease TcpCommunicationSpi#setAckSendThreshold
<https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/spi/communication/tcp/TcpCommunicationSpi.html#setAckSendThreshold-int->

By default it's 32, but something like 4 might help.
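
For illustration, a minimal sketch of setting this property programmatically (you would presumably apply the same setting to every node's configuration; discovery and cache settings are omitted):

    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

    public class NodeStartup {
        public static void main(String[] args) {
            TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
            // Acknowledgements are sent after fewer received messages (default 32),
            // which lets the sending side release references to large messages sooner.
            commSpi.setAckSendThreshold(4);

            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setCommunicationSpi(commSpi);

            Ignition.start(cfg);
        }
    }

The same ackSendThreshold property can also be set in the Spring XML configuration.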

Regards,
-- 
Ilya Kasnacheev

