You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ignite.apache.org by "Ksenia Rybakova (JIRA)" <ji...@apache.org> on 2017/07/06 11:53:00 UTC

[jira] [Updated] (IGNITE-5707) Client can't resume streaming even after topology got stable during load test

     [ https://issues.apache.org/jira/browse/IGNITE-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ksenia Rybakova updated IGNITE-5707:
------------------------------------
    Description: 
Load test config:
- CacheRandomOperationBenchmark
- 8 clients, 48 servers at 8 hosts
- 26 physical caches of different types with different memory policies + 30 groups with 10 partitioned caches each + 20 groups with 10 replicated caches each. Total 526 caches.
- Preloading amount: 50K, key range: 60K
Complete configs are attached.

3 of 8 clients have following messages during preloading:
{noformat}
[12:17:56] (err) Failed to execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[true, false, false]][12:17:56] (err) Failed to
execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[true, true, false, false, false, false, false, false, false, false, false,
 false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false]][12:17:56] (err) Failed
to execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[true, true, false, false, false, false, false, false, false, false, fal
se, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false]]class org.apache.igni
te.IgniteCheckedException: DataStreamer request failed [node=16a20d0c-4009-4bfa-ad6e-0261d9e3b2a3]
        at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$Buffer.onResponse(DataStreamerImpl.java:1785)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$3.onMessage(DataStreamerImpl.java:333)
        at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
        at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
        at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
        at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1097)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.IgniteCheckedException: DataStreamer will retry data transfer at stable topology [reqTop=AffinityTopologyVersion [topVer=56, minorTopVer=0], topVer=AffinityTopologyVersion
[topVer=56, minorTopVer=1], node=remote]
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.localUpdate(DataStreamProcessor.java:343)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.processRequest(DataStreamProcessor.java:301)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.access$000(DataStreamProcessor.java:58)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor$1.onMessage(DataStreamProcessor.java:88)
        ... 7 more
{noformat}
2 drivers were able to resume streaming after some time, but 1 didn't (error messages continued to be printed). This driver had high heap utilization, that resulted in long GC pause. Finally it was considered failed by other nodes.

Also all clients have in log:
{noformat}
[2017-07-06 12:18:11,780][INFO ][exchange-worker-#41%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=56, minorTopVer=1], crd=false, evt=18, node=TcpDiscoveryNode [id=71174cde-6bcf-43b6-a97b-02e0c987a8da, addrs=[127.0.0.1, 172.25.1.31], sockAddrs=[testagent01.gridgain.local/172.25.1.31:0, /127.0.0.1:0], discPort=0, order=49, intOrder=0, lastExchangeTime=1499332605385, loc=true, ver=2.1.0#20170705-sha1:ad42f620, isClient=true], evtNode=TcpDiscoveryNode [id=71174cde-6bcf-43b6-a97b-02e0c987a8da, addrs=[127.0.0.1, 172.25.1.31], sockAddrs=[testagent01.gridgain.local/172.25.1.31:0, /127.0.0.1:0], discPort=0, order=49, intOrder=0, lastExchangeTime=1499332605385, loc=true, ver=2.1.0#20170705-sha1:ad42f620, isClient=true], customEvt=CacheAffinityChangeMessage [id=b5a8f271d51-f9bb8d96-c609-4de4-b32f-761c2a33ad10, topVer=AffinityTopologyVersion [topVer=48, minorTopVer=0], exchId=null, partsMsg=null, exchangeNeeded=true]]
[2017-07-06 12:18:12,284][WARN ][pool-5-thread-2][GridDhtPartitionTopologyImpl] Requested topology version does not match calculated diff, will require full iteration tocalculate mapping [topVer=AffinityTopologyVersion [topVer=56, minorTopVer=0], diffVer=AffinityTopologyVersion [topVer=56, minorTopVer=1]]
[2017-07-06 12:18:12,642][WARN ][sys-#56%null%][GridDhtPartitionTopologyImpl] Requested topology version does not match calculated diff, will require full iteration tocalculate mapping [topVer=AffinityTopologyVersion [topVer=56, minorTopVer=1], diffVer=AffinityTopologyVersion [topVer=56, minorTopVer=0]]
[2017-07-06 12:18:12,919][INFO ][exchange-worker-#41%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=56, minorTopVer=1], crd=false]
{noformat}

  was:
Load test config:
- CacheRandomOperationBenchmark
- 8 clients, 48 servers at 8 hosts
- 26 physical caches of different types with different memory policies + 30 groups with 10 partitioned caches each + 20 groups with 10 replicated caches each. Total 526 caches.
- Preloading amount: 50K, key range: 60K
Complete configs are attached.

3 of 8 clients have following messages during preloading:
{noformat}
[12:17:56] (err) Failed to execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[true, false, false]][12:17:56] (err) Failed to
execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[true, true, false, false, false, false, false, false, false, false, false,
 false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false]][12:17:56] (err) Failed
to execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[true, true, false, false, false, false, false, false, false, false, fal
se, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false]]class org.apache.igni
te.IgniteCheckedException: DataStreamer request failed [node=16a20d0c-4009-4bfa-ad6e-0261d9e3b2a3]
        at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$Buffer.onResponse(DataStreamerImpl.java:1785)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$3.onMessage(DataStreamerImpl.java:333)
        at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
        at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
        at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
        at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1097)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.IgniteCheckedException: DataStreamer will retry data transfer at stable topology [reqTop=AffinityTopologyVersion [topVer=56, minorTopVer=0], topVer=AffinityTopologyVersion
[topVer=56, minorTopVer=1], node=remote]
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.localUpdate(DataStreamProcessor.java:343)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.processRequest(DataStreamProcessor.java:301)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.access$000(DataStreamProcessor.java:58)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor$1.onMessage(DataStreamProcessor.java:88)
        ... 7 more
{noformat}
2 drivers were able to resume streaming after some time, but 1 didn't (error messages continued to be printed). This driver had high heap utilization, that resulted in long GC pause. Finally it was considered failed by other nodes.

Also it was observed that all clients have in log:
{noformat}
[2017-07-06 12:18:11,780][INFO ][exchange-worker-#41%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=56, minorTopVer=1], crd=false, evt=18, node=TcpDiscoveryNode [id=71174cde-6bcf-43b6-a97b-02e0c987a8da, addrs=[127.0.0.1, 172.25.1.31], sockAddrs=[testagent01.gridgain.local/172.25.1.31:0, /127.0.0.1:0], discPort=0, order=49, intOrder=0, lastExchangeTime=1499332605385, loc=true, ver=2.1.0#20170705-sha1:ad42f620, isClient=true], evtNode=TcpDiscoveryNode [id=71174cde-6bcf-43b6-a97b-02e0c987a8da, addrs=[127.0.0.1, 172.25.1.31], sockAddrs=[testagent01.gridgain.local/172.25.1.31:0, /127.0.0.1:0], discPort=0, order=49, intOrder=0, lastExchangeTime=1499332605385, loc=true, ver=2.1.0#20170705-sha1:ad42f620, isClient=true], customEvt=CacheAffinityChangeMessage [id=b5a8f271d51-f9bb8d96-c609-4de4-b32f-761c2a33ad10, topVer=AffinityTopologyVersion [topVer=48, minorTopVer=0], exchId=null, partsMsg=null, exchangeNeeded=true]]
[2017-07-06 12:18:12,284][WARN ][pool-5-thread-2][GridDhtPartitionTopologyImpl] Requested topology version does not match calculated diff, will require full iteration tocalculate mapping [topVer=AffinityTopologyVersion [topVer=56, minorTopVer=0], diffVer=AffinityTopologyVersion [topVer=56, minorTopVer=1]]
[2017-07-06 12:18:12,642][WARN ][sys-#56%null%][GridDhtPartitionTopologyImpl] Requested topology version does not match calculated diff, will require full iteration tocalculate mapping [topVer=AffinityTopologyVersion [topVer=56, minorTopVer=1], diffVer=AffinityTopologyVersion [topVer=56, minorTopVer=0]]
[2017-07-06 12:18:12,919][INFO ][exchange-worker-#41%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=56, minorTopVer=1], crd=false]
{noformat}


> Client can't resume streaming even after topology got stable during load test
> -----------------------------------------------------------------------------
>
>                 Key: IGNITE-5707
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5707
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.1
>            Reporter: Ksenia Rybakova
>
> Load test config:
> - CacheRandomOperationBenchmark
> - 8 clients, 48 servers at 8 hosts
> - 26 physical caches of different types with different memory policies + 30 groups with 10 partitioned caches each + 20 groups with 10 replicated caches each. Total 526 caches.
> - Preloading amount: 50K, key range: 60K
> Complete configs are attached.
> 3 of 8 clients have following messages during preloading:
> {noformat}
> [12:17:56] (err) Failed to execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[true, false, false]][12:17:56] (err) Failed to
> execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[true, true, false, false, false, false, false, false, false, false, false,
>  false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false]][12:17:56] (err) Failed
> to execute compound future reducer: GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[true, true, false, false, false, false, false, false, false, false, fal
> se, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false]]class org.apache.igni
> te.IgniteCheckedException: DataStreamer request failed [node=16a20d0c-4009-4bfa-ad6e-0261d9e3b2a3]
>         at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$Buffer.onResponse(DataStreamerImpl.java:1785)
>         at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$3.onMessage(DataStreamerImpl.java:333)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
>         at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1097)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: class org.apache.ignite.IgniteCheckedException: DataStreamer will retry data transfer at stable topology [reqTop=AffinityTopologyVersion [topVer=56, minorTopVer=0], topVer=AffinityTopologyVersion
> [topVer=56, minorTopVer=1], node=remote]
>         at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.localUpdate(DataStreamProcessor.java:343)
>         at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.processRequest(DataStreamProcessor.java:301)
>         at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.access$000(DataStreamProcessor.java:58)
>         at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor$1.onMessage(DataStreamProcessor.java:88)
>         ... 7 more
> {noformat}
> 2 drivers were able to resume streaming after some time, but 1 didn't (error messages continued to be printed). This driver had high heap utilization, that resulted in long GC pause. Finally it was considered failed by other nodes.
> Also all clients have in log:
> {noformat}
> [2017-07-06 12:18:11,780][INFO ][exchange-worker-#41%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=56, minorTopVer=1], crd=false, evt=18, node=TcpDiscoveryNode [id=71174cde-6bcf-43b6-a97b-02e0c987a8da, addrs=[127.0.0.1, 172.25.1.31], sockAddrs=[testagent01.gridgain.local/172.25.1.31:0, /127.0.0.1:0], discPort=0, order=49, intOrder=0, lastExchangeTime=1499332605385, loc=true, ver=2.1.0#20170705-sha1:ad42f620, isClient=true], evtNode=TcpDiscoveryNode [id=71174cde-6bcf-43b6-a97b-02e0c987a8da, addrs=[127.0.0.1, 172.25.1.31], sockAddrs=[testagent01.gridgain.local/172.25.1.31:0, /127.0.0.1:0], discPort=0, order=49, intOrder=0, lastExchangeTime=1499332605385, loc=true, ver=2.1.0#20170705-sha1:ad42f620, isClient=true], customEvt=CacheAffinityChangeMessage [id=b5a8f271d51-f9bb8d96-c609-4de4-b32f-761c2a33ad10, topVer=AffinityTopologyVersion [topVer=48, minorTopVer=0], exchId=null, partsMsg=null, exchangeNeeded=true]]
> [2017-07-06 12:18:12,284][WARN ][pool-5-thread-2][GridDhtPartitionTopologyImpl] Requested topology version does not match calculated diff, will require full iteration tocalculate mapping [topVer=AffinityTopologyVersion [topVer=56, minorTopVer=0], diffVer=AffinityTopologyVersion [topVer=56, minorTopVer=1]]
> [2017-07-06 12:18:12,642][WARN ][sys-#56%null%][GridDhtPartitionTopologyImpl] Requested topology version does not match calculated diff, will require full iteration tocalculate mapping [topVer=AffinityTopologyVersion [topVer=56, minorTopVer=1], diffVer=AffinityTopologyVersion [topVer=56, minorTopVer=0]]
> [2017-07-06 12:18:12,919][INFO ][exchange-worker-#41%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=56, minorTopVer=1], crd=false]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)