You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@ignite.apache.org by rushi_rashi <ru...@gmail.com> on 2017/04/18 14:42:53 UTC

what does remaps mean in DataStreamerImpl.java

Getting below error,
class org.apache.ignite.IgniteCheckedException: Failed to finish operation
(too many remaps): 32
at
org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$5.apply(DataStreamerImpl.java:863)

Caused by: class org.apache.ignite.IgniteCheckedException: DataStreamer
request failed [node=fc022bb4-a25a-4fb5-8acb-ac9aadbf5e2a]

I got this error when I am receiving the data from kafka in apache ignite
and in between I kill one node using kill -9.
Still looking for the reason, if it is due to topology unstable or cache
rebalancing or Expiry/eviction policy (which i have on one cache).
But my question is in the DataStreamerImpl code what it is trying to remap?




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/what-does-remaps-mean-in-DataStreamerImpl-java-tp12033.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: what does remaps mean in DataStreamerImpl.java

Posted by vdpyatkov <vl...@gmail.com>.

Although, I'm not right (it should by remaper).
Could you provide reproducer?



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/what-does-remaps-mean-in-DataStreamerImpl-java-tp12033p12079.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: what does remaps mean in DataStreamerImpl.java

Posted by vdpyatkov <vl...@gmail.com>.

Hi,

I have see that your grid suffer from long running operations:

[19:20:19,285][WARNING][grid-timeout-worker-#63%null%][GridCachePartitionExchangeManager]
Found long running cache future [startTime=19:18:55.302,
curTime=19:20:19.280, fut=GridDhtAtomicSingleUpdateFuture
[key=MyProject.TokenSecondMap [idHash=1526386777, hash=2035509954,
processingSecond=1174038983, token=38888], nearReaderEntry=null]]
[19:20:19,286][WARNING][grid-timeout-worker-#63%null%][GridCachePartitionExchangeManager]
Found long running cache future [startTime=19:18:55.302,
curTime=19:20:19.280,
 ...

by that reason it can not changing topology fast.

This is topology before killing nodes:

[19:17:42,341][INFO][disco-event-worker-#100%null%][GridDiscoveryManager]
Topology snapshot [ver=30, servers=30, clients=0, CPUs=168, heap=120.0GB]

and the same minor topology version after remaping failed out:

[19:20:41,302][WARNING][exchange-worker-#103%null%][GridCachePartitionExchangeManager]
Ready affinity version: AffinityTopologyVersion [topVer=30, minorTopVer=10]

But actual exchange future have required 33 version already:

[topVer=AffinityTopologyVersion [topVer=33, minorTopVer=0], nodeId=7484419b,
evt=NODE_FAILED]

Remap was failed, because grid had not been in time to change topology.

You should avoid message about long running operations (reduce the load or
change logic), otherwise exchange will going slow and you got the exception.
Also, long exchange can hang whole grid, not only lead to the streamer
exception.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/what-does-remaps-mean-in-DataStreamerImpl-java-tp12033p12440.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: what does remaps mean in DataStreamerImpl.java

Posted by rishi007bansod <ri...@gmail.com>.

Hi,
    I have attached log file for one ignite node  ignite-b6321e0a.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/n12414/ignite-b6321e0a.zip>  

Thanks



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/what-does-remaps-mean-in-DataStreamerImpl-java-tp12033p12414.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: what does remaps mean in DataStreamerImpl.java

Posted by vdpyatkov <vl...@gmail.com>.

At first if you are having only one backup you surely data lose when kill 3
nodes (you can to do that when only kill one by one and waiting rebalancing
complete after each).

Could you please attache a full log file at leas one node where the remap
failed messages are present?



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/what-does-remaps-mean-in-DataStreamerImpl-java-tp12033p12222.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: what does remaps mean in DataStreamerImpl.java

Posted by rishi007bansod <ri...@gmail.com>.

Hi,
       In our case we killed 3 ignite nodes among the 30 nodes we have
started in grid to check fault tolerance of ignite i.e. whether remaining 27
nodes remap to kafka consumers or not. fe6409cb-88a2-43da-9eb7-9b17cf5debcb
node is one of the nodes we have killed to check fault tolerance. Another
thing, kafka consumers keep consuming messages when these 3 nodes are killed
and we get *Data streamer has been closed* message displayed continuously
along with *class org.apache.ignite.IgniteCheckedException: Failed to finish
operation (too many remaps): 32 * error.


Thanks,
Rishikesh



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/what-does-remaps-mean-in-DataStreamerImpl-java-tp12033p12213.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: what does remaps mean in DataStreamerImpl.java

Posted by vdpyatkov <vl...@gmail.com>.

How often sessions are timedouted?

Is it node have been alive fe6409cb-88a2-43da-9eb7-9b17cf5debcb when message
are appear?
This seem as topology are breaks down.

Please provide all logs from each nodes.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/what-does-remaps-mean-in-DataStreamerImpl-java-tp12033p12202.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: what does remaps mean in DataStreamerImpl.java

Posted by rishi007bansod <ri...@gmail.com>.

*Hi,
   In our initial setting we have kept allowoverwrite to default i.e false.
Now we tried setting allowoverwrite property of ignitedatastreamer to
"true". Initially when we killed ignite instances we got following error.
Due to this error some of ignite nodes/instances got disconnected.
*
[17:40:30,386][WARNING][grid-nio-worker-tcp-comm-0-#65%null%][TcpCommunicationSpi]
Communication SPI session write timed out (consider increasing
'socketWriteTimeout' configuration property)
[remoteAddr=/192.168.140.44:42370, writeTimeout=2000]
[17:40:42,397][INFO][grid-timeout-worker-#63%null%][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=ced80027, name=null, uptime=00:16:00:157]
    ^-- H/N/C [hosts=3, nodes=24, CPUs=168]
    ^-- CPU [cur=0.13%, avg=5%, GC=0%]
    ^-- Heap [used=2929MB, free=28.48%, comm=4096MB]
    ^-- Non heap [used=94MB, free=93.77%, comm=96MB]
    ^-- Public thread pool [active=0, idle=56, qSize=0]
    ^-- System thread pool [active=0, idle=55, qSize=0]
    ^-- Outbound messages queue [size=13]
[17:41:06,346][WARNING][grid-timeout-worker-#63%null%][GridCachePartitionExchangeManager]
Found long running cache future [startTime=17:32:16.736,
curTime=17:41:06.340, fut=GridDhtAtomicSingleUpdateFuture
[key=KeyCacheObjectImpl [val=8563779624, hasValBytes=true],
nearReaderEntry=null]]
[17:41:06,347][WARNING][grid-timeout-worker-#63%null%][GridCachePartitionExchangeManager]
Found long running cache future [startTime=17:33:11.642,
curTime=17:41:06.340, fut=GridDhtAtomicSingleUpdateFuture
[key=KeyCacheObjectImpl [val=9092344352, hasValBytes=true],
nearReaderEntry=null]]
[17:41:06,347][WARNING][grid-timeout-worker-#63%null%][GridCachePartitionExchangeManager]
Found long running cache future [startTime=17:34:20.429,
curTime=17:41:06.340, fut=GridDhtAtomicSingleUpdateFuture
[key=KeyCacheObjectImpl [val=9569548832, hasValBytes=true],
nearReaderEntry=null]]
[17:41:06,347][WARNING][grid-timeout-worker-#63%null%][GridCachePartitionExchangeManager]
Found long running cache future [startTime=17:32:38.946,
curTime=17:41:06.340, fut=GridDhtAtomicSingleUpdateFuture
[key=KeyCacheObjectImpl [val=8278360104, hasValBytes=true],
nearReaderEntry=null]]
[17:41:06,347][WARNING][grid-timeout-worker-#63%null%][GridCachePartitionExchangeManager]
Found long running cache future [startTime=17:32:16.736,
curTime=17:41:06.340, fut=GridDhtAtomicSingleUpdateFuture
[key=KeyCacheObjectImpl [val=8514297071, hasValBytes=true],
nearReaderEntry=null]]
[17:41:06,347][WARNING][grid-timeout-worker-#63%null%][GridCachePartitionExchangeManager]
Found long running cache future [startTime=17:32:26.144,
curTime=17:41:06.340, fut=GridDhtAtomicSingleUpdateFuture
[key=KeyCacheObjectImpl [val=8099247144, hasValBytes=true],
nearReaderEntry=null]]
[17:41:06,347][WARNING][grid-timeout-worker-#63%null%][GridCachePartitionExchangeManager]
Found long running cache future [startTime=17:32:46.651,
curTime=17:41:06.340, fut=GridDhtAtomicSingleUpdateFuture
[key=KeyCacheObjectImpl [val=8399928360, hasValBytes=true],
nearReaderEntry=null]]
[17:41:06,347][WARNING][grid-timeout-worker-#63%null%][GridCachePartitionExchangeManager]
Found long running cache future [startTime=17:32:32.752,
curTime=17:41:06.340, fut=GridDhtAtomicSingleUpdateFuture
[key=KeyCacheObjectImpl [val=8326161647, hasValBytes=true],
nearReaderEntry=null]]
[17:41:06,347][WARNING][grid-timeout-worker-#63%null%][GridCachePartitionExchangeManager]
Found long running cache future [startTime=17:33:11.642,
curTime=17:41:06.340, fut=GridDhtAtomicSingleUpdateFuture
[key=KeyCacheObjectImpl [val=9185813024, hasValBytes=true],
nearReaderEntry=null]]
[17:41:06,347][WARNING][grid-timeout-worker-#63%null%][GridCachePartitionExchangeManager]
Found long running cache future [startTime=17:36:02.190,
curTime=17:41:06.340, fut=GridDhtAtomicSingleUpdateFuture
[key=KeyCacheObjectImpl [val=8426645544, hasValBytes=true],
nearReaderEntry=null]]
[17:41:06,347][WARNING][grid-timeout-worker-#63%null%][GridCachePartitionExchangeManager]
Found long running cache future [startTime=17:32:23.224,
curTime=17:41:06.340, fut=GridDhtAtomicSingleUpdateFuture
[key=KeyCacheObjectImpl [val=8103641327, hasValBytes=true],
nearReaderEntry=null]]


*So we increased parameter "socketWriteTimeout" to 20 seconds. But now again
we got following remaps error,
*



[19:20:27,185][SEVERE][sys-#338%null%][DataStreamerImpl] DataStreamer
operation failed.
class org.apache.ignite.IgniteCheckedException: Failed to finish operation
(too many remaps): 32
        at
org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$5.apply(DataStreamerImpl.java:863)
        at
org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$5.apply(DataStreamerImpl.java:828)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter$ArrayListener.apply(GridFutureAdapter.java:456)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter$ArrayListener.apply(GridFutureAdapter.java:439)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:259)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:389)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:355)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:343)
        at
org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$Buffer.submit(DataStreamerImpl.java:1716)
        at
org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$Buffer.update(DataStreamerImpl.java:1416)
        at
org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.load0(DataStreamerImpl.java:932)
        at
org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.access$1100(DataStreamerImpl.java:121)
        at
org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$5$1.run(DataStreamerImpl.java:876)
        at
org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$5$2.call(DataStreamerImpl.java:903)
        at
org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$5$2.call(DataStreamerImpl.java:891)
        at
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6605)
        at
org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:925)
        at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: class
org.apache.ignite.internal.cluster.ClusterTopologyCheckedException: Failed
to send request (node has left): fe6409cb-88a2-43da-9eb7-9b17cf5debcb
        ... 13 more
[19:20:27,226][SEVERE][pool-3-thread-1][] Message is ignored due to an error
[msg=MessageAndMetadata(test24042017_14,3,Message(magic = 1, attributes = 0,
CreateTime = -1, crc = 127561807, key = java.nio.HeapByteBuffer[pos=0 lim=4
cap=527], payload = java.nio.HeapByteBuffer[pos=0 lim=519
cap=519]),1426777,kafka.serializer.DefaultDecoder@77fcbb26,kafka.serializer.DefaultDecoder@5ca3dd48,-1,CreateTime)]
java.lang.IllegalStateException: Data streamer has been closed.
        at
org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.enterBusy(DataStreamerImpl.java:406)
        at
org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.addDataInternal(DataStreamerImpl.java:613)
        at
org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.addData(DataStreamerImpl.java:667)
        at
tcs.poc.surveillance.SurveillanceAlert$2.extract(SurveillanceAlert.java:496)
        at
tcs.poc.surveillance.SurveillanceAlert$2.extract(SurveillanceAlert.java:1)
        at
org.apache.ignite.stream.StreamAdapter.addMessage(StreamAdapter.java:181)
        at
org.apache.ignite.stream.kafka.KafkaStreamer.access$100(KafkaStreamer.java:47)
        at
org.apache.ignite.stream.kafka.KafkaStreamer$1.run(KafkaStreamer.java:156)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/what-does-remaps-mean-in-DataStreamerImpl-java-tp12033p12196.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: what does remaps mean in DataStreamerImpl.java

Posted by Vladislav Pyatkov <vl...@gmail.com>.

What is the value IgniteDataStreamer#allowOverwrite(boolean)?
The property changes of algorithm of work DataStreamer, in particular order
of remap node.
Try to set it true.

On Wed, Apr 19, 2017 at 7:29 PM, rishi007bansod <ri...@gmail.com>
wrote:

> Hi Below is our architecture,
>
> 1. Ignite receives data via Kafka Streamer
> 2. Tuple Extractor is implemented in ignite code
> Everything works fine till this step.
> 3. We stop kafka. No error yet.
> 4. We kill 2 instance (out of n instance) of ignite.
> 5. Kafka Consumer remapping also happens without any issue.
> 6. Cache rebalancing also seems completed (via log)
> 6. Post 2 mins, kafka is again started.
> And then we get the below error
> class org.apache.ignite.IgniteCheckedException: Failed to finish operation
> (too many remaps): 32
>
> Even after Cache rebalancing and consumer mapping seems completed and when
> ignite receives new data through kafka then only error is coming.
> So not able to understand what remap ignite is doing?
>
> One point just for info, we are using Data streamer to add data in one
> cache
> and ignite visitor to process data.
> All caches are partitioned and have backup set to 1.
>
>
>
> --
> View this message in context: http://apache-ignite-users.
> 70518.x6.nabble.com/what-does-remaps-mean-in-DataStreamerImpl-java-
> tp12033p12085.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>



-- 
Vladislav Pyatkov

Re: what does remaps mean in DataStreamerImpl.java

Posted by rishi007bansod <ri...@gmail.com>.

Hi Below is our architecture,

1. Ignite receives data via Kafka Streamer
2. Tuple Extractor is implemented in ignite code
Everything works fine till this step.
3. We stop kafka. No error yet.
4. We kill 2 instance (out of n instance) of ignite. 
5. Kafka Consumer remapping also happens without any issue.
6. Cache rebalancing also seems completed (via log)
6. Post 2 mins, kafka is again started.
And then we get the below error
class org.apache.ignite.IgniteCheckedException: Failed to finish operation
(too many remaps): 32

Even after Cache rebalancing and consumer mapping seems completed and when
ignite receives new data through kafka then only error is coming.
So not able to understand what remap ignite is doing?

One point just for info, we are using Data streamer to add data in one cache
and ignite visitor to process data.
All caches are partitioned and have backup set to 1.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/what-does-remaps-mean-in-DataStreamerImpl-java-tp12033p12085.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: what does remaps mean in DataStreamerImpl.java

Posted by rishi007bansod <ri...@gmail.com>.

Hi, the backup is in our case is 1.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/what-does-remaps-mean-in-DataStreamerImpl-java-tp12033p12083.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: what does remaps mean in DataStreamerImpl.java

Posted by vdpyatkov <vl...@gmail.com>.

Hi,

How many backups (o.a.i.configuration.CacheConfiguration#setBackups) do you
use?
If your cluster does not contain backups, a batch will not remaped, until
rebalance finished.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/what-does-remaps-mean-in-DataStreamerImpl-java-tp12033p12078.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: what does remaps mean in DataStreamerImpl.java

Posted by rushi_rashi <ru...@gmail.com>.

1. So considering example from post 1, does it mean that when an ignite
instance was killed, Data streamer had some data which it was going to put
in to cache but that cache's instance was killed and hence the error might
have occurred.
2. If not, than can you throw some light on the remap error from post 1
because we are trying to check fault tolerance but whenever ignite instance
is killed then the remap error occurs. We are unable to identify the root
cause.

Thanks,
Rushi



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/what-does-remaps-mean-in-DataStreamerImpl-java-tp12033p12068.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: what does remaps mean in DataStreamerImpl.java

Posted by vdpyatkov <vl...@gmail.com>.

Hi,

DataStreamer packs all cache operations to batch request to grid server
node. If this batch was not executed correct in particular node, it will by
remapped to other.

Maximum of remaps cunt is 32 (DFLT_MAX_REMAP_CNT).



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/what-does-remaps-mean-in-DataStreamerImpl-java-tp12033p12036.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: what does remaps mean in DataStreamerImpl.java

Posted by rushi_rashi <ru...@gmail.com>.

I had subscribed to mailing list as per the mail I have received but still
the query has not accepted by mailing list.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/what-does-remaps-mean-in-DataStreamerImpl-java-tp12033p12034.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.