Posted to user@ignite.apache.org by Ilya Kasnacheev <il...@gmail.com> on 2020/02/03 10:41:08 UTC

Re: Partition/Fault tolerance and availability of Apache Ignite

Hello!

Can you please show more logs/full stack trace?

Data Streamer is not especially fault tolerant, but it should survive a
server node leaving.

How many backups do you have? What is your partition loss policy?
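
For reference, here is a minimal sketch of where those two settings live,
assuming the Java configuration API from ignite-core is on the classpath.
The cache name "exampleCache", the key/value types, and the chosen values
are placeholders, not taken from your setup. If neither setter is called,
Ignite falls back to 0 backups and the IGNORE partition loss policy.

```java
import org.apache.ignite.cache.PartitionLossPolicy;
import org.apache.ignite.configuration.CacheConfiguration;

public class CacheSettingsSketch {
    // Builds a cache configuration with an explicit backup count and
    // partition loss policy. All names and values here are illustrative.
    static CacheConfiguration<Long, String> cacheConfig() {
        CacheConfiguration<Long, String> cfg =
            new CacheConfiguration<>("exampleCache"); // hypothetical cache name

        // Keep one backup copy of every partition on another server node.
        cfg.setBackups(1);

        // Fail reads and writes that touch lost partitions instead of
        // silently ignoring the loss (the default policy is IGNORE).
        cfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);

        return cfg;
    }
}
```

With zero backups in a partitioned cache, the partitions owned by a killed
server have no other copy in the cluster, which is why both answers matter
here.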

Regards,
-- 
Ilya Kasnacheev


Fri, 31 Jan 2020 at 11:02, userx <ga...@gmail.com>:

> Hi team,
>
> I performed a simple check of CAP theorem on an Apache Ignite cluster and
> observed a few things related to tolerance and availability of the system.
>
> Here are the steps
> 1) Created a cluster of three Ignite servers - S1, S2, S3. Say S1 is
> started first, so it is the coordinator.
> 2) Topology version: 3
> 3) 13 clients (C1 to C13) connect to the cluster, say, sporadically
> 4) Topology version: 16 = 3+13
>
> Let's say the clients start writing into their respective distinct caches.
> After 7 or 8 minutes into this activity, I kill S2 by doing a kill -9. What
> I have observed is that I start getting the following errors for any cache
> writes occurring afterwards
>
> 50008_116305_11951_2_12472_978_1_0_2 javax.cache.CacheException: class org.apache.ignite.IgniteCheckedException: Some of DataStreamer operations failed [failedCount=1]
>         at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1337)
>         at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.close(DataStreamerImpl.java:1287)
>         at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.close(DataStreamerImpl.java:1388)
>         at com.abc.datagrid.DataGridClient.writeAll(DataGridClient.java:209)
>
> My observation, therefore, is that the cluster is not partition tolerant
> or fault tolerant: in this situation, the rest of the cluster does not
> seem to be available for writes.
>
> Can someone shed some light on this? I can share more logs.
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

Re: Partition/Fault tolerance and availability of Apache Ignite

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

Can you please collect complete logs from all nodes, not just a screenful
of logs?

Regards,
-- 
Ilya Kasnacheev


Tue, 4 Feb 2020 at 09:20, userx <ga...@gmail.com>:

> Fault_tolerance.zip
> <http://apache-ignite-users.70518.x6.nabble.com/file/t1165/Fault_tolerance.zip>
>
> Hi Ilya,
>
> Thank you for your reply. I have attached the logs of the coordinator (S1)
> and S3. During the demonstration, I killed S2.
>
> I have not configured a partition loss policy, so the default, IGNORE,
> should apply.

Re: Partition/Fault tolerance and availability of Apache Ignite

Posted by userx <ga...@gmail.com>.

Fault_tolerance.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1165/Fault_tolerance.zip>  

Hi Ilya,

Thank you for your reply. I have attached the logs of the coordinator (S1)
and S3. During the demonstration, I killed S2.

I have not configured a partition loss policy, so the default, IGNORE,
should apply.
