Posted to user@ignite.apache.org by Cody Yancey <ya...@uber.com> on 2017/01/26 16:14:16 UTC

Reloading from Persistent Store after Losing a Node

Hello Ignite users!

I have a use case where I am doing SQL queries on a sharded cache, and I
need to ensure that SQL queries always return The Right Answer even if some
nodes in the ring are lost. As I have rigorously confirmed, SQL queries
only apply to data in the cache (as opposed to in the write-through
persistent store but lost from the cache). Also, when you lose a node, you
don't lose persisted data, but data IS now gone from the cache (unless
there is an in-cache backup of the relevant cache partitions).

Now, I *could* do this by just setting the backup factor for the cache
equal to the number of nodes I can stand to lose, and then setting a
TopologyValidator on the cache to ensure I always have more nodes in the
ring than that number. If the TopologyValidator ever sees the number of
nodes drop below this survivability threshold, I crash the app and let
everything get reloaded from the persistent store when the nodes
automatically start back up.

This technique has a lot of false positives, where we lose too many nodes,
but slowly enough that Ignite is well able to shift the data around to
avoid data loss, so we shouldn't have had to crash the app.

Therefore, I would rather be a little smarter about this for the sake of
uptime.

Ideally, in the TopologyValidator logic, while reads and writes to the
cache are blocked, I would be able to:

1.) Detect when a lost partition has no viable backup,
2.) Reload from the persistent store.

The problem I am facing is, I can't find a clean and efficient way of
figuring out #1 from the information the TopologyValidator gives you.

And even if I could, #2 hangs forever, which makes sense because the cache
isn't readable or writeable until AFTER the topology has been validated.
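
For illustration, step 1 above (detecting a lost partition with no viable
backup) can be sketched as a plain set computation, independent of Ignite.
This is a minimal sketch under stated assumptions: LostPartitionDetector
and partitionsWithoutCopy are hypothetical names, node ids are plain
strings, and the "owners before" map is assumed to have been captured
before the topology change. In Ignite that snapshot might be built from
Affinity.mapPartitionToPrimaryAndBackups(int), but that wiring is an
assumption and is not shown here.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper, not part of Ignite: finds partitions with no surviving copy. */
final class LostPartitionDetector {
    private LostPartitionDetector() {}

    /**
     * @param ownersBefore partition id -> ids of the nodes that held a copy
     *                     (primary or backup) before the topology change
     * @param aliveNodes   ids of the nodes present in the new topology
     * @return sorted ids of the partitions whose previous owners are all gone
     */
    static SortedSet<Integer> partitionsWithoutCopy(
            Map<Integer, ? extends Collection<String>> ownersBefore,
            Set<String> aliveNodes) {
        SortedSet<Integer> lost = new TreeSet<>();
        for (Integer partition : ownersBefore.keySet()) {
            Collection<String> owners = ownersBefore.get(partition);
            // A partition is lost only if no previous owner is still alive.
            if (Collections.disjoint(owners, aliveNodes)) {
                lost.add(partition);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        // Three partitions with one backup each; nodes B and C leave together.
        Map<Integer, List<String>> ownersBefore = Map.of(
                0, List.of("A", "B"),
                1, List.of("A", "C"),
                2, List.of("B", "C"));
        Set<String> alive = Set.of("A");
        System.out.println(partitionsWithoutCopy(ownersBefore, alive)); // prints [2]
    }
}
```

Only partition 2 is reported, because partitions 0 and 1 still have a copy
on node A. Note that the snapshot would have to be refreshed after every
completed rebalance; otherwise nodes lost slowly, one at a time, would
still be counted as data loss, which is the false-positive case described
above.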

Has anyone faced a similar challenge and has some wisdom to share? Am I
making this way more complicated than it needs to be?

Thanks in advance,
Cody

Re: Reloading from Persistent Store after Losing a Node

Posted by Cody Yancey <ya...@uber.com>.

Ah thank you. This is exactly what I was looking for!

Thanks,
Cody

On Tue, Jan 31, 2017 at 8:07 AM, Vladimir Ozerov <vo...@gridgain.com>
wrote:

> Hi Cody,
>
> I think you can try using EventType.EVT_CACHE_REBALANCE_PART_DATA_LOST. It
> is fired when data is lost.

Re: Reloading from Persistent Store after Losing a Node

Posted by Vladimir Ozerov <vo...@gridgain.com>.

Hi Cody,

I think you can try using EventType.EVT_CACHE_REBALANCE_PART_DATA_LOST. It
is fired when data is lost.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Reloading-from-Persistent-Store-after-Losing-a-Node-tp10259p10339.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.