Posted to dev@ignite.apache.org by Павлухин Иван <vo...@gmail.com> on 2019/02/15 10:56:24 UTC
Re: MVCC and IgniteDataStreamer

Hi,

It is time to continue the discussion of DataStreamer for MVCC caches. The
main focus is on allowOverwrite=false mode. Currently there is a problem
related to partition update counters.

BACKGROUND
As you might know, MVCC transactions update partition counters during the
transaction finish phase, and on backups the counters are applied as
intervals (low, high). Basically, a transaction on a primary partition
counts the number of updates it has made and increments the partition
counter locally by that number during the finish stage. So the primary
transaction moves the counter from _low_ to _high_, and that
(low, high) interval is sent to the backups. If the counter on a particular
backup is equal to _low_, then the counter is incremented to _high_.
Otherwise, if the current value is less than _low_, the interval is put
into a queue. It will be applied when the current value becomes equal to
_low_. With this technique partition counters are incremented on backups
in the same order as on the primary. Let's consider a simple example.
Assume that the partition counter is 10 at some point and 2 transactions
finish concurrently. Each has made, e.g., 5 updates. The partition counter
is updated in some order on the primary, and the backup receives the
messages from the primary in reversed order.
Primary [10]          | Backup [10]
Tx1 (10 -> 15) [15]   |
Tx2 (15 -> 20) [20]   |
                      | Receives (15, 20) [10]
                      | // (15, 20) enqueued
                      | Receives (10, 15) [20]
                      | // (10, 15) applied, (15, 20) dequeued and applied
(Partition counter value in square brackets)
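
To make the backup-side rule concrete, here is a minimal sketch of interval application with a queue. The class and method names (BackupPartitionCounter, onInterval) are hypothetical and are not Ignite's actual internals; it only models the behaviour described above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a backup partition counter; not Ignite's real class. */
class BackupPartitionCounter {
    /** Current counter value on the backup. */
    private long counter;

    /** Intervals that arrived too early, keyed by their low bound. */
    private final Map<Long, Long> pending = new HashMap<>();

    BackupPartitionCounter(long initial) {
        counter = initial;
    }

    /** Applies (low, high) if it starts at the current value, otherwise enqueues it. */
    synchronized void onInterval(long low, long high) {
        if (high <= low || low < counter)
            throw new IllegalArgumentException("Unexpected interval (" + low + ", " + high + ")");

        pending.put(low, high);

        // Drain every queued interval that now starts exactly at the current value.
        Long next;
        while ((next = pending.remove(counter)) != null)
            counter = next;
    }

    synchronized long get() {
        return counter;
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

Replaying the example: starting from 10, onInterval(15, 20) leaves the counter at 10 with one queued interval, and the following onInterval(10, 15) moves the counter to 20 and empties the queue.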

In contrast, the data streamer updates counters right at the time of
inserting an entry into the cache, and that totally breaks the idea of
applying counters by intervals. If a data streamer and a transaction
modify the same partition counter concurrently, we can get the following
situation (initially the counter is 10):
1. Tx updates the counter (10 -> 15) on the primary and sends (10, 15) to the backup.
2. Streamer inserts an entry and updates the counter (15 -> 16) on the primary.
3. Streamer inserts an entry and updates the counter (10 -> 11) on the backup.
4. Backup receives (10, 15) from the tx and does not know what to do with
it, as its counter is now equal to 11.
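
The dead end in step 4 can be reproduced with a tiny model. This is only a sketch under assumptions: RacyBackupCounter and its Outcome enum are made-up names for illustration, and the real backup logic is more involved.

```java
/** Hypothetical model of the race in steps 1-4; names are illustrative only. */
class RacyBackupCounter {
    enum Outcome { APPLIED, ENQUEUED, UNRESOLVABLE }

    private long counter;

    RacyBackupCounter(long initial) {
        counter = initial;
    }

    /** Streamer path: bumps the counter immediately, one entry at a time. */
    void onStreamerInsert() {
        counter++;
    }

    /** Transaction path: the interval can only be applied when counter == low. */
    Outcome onTxInterval(long low, long high) {
        if (counter == low) {
            counter = high;
            return Outcome.APPLIED;
        }

        if (counter < low)
            return Outcome.ENQUEUED; // A real backup would queue the interval here.

        // The counter has already moved past 'low', so it can never become equal to it.
        return Outcome.UNRESOLVABLE;
    }

    long get() {
        return counter;
    }
}
```

With an initial value of 10, one onStreamerInsert() moves the backup counter to 11, and a subsequent onTxInterval(10, 15) falls into the UNRESOLVABLE branch.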

Other unexpected effects are possible as well, e.g. with 2 transactions
and a streamer the order in which the transactions' counters are applied
might change.

PROPOSAL
It looks like the streamer should apply counters by intervals to
resolve the inconsistency. To do so we need to stream through the primary
partition, because counters should be "reserved" in a single place. So
the following could be done when working with an MVCC cache:
1. Send batches from the streamer only to primary partition owners.
2. Remember the partition counter updates made on the primary.
3. Forward batches to backups along with the counter intervals.
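
A rough sketch of what "reserving" counters in a single place could look like on the primary. All names here (PrimaryCounterReservation, Interval, reserve) are hypothetical; this illustrates the idea only and is not a proposal for the actual code.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical primary-side counter that hands out (low, high) intervals. */
class PrimaryCounterReservation {
    /** Interval of counter values reserved by one transaction or streamer batch. */
    static final class Interval {
        final long low;
        final long high;

        Interval(long low, long high) {
            this.low = low;
            this.high = high;
        }
    }

    private final AtomicLong counter;

    PrimaryCounterReservation(long initial) {
        counter = new AtomicLong(initial);
    }

    /**
     * Atomically reserves 'updates' counter values. Both a finishing transaction
     * and a streamer batch would call this, so intervals never overlap.
     */
    Interval reserve(int updates) {
        if (updates <= 0)
            throw new IllegalArgumentException("updates must be positive: " + updates);

        long low = counter.getAndAdd(updates);

        return new Interval(low, low + updates);
    }

    long get() {
        return counter.get();
    }
}
```

Starting from 10, a transaction with 5 updates would get (10, 15) and a following streamer batch of 2 entries would get (15, 17). The interval would then travel to the backups together with the batch, so a backup can apply or enqueue it in the same way as a transactional interval.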

What do you think?

Tue, Aug 14, 2018 at 15:21, Dmitriy Setrakyan <ds...@apache.org>:
>
> On Tue, Aug 14, 2018 at 4:30 AM, Vladimir Ozerov <vo...@gridgain.com>
> wrote:
>
> > Bypassing WAL will make the whole cache data vulnerable to complete loss in
> > case of node failure. I would not do this automatically.
> >
>
> Well, in this case I would expect a log message suggesting that there is an
> option to turn off WAL which could significantly improve performance. We
> could just print it out once.



-- 
Best regards,
Ivan Pavlukhin