Posted to dev@ignite.apache.org by Seliverstov Igor <gv...@gmail.com> on 2018/07/09 08:24:29 UTC

MVCC and IgniteDataStreamer

Hi Igniters,

As you know, we're currently developing fair MVCC for transactional caches (
https://issues-test.apache.org/jira/plugins/servlet/mobile#issue/IGNITE-4191
).

Right now we're trying to make IgniteDataStreamer work with MVCC.

The problem is that MVCC requires linking between row versions in order to
check rows against the visibility rules (see
https://cwiki.apache.org/confluence/display/IGNITE/Distributed+MVCC+And+Transactional+SQL+Design
for details). We cannot do that without a write lock on a data row (otherwise
there is no guarantee that the linking is consistent). But locks will affect
data streamer performance dramatically.

So there are two ways, a fast one and a right one:

1) We introduce a special version which is lower than any other. All
streamed rows are written with this version. All other versions of a row are
cleaned up. All pending transactions that involve these rows are marked as
rollback-only. Repeatable read semantics are broken for reads (since the
initial version is always visible, readers see the data streamer's dirty
writes). The user has to ensure there are no other read/write operations
while loading.

2) The data streamer uses its own MVCC version. All data streamer operations
become transactional. The data streamer acquires a table lock (a write lock)
before streaming. Readers are not affected and see a consistent snapshot
while data is loading.
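
To make the visibility problem of option 1 concrete, here is a minimal Java sketch. It is a toy model and not Ignite code: the class, the method names and the value 0 for the initial version are all assumptions made for illustration, and the visibility rule is reduced to "newest version not greater than the reader's snapshot" (the real rules in the design document also take active transactions into account).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not Ignite code): one row with a chain of versions, newest first. */
final class MvccVisibilitySketch {
    /** Assumed "initial" version: lower than any version a transaction can get. */
    static final long INITIAL_VERSION = 0L;

    /** One version of the row: the version that wrote it and the stored value. */
    static final class RowVersion {
        final long version;
        final String value;

        RowVersion(long version, String value) {
            this.version = version;
            this.value = value;
        }
    }

    /** Version chain of a single row, newest version at index 0. */
    private final List<RowVersion> chain = new ArrayList<>();

    /** Regular transactional write; callers are assumed to pass increasing versions. */
    void txWrite(long version, String value) {
        chain.add(0, new RowVersion(version, value));
    }

    /** Option 1: clean up all other versions and write with the initial version. */
    void streamerWrite(String value) {
        chain.clear();
        chain.add(new RowVersion(INITIAL_VERSION, value));
    }

    /** Simplified snapshot read: newest version not greater than the snapshot. */
    String read(long snapshotVersion) {
        for (RowVersion v : chain) {
            if (v.version <= snapshotVersion)
                return v.value;
        }
        return null; // No version is visible to this snapshot.
    }

    public static void main(String[] args) {
        MvccVisibilitySketch row = new MvccVisibilitySketch();
        row.txWrite(5, "a");
        row.txWrite(10, "b");

        // A reader with snapshot 7 does not see the later write at version 10.
        System.out.println(row.read(7)); // prints "a"

        // The streamer replaces the chain with a single initial-version entry.
        row.streamerWrite("c");

        // The same reader now sees the streamed value: repeatable read is broken.
        System.out.println(row.read(7)); // prints "c"
    }
}
```

In this model the reader with snapshot 7 gets different values for the same row before and after the streamer write, which is why option 1 needs the user to keep other readers and writers away while loading.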

Initially we're going to implement the first approach (the fast one), and as
soon as table locks are introduced (there is an appropriate IEP) we'll do
things right.

What do you think?

Re: MVCC and IgniteDataStreamer

Posted by Seliverstov Igor <gv...@gmail.com>.
Yakov,

We can introduce several modes:

1) initial loading, which replaces data (allowOverwrite=true) with the initial version or leaves it as is (allowOverwrite=false) and requires an exclusive table lock (the fastest one)
2) continuous loading, which has its own version and links the data as a regular transaction does (allowOverwrite=true) or leaves it as is (allowOverwrite=false), doesn’t affect concurrent readers but still requires a write lock on a table (slower than the previous one)
3) batch loading, which acts as a sequence of regular transactions with all possible optimizations, doesn’t affect concurrent readers and writers but may cause lock conflicts with subsequent retries, links the data as a regular transaction does (allowOverwrite=true) or leaves it as is (allowOverwrite=false), and doesn’t cause write conflicts (like READ_COMMITTED txs) (the slowest one).

All the modes require table locks.
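
As a rough illustration of the three modes, the Java sketch below records which concurrent operations each mode tolerates and shows the two allowOverwrite behaviours on a plain map. Everything here is hypothetical: the enum, its constants, the flags and the store helper are illustration names, not part of the Ignite API, and the kind of table lock each mode takes is left out because only the first two are spelled out above.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not Ignite code) of the three proposed streamer modes. */
final class StreamerModesSketch {
    /** Hypothetical mode names with the concurrency each mode tolerates. */
    enum Mode {
        INITIAL_LOADING(false, false),   // exclusive table lock, fastest
        CONTINUOUS_LOADING(true, false), // write lock on the table, readers unaffected
        BATCH_LOADING(true, true);       // regular transactions, slowest

        final boolean readersAllowed;
        final boolean writersAllowed;

        Mode(boolean readersAllowed, boolean writersAllowed) {
            this.readersAllowed = readersAllowed;
            this.writersAllowed = writersAllowed;
        }
    }

    /** allowOverwrite=true replaces an existing value, false leaves it as is. */
    static <K, V> void store(Map<K, V> table, K key, V value, boolean allowOverwrite) {
        if (allowOverwrite)
            table.put(key, value);
        else
            table.putIfAbsent(key, value);
    }

    public static void main(String[] args) {
        Map<String, Integer> table = new HashMap<>();
        table.put("k", 1);

        store(table, "k", 2, false);
        System.out.println(table.get("k")); // prints 1: existing value kept

        store(table, "k", 3, true);
        System.out.println(table.get("k")); // prints 3: existing value replaced

        for (Mode m : Mode.values())
            System.out.println(m + " readers=" + m.readersAllowed + " writers=" + m.writersAllowed);
    }
}
```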

Your thoughts?

> On 9 July 2018, at 12:55, Yakov Zhdanov <yz...@apache.org> wrote:
> 
> Igor,
> 
> I can't say if I agree with any of the suggestions. I would like us to
> start by answering the question: what is the data streamer used for?
> 
> First of all, for initial data loading. This should be a super-fast mode,
> probably ignoring all transactional semantics, but providing certain
> guarantees that data passed into the streamer is loaded.
> 
> Second, for continuously streaming updates to some tables (from more than 1
> streamer) and running some analytics over the data, probably with some
> modifications from the non-streamer side (user transactions). In this case
> streamers should not roll back user txs or do any kind of unexpected
> visibility tricks. I think we can consider a proper streamer tx at the batch
> or key level.
> 
> The third case I see is a combination of the above: we stream portions of
> data to an existing table, let's say once a day (for example, market data
> after closing or an offloaded operations data set), with or without any other
> concurrent non-streamer operations. This mode may involve table locks or
> behave like the 2nd mode, which should be up to the user to decide.
> 
> So, the planned changes to the streamer should support at least these 3
> scenarios. What do you think?
> 
> Igniters, feel free to share your thoughts on this. The question is pretty
> important for us.
> 
> --Yakov


Re: MVCC and IgniteDataStreamer

Posted by Yakov Zhdanov <yz...@apache.org>.
Igor,

I can't say if I agree with any of the suggestions. I would like us to
start by answering the question: what is the data streamer used for?

First of all, for initial data loading. This should be a super-fast mode,
probably ignoring all transactional semantics, but providing certain
guarantees that data passed into the streamer is loaded.

Second, for continuously streaming updates to some tables (from more than 1
streamer) and running some analytics over the data, probably with some
modifications from the non-streamer side (user transactions). In this case
streamers should not roll back user txs or do any kind of unexpected
visibility tricks. I think we can consider a proper streamer tx at the batch
or key level.

The third case I see is a combination of the above: we stream portions of
data to an existing table, let's say once a day (for example, market data
after closing or an offloaded operations data set), with or without any other
concurrent non-streamer operations. This mode may involve table locks or
behave like the 2nd mode, which should be up to the user to decide.

So, the planned changes to the streamer should support at least these 3
scenarios. What do you think?

Igniters, feel free to share your thoughts on this. The question is pretty
important for us.

--Yakov