You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@ignite.apache.org by Pavel Pereslegin <xx...@gmail.com> on 2018/02/08 14:12:44 UTC

Re: IgniteSet implementation: changes required

Hello, Igniters!

We have some issues with current IgniteSet implementation ([1], [2], [3], [4]).

As was already described in this conversation, the main problem is
that current IgniteSet implementation maintains plain Java sets on
every node (see CacheDataStructuresManager.setDataMap). These sets
duplicate backing-cache entries, both primary and backup. size() and
iterator() calls issue distributed queries to collect/filter data from
all setDataMap's.

I believe we can solve specified issues if each instance of IgniteSet
will have separate internal cache that will be destroyed on close.

What do you think about such major change? Do you have any thoughts or
objections?

[1] https://issues.apache.org/jira/browse/IGNITE-7565
[2] https://issues.apache.org/jira/browse/IGNITE-5370
[3] https://issues.apache.org/jira/browse/IGNITE-5553
[4] https://issues.apache.org/jira/browse/IGNITE-6474


2017-10-31 5:53 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> Hi Andrey,
>
> Thanks for a detailed email. I think your suggestions do make sense. Ignite
> cannot afford to have a distributed set that is not fail-safe. Can you
> please focus only on solutions that provide consistent behavior in case of
> topology changes and failures and document them in the ticket?
>
> https://issues.apache.org/jira/browse/IGNITE-5553
>
> D.
>
> On Mon, Oct 30, 2017 at 3:07 AM, Andrey Kuznetsov <st...@gmail.com> wrote:
>
>> Hi, Igniters!
>>
>> Current implementation of IgniteSet is fragile with respect to cluster
>> recovery from a checkpoint. We have an issue (IGNITE-5553) that addresses
>> set's size() behavior, but the problem is slightly broader. The text below
>> is my comment from Jira issue. I encourage you to discuss it.
>>
>> We can put current set size into set header cache entry. This will fix
>> size(), but we have broken iterator() implementation as well.
>>
>> Currently, set implementation maintains plain Java sets on every node, see
>> CacheDataStructuresManager.setDataMap. These sets duplicate backing-cache
>> entries, both primary and backup. size() and iterator() calls issue
>> distributed queries to collect/filter data from all setDataMap's. And
>> setDataMaps remain empty after cluster is recovered from checkpoint.
>>
>> Now I see the following options to fix the issue.
>>
>> #1 - Naive. Iterate over all datastructure-backing caches entries during
>> recover from checkpoint procedure, filter set-related entries and refill
>> setDataMap's.
>> Pros: easy to implement
>> Cons: inpredictable time/memory overhead.
>>
>> #2 - More realistic. Avoid node-local copies of cache data. Maintain linked
>> list in datastructure-backing cache: key is set item, value is next set
>> item. List head is stored in set header cache entry (this set item is
>> youngest one). Iterators build on top of this structure are fail-fast.
>> Pros: less memory overhead, no need to maintain node-local mirrors of cache
>> data
>> Cons: iterators are not fail-safe.
>>
>> #3 - Option #2 modified. We can store reference counter and 'removed' flag
>> along with next item reference. This allows to make iterators fail safe.
>> Pros: iterators are fail-safe
>> Cons: slightly more complicated implementation, may affect performance,
>> also I see no way to handle active iterators on remote nodes failures.
>>
>>
>> Best regards,
>>
>> Andrey.
>>

Re: IgniteSet implementation: changes required

Posted by Pavel Pereslegin <xx...@gmail.com>.

Hello Igniters.

As was discussed, IgniteSet implementation was based on on-heap data
duplication (setDataMap), as a result, the data was not recovered after
cluster restart and in the case of large data sets, this led to a
significant heap growing and gc pressure.

We changed the implementation so that this structure works well without
duplicating the data [1]. To reduce performance drop and speed up large
data sets, non-collocated version of IgniteSet now uses separate cache [2].

[1] https://issues.apache.org/jira/browse/IGNITE-5553
[2] https://issues.apache.org/jira/browse/IGNITE-7823



ср, 27 июн. 2018 г. в 23:26, Amir Akhmedov <am...@gmail.com>:

> Yes, you are right.
>
> Thanks,
> Amir
>
>
> On Wed, Jun 27, 2018 at 1:15 PM Denis Magda <dm...@apache.org> wrote:
>
> > Got you. If it's about redundant data duplication in onheap region then
> no
> > any concerns from my side.
> >
> > Anyway, considering that the data structure will be interacting with the
> > page memory directly then its entries can be stored in Ignite persistence
> > automatically (if the latter is on). Does it mean that the data structure
> > will be fully recovered after a restart and its entries can be pulled
> from
> > disk on demand?
> >
> > --
> > Denis
> >
> >
> > On Tue, Jun 26, 2018 at 1:49 PM Amir Akhmedov <am...@gmail.com>
> > wrote:
> >
> > > I also think it will better to remove setDataMap support cause
> > > 1. It's making extra pressure on GC by keeping entries on heap
> > > 2. It has difficult logic to support with lots of nuances
> > > 3. To maintain setDataMap today GridCacheMapEntry calls
> > > cctx.dataStructures().onEntryUpdated() on each entry mutation. I think
> > it's
> > > unnecessary cohesion.
> > > 4. For the case with single Ignite cache for all collocated
> > datastructure,
> > > an iterator creation will not be much slower than current
> implementation
> > > since we can run affinity call on the node where all entries reside.
> > Also,
> > > we can create a better affinity mapper to fairly distribute
> > datastructures
> > > across a cluster rather than mapping by datastructure's name.
> > >
> > > Thanks,
> > > Amir
> > >
> > >
> > > On Tue, Jun 26, 2018 at 8:10 AM Anton Vinogradov <av...@apache.org>
> wrote:
> > >
> > > > Denis,
> > > >
> > > > I think that better case is to remove onheap
> optimisation/duplication.
> > > > This brings no drop to frequently used operations (put/remove), but
> > even
> > > > will make it slightly faster.
> > > >
> > > > The only one question we have here is "is it possible to restore
> onheap
> > > map
> > > > in easy way?".
> > > > Seems that answer is no, so, I vote for setDataMap removal.
> > > >
> > > > вт, 26 июн. 2018 г. в 15:00, Denis Magda <dm...@apache.org>:
> > > >
> > > > > Anton,
> > > > >
> > > > > Will it be possible to reuse such a functionality for the rest of
> > data
> > > > > structures? I would invest our time in this if all data structures
> > > would
> > > > be
> > > > > able to work with Ignite persistence this way.
> > > > >
> > > > > --
> > > > > Denis
> > > > >
> > > > > On Tue, Jun 26, 2018 at 1:53 AM Anton Vinogradov <av...@apache.org>
> > > wrote:
> > > > >
> > > > > > >> Why don't we read data straight from the persistence layer
> > warming
> > > > RAM
> > > > > > up
> > > > > > >> in the background?
> > > > > > Because it's not a trivial task to finish such loading on
> unstable
> > > > > > topology.
> > > > > > That's possible, ofcourse, but solution and complexity will be
> > almost
> > > > > > equals to WAL enable/disable.
> > > > > >
> > > > > > пн, 25 июн. 2018 г. в 22:13, Denis Magda <dm...@apache.org>:
> > > > > >
> > > > > > > Folks,
> > > > > > >
> > > > > > > Why don't we read data straight from the persistence layer
> > warming
> > > > RAM
> > > > > up
> > > > > > > in the background? (like we do for SQL and other APIs). If
> it's a
> > > > > > question
> > > > > > > of time, then I would suggest us not to hurry up and do it in a
> > > right
> > > > > > way.
> > > > > > >
> > > > > > > --
> > > > > > > Denis
> > > > > > >
> > > > > > > On Mon, Jun 25, 2018 at 6:20 AM Anton Vinogradov <
> av@apache.org>
> > > > > wrote:
> > > > > > >
> > > > > > > > +1 to removal in case there is no easy, fast and consistent
> way
> > > to
> > > > > > > restore
> > > > > > > > setDataMap on node restart.
> > > > > > > > I see that we'll gain some performance drop on size() or
> > keys(),
> > > > but
> > > > > > > these
> > > > > > > > methods are rarely used.
> > > > > > > >
> > > > > > > > пн, 25 июн. 2018 г. в 16:07, Pavel Pereslegin <
> > xxtern@gmail.com
> > > >:
> > > > > > > >
> > > > > > > > > Hello, Igniters.
> > > > > > > > >
> > > > > > > > > I tried to implement IgniteSet data recovery when
> persistence
> > > > > enabled
> > > > > > > > > [1] using trivial cache scanning, however I cannot find
> > optimal
> > > > way
> > > > > > to
> > > > > > > > > do that because of the following reasons:
> > > > > > > > > - Performing operations on IgniteSet requires completion of
> > > data
> > > > > > > > > loading (restoring of setDataMap) on all nodes. Do this
> > during
> > > > > > > > > partition map exchange is too long.
> > > > > > > > > - The prohibition of operations on IgniteSet before the
> > > > completion
> > > > > of
> > > > > > > > > asynchronous cache scanning on all nodes looks rather
> > > > complicated,
> > > > > > > > > because It is necessary to support all situations of
> unstable
> > > > > > > > > topology.
> > > > > > > > >
> > > > > > > > > So I see one option to fix data loss on node restart -
> remove
> > > the
> > > > > > > > > entire optimization (setDataMap) and rework the iterator
> > > > > > > > > implementation to perform cache scanning.
> > > > > > > > >
> > > > > > > > > Thoughts?
> > > > > > > > >
> > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5553
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 2018-03-17 8:20 GMT+03:00 Andrey Kuznetsov <
> > stkuzma@gmail.com
> > > >:
> > > > > > > > > > Thanks, Dmitry. I agree ultimately, DS API uniformity is
> a
> > > > > weighty
> > > > > > > > > reason.
> > > > > > > > > >
> > > > > > > > > > 2018-03-17 3:54 GMT+03:00 Dmitriy Setrakyan <
> > > > > dsetrakyan@apache.org
> > > > > > >:
> > > > > > > > > >
> > > > > > > > > >> On Fri, Mar 16, 2018 at 7:39 AM, Andrey Kuznetsov <
> > > > > > > stkuzma@gmail.com>
> > > > > > > > > >> wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Dmitry, your way allows to reuse existing
> > {{Ignite.set()}}
> > > > API
> > > > > > to
> > > > > > > > > create
> > > > > > > > > >> > both set flavors. We can adopt it unless somebody in
> the
> > > > > > community
> > > > > > > > > >> objects.
> > > > > > > > > >> > Personally, I like {{IgniteCache.asSet()}} approach
> > > proposed
> > > > > by
> > > > > > > > > Vladimir
> > > > > > > > > >> O.
> > > > > > > > > >> > more, since it emphasizes the difference between sets
> > > being
> > > > > > > created,
> > > > > > > > > but
> > > > > > > > > >> > this will require API extension.
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >> Andrey, I am suggesting that Ignite.set(...) in
> > > non-collocated
> > > > > > mode
> > > > > > > > > behaves
> > > > > > > > > >> exactly the same as the proposed IgniteCache.asSet()
> > > method. I
> > > > > do
> > > > > > > not
> > > > > > > > > like
> > > > > > > > > >> the IgniteCache.asSet() API because it is inconsistent
> > with
> > > > > Ignite
> > > > > > > > data
> > > > > > > > > >> structure design. All data structures are provided on
> > Ignite
> > > > API
> > > > > > > > > directly
> > > > > > > > > >> and we should not change that.
> > > > > > > > > >>
> > > > > > > > > >> D.
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best regards,
> > > > > > > > > >   Andrey Kuznetsov.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: IgniteSet implementation: changes required

Posted by Amir Akhmedov <am...@gmail.com>.

Yes, you are right.

Thanks,
Amir


On Wed, Jun 27, 2018 at 1:15 PM Denis Magda <dm...@apache.org> wrote:

> Got you. If it's about redundant data duplication in onheap region then no
> any concerns from my side.
>
> Anyway, considering that the data structure will be interacting with the
> page memory directly then its entries can be stored in Ignite persistence
> automatically (if the latter is on). Does it mean that the data structure
> will be fully recovered after a restart and its entries can be pulled from
> disk on demand?
>
> --
> Denis
>
>
> On Tue, Jun 26, 2018 at 1:49 PM Amir Akhmedov <am...@gmail.com>
> wrote:
>
> > I also think it will better to remove setDataMap support cause
> > 1. It's making extra pressure on GC by keeping entries on heap
> > 2. It has difficult logic to support with lots of nuances
> > 3. To maintain setDataMap today GridCacheMapEntry calls
> > cctx.dataStructures().onEntryUpdated() on each entry mutation. I think
> it's
> > unnecessary cohesion.
> > 4. For the case with single Ignite cache for all collocated
> datastructure,
> > an iterator creation will not be much slower than current implementation
> > since we can run affinity call on the node where all entries reside.
> Also,
> > we can create a better affinity mapper to fairly distribute
> datastructures
> > across a cluster rather than mapping by datastructure's name.
> >
> > Thanks,
> > Amir
> >
> >
> > On Tue, Jun 26, 2018 at 8:10 AM Anton Vinogradov <av...@apache.org> wrote:
> >
> > > Denis,
> > >
> > > I think that better case is to remove onheap optimisation/duplication.
> > > This brings no drop to frequently used operations (put/remove), but
> even
> > > will make it slightly faster.
> > >
> > > The only one question we have here is "is it possible to restore onheap
> > map
> > > in easy way?".
> > > Seems that answer is no, so, I vote for setDataMap removal.
> > >
> > > вт, 26 июн. 2018 г. в 15:00, Denis Magda <dm...@apache.org>:
> > >
> > > > Anton,
> > > >
> > > > Will it be possible to reuse such a functionality for the rest of
> data
> > > > structures? I would invest our time in this if all data structures
> > would
> > > be
> > > > able to work with Ignite persistence this way.
> > > >
> > > > --
> > > > Denis
> > > >
> > > > On Tue, Jun 26, 2018 at 1:53 AM Anton Vinogradov <av...@apache.org>
> > wrote:
> > > >
> > > > > >> Why don't we read data straight from the persistence layer
> warming
> > > RAM
> > > > > up
> > > > > >> in the background?
> > > > > Because it's not a trivial task to finish such loading on unstable
> > > > > topology.
> > > > > That's possible, ofcourse, but solution and complexity will be
> almost
> > > > > equals to WAL enable/disable.
> > > > >
> > > > > пн, 25 июн. 2018 г. в 22:13, Denis Magda <dm...@apache.org>:
> > > > >
> > > > > > Folks,
> > > > > >
> > > > > > Why don't we read data straight from the persistence layer
> warming
> > > RAM
> > > > up
> > > > > > in the background? (like we do for SQL and other APIs). If it's a
> > > > > question
> > > > > > of time, then I would suggest us not to hurry up and do it in a
> > right
> > > > > way.
> > > > > >
> > > > > > --
> > > > > > Denis
> > > > > >
> > > > > > On Mon, Jun 25, 2018 at 6:20 AM Anton Vinogradov <av...@apache.org>
> > > > wrote:
> > > > > >
> > > > > > > +1 to removal in case there is no easy, fast and consistent way
> > to
> > > > > > restore
> > > > > > > setDataMap on node restart.
> > > > > > > I see that we'll gain some performance drop on size() or
> keys(),
> > > but
> > > > > > these
> > > > > > > methods are rarely used.
> > > > > > >
> > > > > > > пн, 25 июн. 2018 г. в 16:07, Pavel Pereslegin <
> xxtern@gmail.com
> > >:
> > > > > > >
> > > > > > > > Hello, Igniters.
> > > > > > > >
> > > > > > > > I tried to implement IgniteSet data recovery when persistence
> > > > enabled
> > > > > > > > [1] using trivial cache scanning, however I cannot find
> optimal
> > > way
> > > > > to
> > > > > > > > do that because of the following reasons:
> > > > > > > > - Performing operations on IgniteSet requires completion of
> > data
> > > > > > > > loading (restoring of setDataMap) on all nodes. Do this
> during
> > > > > > > > partition map exchange is too long.
> > > > > > > > - The prohibition of operations on IgniteSet before the
> > > completion
> > > > of
> > > > > > > > asynchronous cache scanning on all nodes looks rather
> > > complicated,
> > > > > > > > because It is necessary to support all situations of unstable
> > > > > > > > topology.
> > > > > > > >
> > > > > > > > So I see one option to fix data loss on node restart - remove
> > the
> > > > > > > > entire optimization (setDataMap) and rework the iterator
> > > > > > > > implementation to perform cache scanning.
> > > > > > > >
> > > > > > > > Thoughts?
> > > > > > > >
> > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5553
> > > > > > > >
> > > > > > > >
> > > > > > > > 2018-03-17 8:20 GMT+03:00 Andrey Kuznetsov <
> stkuzma@gmail.com
> > >:
> > > > > > > > > Thanks, Dmitry. I agree ultimately, DS API uniformity is a
> > > > weighty
> > > > > > > > reason.
> > > > > > > > >
> > > > > > > > > 2018-03-17 3:54 GMT+03:00 Dmitriy Setrakyan <
> > > > dsetrakyan@apache.org
> > > > > >:
> > > > > > > > >
> > > > > > > > >> On Fri, Mar 16, 2018 at 7:39 AM, Andrey Kuznetsov <
> > > > > > stkuzma@gmail.com>
> > > > > > > > >> wrote:
> > > > > > > > >>
> > > > > > > > >> > Dmitry, your way allows to reuse existing
> {{Ignite.set()}}
> > > API
> > > > > to
> > > > > > > > create
> > > > > > > > >> > both set flavors. We can adopt it unless somebody in the
> > > > > community
> > > > > > > > >> objects.
> > > > > > > > >> > Personally, I like {{IgniteCache.asSet()}} approach
> > proposed
> > > > by
> > > > > > > > Vladimir
> > > > > > > > >> O.
> > > > > > > > >> > more, since it emphasizes the difference between sets
> > being
> > > > > > created,
> > > > > > > > but
> > > > > > > > >> > this will require API extension.
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > > >> Andrey, I am suggesting that Ignite.set(...) in
> > non-collocated
> > > > > mode
> > > > > > > > behaves
> > > > > > > > >> exactly the same as the proposed IgniteCache.asSet()
> > method. I
> > > > do
> > > > > > not
> > > > > > > > like
> > > > > > > > >> the IgniteCache.asSet() API because it is inconsistent
> with
> > > > Ignite
> > > > > > > data
> > > > > > > > >> structure design. All data structures are provided on
> Ignite
> > > API
> > > > > > > > directly
> > > > > > > > >> and we should not change that.
> > > > > > > > >>
> > > > > > > > >> D.
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best regards,
> > > > > > > > >   Andrey Kuznetsov.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: IgniteSet implementation: changes required

Posted by Denis Magda <dm...@apache.org>.

Got you. If it's about redundant data duplication in onheap region then no
any concerns from my side.

Anyway, considering that the data structure will be interacting with the
page memory directly then its entries can be stored in Ignite persistence
automatically (if the latter is on). Does it mean that the data structure
will be fully recovered after a restart and its entries can be pulled from
disk on demand?

--
Denis


On Tue, Jun 26, 2018 at 1:49 PM Amir Akhmedov <am...@gmail.com>
wrote:

> I also think it will better to remove setDataMap support cause
> 1. It's making extra pressure on GC by keeping entries on heap
> 2. It has difficult logic to support with lots of nuances
> 3. To maintain setDataMap today GridCacheMapEntry calls
> cctx.dataStructures().onEntryUpdated() on each entry mutation. I think it's
> unnecessary cohesion.
> 4. For the case with single Ignite cache for all collocated datastructure,
> an iterator creation will not be much slower than current implementation
> since we can run affinity call on the node where all entries reside. Also,
> we can create a better affinity mapper to fairly distribute datastructures
> across a cluster rather than mapping by datastructure's name.
>
> Thanks,
> Amir
>
>
> On Tue, Jun 26, 2018 at 8:10 AM Anton Vinogradov <av...@apache.org> wrote:
>
> > Denis,
> >
> > I think that better case is to remove onheap optimisation/duplication.
> > This brings no drop to frequently used operations (put/remove), but even
> > will make it slightly faster.
> >
> > The only one question we have here is "is it possible to restore onheap
> map
> > in easy way?".
> > Seems that answer is no, so, I vote for setDataMap removal.
> >
> > вт, 26 июн. 2018 г. в 15:00, Denis Magda <dm...@apache.org>:
> >
> > > Anton,
> > >
> > > Will it be possible to reuse such a functionality for the rest of data
> > > structures? I would invest our time in this if all data structures
> would
> > be
> > > able to work with Ignite persistence this way.
> > >
> > > --
> > > Denis
> > >
> > > On Tue, Jun 26, 2018 at 1:53 AM Anton Vinogradov <av...@apache.org>
> wrote:
> > >
> > > > >> Why don't we read data straight from the persistence layer warming
> > RAM
> > > > up
> > > > >> in the background?
> > > > Because it's not a trivial task to finish such loading on unstable
> > > > topology.
> > > > That's possible, ofcourse, but solution and complexity will be almost
> > > > equals to WAL enable/disable.
> > > >
> > > > пн, 25 июн. 2018 г. в 22:13, Denis Magda <dm...@apache.org>:
> > > >
> > > > > Folks,
> > > > >
> > > > > Why don't we read data straight from the persistence layer warming
> > RAM
> > > up
> > > > > in the background? (like we do for SQL and other APIs). If it's a
> > > > question
> > > > > of time, then I would suggest us not to hurry up and do it in a
> right
> > > > way.
> > > > >
> > > > > --
> > > > > Denis
> > > > >
> > > > > On Mon, Jun 25, 2018 at 6:20 AM Anton Vinogradov <av...@apache.org>
> > > wrote:
> > > > >
> > > > > > +1 to removal in case there is no easy, fast and consistent way
> to
> > > > > restore
> > > > > > setDataMap on node restart.
> > > > > > I see that we'll gain some performance drop on size() or keys(),
> > but
> > > > > these
> > > > > > methods are rarely used.
> > > > > >
> > > > > > пн, 25 июн. 2018 г. в 16:07, Pavel Pereslegin <xxtern@gmail.com
> >:
> > > > > >
> > > > > > > Hello, Igniters.
> > > > > > >
> > > > > > > I tried to implement IgniteSet data recovery when persistence
> > > enabled
> > > > > > > [1] using trivial cache scanning, however I cannot find optimal
> > way
> > > > to
> > > > > > > do that because of the following reasons:
> > > > > > > - Performing operations on IgniteSet requires completion of
> data
> > > > > > > loading (restoring of setDataMap) on all nodes. Do this during
> > > > > > > partition map exchange is too long.
> > > > > > > - The prohibition of operations on IgniteSet before the
> > completion
> > > of
> > > > > > > asynchronous cache scanning on all nodes looks rather
> > complicated,
> > > > > > > because It is necessary to support all situations of unstable
> > > > > > > topology.
> > > > > > >
> > > > > > > So I see one option to fix data loss on node restart - remove
> the
> > > > > > > entire optimization (setDataMap) and rework the iterator
> > > > > > > implementation to perform cache scanning.
> > > > > > >
> > > > > > > Thoughts?
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5553
> > > > > > >
> > > > > > >
> > > > > > > 2018-03-17 8:20 GMT+03:00 Andrey Kuznetsov <stkuzma@gmail.com
> >:
> > > > > > > > Thanks, Dmitry. I agree ultimately, DS API uniformity is a
> > > weighty
> > > > > > > reason.
> > > > > > > >
> > > > > > > > 2018-03-17 3:54 GMT+03:00 Dmitriy Setrakyan <
> > > dsetrakyan@apache.org
> > > > >:
> > > > > > > >
> > > > > > > >> On Fri, Mar 16, 2018 at 7:39 AM, Andrey Kuznetsov <
> > > > > stkuzma@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > Dmitry, your way allows to reuse existing {{Ignite.set()}}
> > API
> > > > to
> > > > > > > create
> > > > > > > >> > both set flavors. We can adopt it unless somebody in the
> > > > community
> > > > > > > >> objects.
> > > > > > > >> > Personally, I like {{IgniteCache.asSet()}} approach
> proposed
> > > by
> > > > > > > Vladimir
> > > > > > > >> O.
> > > > > > > >> > more, since it emphasizes the difference between sets
> being
> > > > > created,
> > > > > > > but
> > > > > > > >> > this will require API extension.
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >> Andrey, I am suggesting that Ignite.set(...) in
> non-collocated
> > > > mode
> > > > > > > behaves
> > > > > > > >> exactly the same as the proposed IgniteCache.asSet()
> method. I
> > > do
> > > > > not
> > > > > > > like
> > > > > > > >> the IgniteCache.asSet() API because it is inconsistent with
> > > Ignite
> > > > > > data
> > > > > > > >> structure design. All data structures are provided on Ignite
> > API
> > > > > > > directly
> > > > > > > >> and we should not change that.
> > > > > > > >>
> > > > > > > >> D.
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best regards,
> > > > > > > >   Andrey Kuznetsov.
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: IgniteSet implementation: changes required

Posted by Amir Akhmedov <am...@gmail.com>.

I also think it will better to remove setDataMap support cause
1. It's making extra pressure on GC by keeping entries on heap
2. It has difficult logic to support with lots of nuances
3. To maintain setDataMap today GridCacheMapEntry calls
cctx.dataStructures().onEntryUpdated() on each entry mutation. I think it's
unnecessary cohesion.
4. For the case with single Ignite cache for all collocated datastructure,
an iterator creation will not be much slower than current implementation
since we can run affinity call on the node where all entries reside. Also,
we can create a better affinity mapper to fairly distribute datastructures
across a cluster rather than mapping by datastructure's name.

Thanks,
Amir


On Tue, Jun 26, 2018 at 8:10 AM Anton Vinogradov <av...@apache.org> wrote:

> Denis,
>
> I think that better case is to remove onheap optimisation/duplication.
> This brings no drop to frequently used operations (put/remove), but even
> will make it slightly faster.
>
> The only one question we have here is "is it possible to restore onheap map
> in easy way?".
> Seems that answer is no, so, I vote for setDataMap removal.
>
> вт, 26 июн. 2018 г. в 15:00, Denis Magda <dm...@apache.org>:
>
> > Anton,
> >
> > Will it be possible to reuse such a functionality for the rest of data
> > structures? I would invest our time in this if all data structures would
> be
> > able to work with Ignite persistence this way.
> >
> > --
> > Denis
> >
> > On Tue, Jun 26, 2018 at 1:53 AM Anton Vinogradov <av...@apache.org> wrote:
> >
> > > >> Why don't we read data straight from the persistence layer warming
> RAM
> > > up
> > > >> in the background?
> > > Because it's not a trivial task to finish such loading on unstable
> > > topology.
> > > That's possible, ofcourse, but solution and complexity will be almost
> > > equals to WAL enable/disable.
> > >
> > > пн, 25 июн. 2018 г. в 22:13, Denis Magda <dm...@apache.org>:
> > >
> > > > Folks,
> > > >
> > > > Why don't we read data straight from the persistence layer warming
> RAM
> > up
> > > > in the background? (like we do for SQL and other APIs). If it's a
> > > question
> > > > of time, then I would suggest us not to hurry up and do it in a right
> > > way.
> > > >
> > > > --
> > > > Denis
> > > >
> > > > On Mon, Jun 25, 2018 at 6:20 AM Anton Vinogradov <av...@apache.org>
> > wrote:
> > > >
> > > > > +1 to removal in case there is no easy, fast and consistent way to
> > > > restore
> > > > > setDataMap on node restart.
> > > > > I see that we'll gain some performance drop on size() or keys(),
> but
> > > > these
> > > > > methods are rarely used.
> > > > >
> > > > > пн, 25 июн. 2018 г. в 16:07, Pavel Pereslegin <xx...@gmail.com>:
> > > > >
> > > > > > Hello, Igniters.
> > > > > >
> > > > > > I tried to implement IgniteSet data recovery when persistence
> > enabled
> > > > > > [1] using trivial cache scanning, however I cannot find optimal
> way
> > > to
> > > > > > do that because of the following reasons:
> > > > > > - Performing operations on IgniteSet requires completion of data
> > > > > > loading (restoring of setDataMap) on all nodes. Do this during
> > > > > > partition map exchange is too long.
> > > > > > - The prohibition of operations on IgniteSet before the
> completion
> > of
> > > > > > asynchronous cache scanning on all nodes looks rather
> complicated,
> > > > > > because It is necessary to support all situations of unstable
> > > > > > topology.
> > > > > >
> > > > > > So I see one option to fix data loss on node restart - remove the
> > > > > > entire optimization (setDataMap) and rework the iterator
> > > > > > implementation to perform cache scanning.
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5553
> > > > > >
> > > > > >
> > > > > > 2018-03-17 8:20 GMT+03:00 Andrey Kuznetsov <st...@gmail.com>:
> > > > > > > Thanks, Dmitry. I agree ultimately, DS API uniformity is a
> > weighty
> > > > > > reason.
> > > > > > >
> > > > > > > 2018-03-17 3:54 GMT+03:00 Dmitriy Setrakyan <
> > dsetrakyan@apache.org
> > > >:
> > > > > > >
> > > > > > >> On Fri, Mar 16, 2018 at 7:39 AM, Andrey Kuznetsov <
> > > > stkuzma@gmail.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Dmitry, your way allows to reuse existing {{Ignite.set()}}
> API
> > > to
> > > > > > create
> > > > > > >> > both set flavors. We can adopt it unless somebody in the
> > > community
> > > > > > >> objects.
> > > > > > >> > Personally, I like {{IgniteCache.asSet()}} approach proposed
> > by
> > > > > > Vladimir
> > > > > > >> O.
> > > > > > >> > more, since it emphasizes the difference between sets being
> > > > created,
> > > > > > but
> > > > > > >> > this will require API extension.
> > > > > > >> >
> > > > > > >>
> > > > > > >> Andrey, I am suggesting that Ignite.set(...) in non-collocated
> > > mode
> > > > > > behaves
> > > > > > >> exactly the same as the proposed IgniteCache.asSet() method. I
> > do
> > > > not
> > > > > > like
> > > > > > >> the IgniteCache.asSet() API because it is inconsistent with
> > Ignite
> > > > > data
> > > > > > >> structure design. All data structures are provided on Ignite
> API
> > > > > > directly
> > > > > > >> and we should not change that.
> > > > > > >>
> > > > > > >> D.
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best regards,
> > > > > > >   Andrey Kuznetsov.
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: IgniteSet implementation: changes required

Posted by Anton Vinogradov <av...@apache.org>.

Denis,

I think that better case is to remove onheap optimisation/duplication.
This brings no drop to frequently used operations (put/remove), but even
will make it slightly faster.

The only one question we have here is "is it possible to restore onheap map
in easy way?".
Seems that answer is no, so, I vote for setDataMap removal.

вт, 26 июн. 2018 г. в 15:00, Denis Magda <dm...@apache.org>:

> Anton,
>
> Will it be possible to reuse such a functionality for the rest of data
> structures? I would invest our time in this if all data structures would be
> able to work with Ignite persistence this way.
>
> --
> Denis
>
> On Tue, Jun 26, 2018 at 1:53 AM Anton Vinogradov <av...@apache.org> wrote:
>
> > >> Why don't we read data straight from the persistence layer warming RAM
> > up
> > >> in the background?
> > Because it's not a trivial task to finish such loading on unstable
> > topology.
> > That's possible, ofcourse, but solution and complexity will be almost
> > equals to WAL enable/disable.
> >
> > пн, 25 июн. 2018 г. в 22:13, Denis Magda <dm...@apache.org>:
> >
> > > Folks,
> > >
> > > Why don't we read data straight from the persistence layer warming RAM
> up
> > > in the background? (like we do for SQL and other APIs). If it's a
> > question
> > > of time, then I would suggest us not to hurry up and do it in a right
> > way.
> > >
> > > --
> > > Denis
> > >
> > > On Mon, Jun 25, 2018 at 6:20 AM Anton Vinogradov <av...@apache.org>
> wrote:
> > >
> > > > +1 to removal in case there is no easy, fast and consistent way to
> > > restore
> > > > setDataMap on node restart.
> > > > I see that we'll gain some performance drop on size() or keys(), but
> > > these
> > > > methods are rarely used.
> > > >
> > > > пн, 25 июн. 2018 г. в 16:07, Pavel Pereslegin <xx...@gmail.com>:
> > > >
> > > > > Hello, Igniters.
> > > > >
> > > > > I tried to implement IgniteSet data recovery when persistence
> enabled
> > > > > [1] using trivial cache scanning, however I cannot find optimal way
> > to
> > > > > do that because of the following reasons:
> > > > > - Performing operations on IgniteSet requires completion of data
> > > > > loading (restoring of setDataMap) on all nodes. Do this during
> > > > > partition map exchange is too long.
> > > > > - The prohibition of operations on IgniteSet before the completion
> of
> > > > > asynchronous cache scanning on all nodes looks rather complicated,
> > > > > because It is necessary to support all situations of unstable
> > > > > topology.
> > > > >
> > > > > So I see one option to fix data loss on node restart - remove the
> > > > > entire optimization (setDataMap) and rework the iterator
> > > > > implementation to perform cache scanning.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5553
> > > > >
> > > > >
> > > > > 2018-03-17 8:20 GMT+03:00 Andrey Kuznetsov <st...@gmail.com>:
> > > > > > Thanks, Dmitry. I agree ultimately, DS API uniformity is a
> weighty
> > > > > reason.
> > > > > >
> > > > > > 2018-03-17 3:54 GMT+03:00 Dmitriy Setrakyan <
> dsetrakyan@apache.org
> > >:
> > > > > >
> > > > > >> On Fri, Mar 16, 2018 at 7:39 AM, Andrey Kuznetsov <
> > > stkuzma@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Dmitry, your way allows to reuse existing {{Ignite.set()}} API
> > to
> > > > > create
> > > > > >> > both set flavors. We can adopt it unless somebody in the
> > community
> > > > > >> objects.
> > > > > >> > Personally, I like {{IgniteCache.asSet()}} approach proposed
> by
> > > > > Vladimir
> > > > > >> O.
> > > > > >> > more, since it emphasizes the difference between sets being
> > > created,
> > > > > but
> > > > > >> > this will require API extension.
> > > > > >> >
> > > > > >>
> > > > > >> Andrey, I am suggesting that Ignite.set(...) in non-collocated
> > mode
> > > > > behaves
> > > > > >> exactly the same as the proposed IgniteCache.asSet() method. I
> do
> > > not
> > > > > like
> > > > > >> the IgniteCache.asSet() API because it is inconsistent with
> Ignite
> > > > data
> > > > > >> structure design. All data structures are provided on Ignite API
> > > > > directly
> > > > > >> and we should not change that.
> > > > > >>
> > > > > >> D.
> > > > > >>
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > >   Andrey Kuznetsov.
> > > > >
> > > >
> > >
> >
>

Re: IgniteSet implementation: changes required

Posted by Denis Magda <dm...@apache.org>.

Anton,

Will it be possible to reuse such a functionality for the rest of data
structures? I would invest our time in this if all data structures would be
able to work with Ignite persistence this way.

--
Denis

On Tue, Jun 26, 2018 at 1:53 AM Anton Vinogradov <av...@apache.org> wrote:

> >> Why don't we read data straight from the persistence layer warming RAM
> up
> >> in the background?
> Because it's not a trivial task to finish such loading on unstable
> topology.
> That's possible, ofcourse, but solution and complexity will be almost
> equals to WAL enable/disable.
>
> пн, 25 июн. 2018 г. в 22:13, Denis Magda <dm...@apache.org>:
>
> > Folks,
> >
> > Why don't we read data straight from the persistence layer warming RAM up
> > in the background? (like we do for SQL and other APIs). If it's a
> question
> > of time, then I would suggest us not to hurry up and do it in a right
> way.
> >
> > --
> > Denis
> >
> > On Mon, Jun 25, 2018 at 6:20 AM Anton Vinogradov <av...@apache.org> wrote:
> >
> > > +1 to removal in case there is no easy, fast and consistent way to
> > restore
> > > setDataMap on node restart.
> > > I see that we'll gain some performance drop on size() or keys(), but
> > these
> > > methods are rarely used.
> > >
> > > пн, 25 июн. 2018 г. в 16:07, Pavel Pereslegin <xx...@gmail.com>:
> > >
> > > > Hello, Igniters.
> > > >
> > > > I tried to implement IgniteSet data recovery when persistence enabled
> > > > [1] using trivial cache scanning, however I cannot find optimal way
> to
> > > > do that because of the following reasons:
> > > > - Performing operations on IgniteSet requires completion of data
> > > > loading (restoring of setDataMap) on all nodes. Do this during
> > > > partition map exchange is too long.
> > > > - The prohibition of operations on IgniteSet before the completion of
> > > > asynchronous cache scanning on all nodes looks rather complicated,
> > > > because It is necessary to support all situations of unstable
> > > > topology.
> > > >
> > > > So I see one option to fix data loss on node restart - remove the
> > > > entire optimization (setDataMap) and rework the iterator
> > > > implementation to perform cache scanning.
> > > >
> > > > Thoughts?
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-5553
> > > >
> > > >
> > > > 2018-03-17 8:20 GMT+03:00 Andrey Kuznetsov <st...@gmail.com>:
> > > > > Thanks, Dmitry. I agree ultimately, DS API uniformity is a weighty
> > > > reason.
> > > > >
> > > > > 2018-03-17 3:54 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org
> >:
> > > > >
> > > > >> On Fri, Mar 16, 2018 at 7:39 AM, Andrey Kuznetsov <
> > stkuzma@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Dmitry, your way allows to reuse existing {{Ignite.set()}} API
> to
> > > > create
> > > > >> > both set flavors. We can adopt it unless somebody in the
> community
> > > > >> objects.
> > > > >> > Personally, I like {{IgniteCache.asSet()}} approach proposed by
> > > > Vladimir
> > > > >> O.
> > > > >> > more, since it emphasizes the difference between sets being
> > created,
> > > > but
> > > > >> > this will require API extension.
> > > > >> >
> > > > >>
> > > > >> Andrey, I am suggesting that Ignite.set(...) in non-collocated
> mode
> > > > behaves
> > > > >> exactly the same as the proposed IgniteCache.asSet() method. I do
> > not
> > > > like
> > > > >> the IgniteCache.asSet() API because it is inconsistent with Ignite
> > > data
> > > > >> structure design. All data structures are provided on Ignite API
> > > > directly
> > > > >> and we should not change that.
> > > > >>
> > > > >> D.
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > >   Andrey Kuznetsov.
> > > >
> > >
> >
>

Re: IgniteSet implementation: changes required

Posted by Anton Vinogradov <av...@apache.org>.

>> Why don't we read data straight from the persistence layer warming RAM up
>> in the background?
Because it's not a trivial task to finish such loading on unstable
topology.
That's possible, ofcourse, but solution and complexity will be almost
equals to WAL enable/disable.

пн, 25 июн. 2018 г. в 22:13, Denis Magda <dm...@apache.org>:

> Folks,
>
> Why don't we read data straight from the persistence layer warming RAM up
> in the background? (like we do for SQL and other APIs). If it's a question
> of time, then I would suggest us not to hurry up and do it in a right way.
>
> --
> Denis
>
> On Mon, Jun 25, 2018 at 6:20 AM Anton Vinogradov <av...@apache.org> wrote:
>
> > +1 to removal in case there is no easy, fast and consistent way to
> restore
> > setDataMap on node restart.
> > I see that we'll gain some performance drop on size() or keys(), but
> these
> > methods are rarely used.
> >
> > пн, 25 июн. 2018 г. в 16:07, Pavel Pereslegin <xx...@gmail.com>:
> >
> > > Hello, Igniters.
> > >
> > > I tried to implement IgniteSet data recovery when persistence enabled
> > > [1] using trivial cache scanning, however I cannot find optimal way to
> > > do that because of the following reasons:
> > > - Performing operations on IgniteSet requires completion of data
> > > loading (restoring of setDataMap) on all nodes. Do this during
> > > partition map exchange is too long.
> > > - The prohibition of operations on IgniteSet before the completion of
> > > asynchronous cache scanning on all nodes looks rather complicated,
> > > because It is necessary to support all situations of unstable
> > > topology.
> > >
> > > So I see one option to fix data loss on node restart - remove the
> > > entire optimization (setDataMap) and rework the iterator
> > > implementation to perform cache scanning.
> > >
> > > Thoughts?
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-5553
> > >
> > >
> > > 2018-03-17 8:20 GMT+03:00 Andrey Kuznetsov <st...@gmail.com>:
> > > > Thanks, Dmitry. I agree ultimately, DS API uniformity is a weighty
> > > reason.
> > > >
> > > > 2018-03-17 3:54 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> > > >
> > > >> On Fri, Mar 16, 2018 at 7:39 AM, Andrey Kuznetsov <
> stkuzma@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Dmitry, your way allows to reuse existing {{Ignite.set()}} API to
> > > create
> > > >> > both set flavors. We can adopt it unless somebody in the community
> > > >> objects.
> > > >> > Personally, I like {{IgniteCache.asSet()}} approach proposed by
> > > Vladimir
> > > >> O.
> > > >> > more, since it emphasizes the difference between sets being
> created,
> > > but
> > > >> > this will require API extension.
> > > >> >
> > > >>
> > > >> Andrey, I am suggesting that Ignite.set(...) in non-collocated mode
> > > behaves
> > > >> exactly the same as the proposed IgniteCache.asSet() method. I do
> not
> > > like
> > > >> the IgniteCache.asSet() API because it is inconsistent with Ignite
> > data
> > > >> structure design. All data structures are provided on Ignite API
> > > directly
> > > >> and we should not change that.
> > > >>
> > > >> D.
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > >   Andrey Kuznetsov.
> > >
> >
>

Re: IgniteSet implementation: changes required

Posted by Denis Magda <dm...@apache.org>.

Folks,

Why don't we read data straight from the persistence layer warming RAM up
in the background? (like we do for SQL and other APIs). If it's a question
of time, then I would suggest us not to hurry up and do it in a right way.

--
Denis

On Mon, Jun 25, 2018 at 6:20 AM Anton Vinogradov <av...@apache.org> wrote:

> +1 to removal in case there is no easy, fast and consistent way to restore
> setDataMap on node restart.
> I see that we'll gain some performance drop on size() or keys(), but these
> methods are rarely used.
>
> пн, 25 июн. 2018 г. в 16:07, Pavel Pereslegin <xx...@gmail.com>:
>
> > Hello, Igniters.
> >
> > I tried to implement IgniteSet data recovery when persistence enabled
> > [1] using trivial cache scanning, however I cannot find optimal way to
> > do that because of the following reasons:
> > - Performing operations on IgniteSet requires completion of data
> > loading (restoring of setDataMap) on all nodes. Do this during
> > partition map exchange is too long.
> > - The prohibition of operations on IgniteSet before the completion of
> > asynchronous cache scanning on all nodes looks rather complicated,
> > because It is necessary to support all situations of unstable
> > topology.
> >
> > So I see one option to fix data loss on node restart - remove the
> > entire optimization (setDataMap) and rework the iterator
> > implementation to perform cache scanning.
> >
> > Thoughts?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-5553
> >
> >
> > 2018-03-17 8:20 GMT+03:00 Andrey Kuznetsov <st...@gmail.com>:
> > > Thanks, Dmitry. I agree ultimately, DS API uniformity is a weighty
> > reason.
> > >
> > > 2018-03-17 3:54 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> > >
> > >> On Fri, Mar 16, 2018 at 7:39 AM, Andrey Kuznetsov <st...@gmail.com>
> > >> wrote:
> > >>
> > >> > Dmitry, your way allows to reuse existing {{Ignite.set()}} API to
> > create
> > >> > both set flavors. We can adopt it unless somebody in the community
> > >> objects.
> > >> > Personally, I like {{IgniteCache.asSet()}} approach proposed by
> > Vladimir
> > >> O.
> > >> > more, since it emphasizes the difference between sets being created,
> > but
> > >> > this will require API extension.
> > >> >
> > >>
> > >> Andrey, I am suggesting that Ignite.set(...) in non-collocated mode
> > behaves
> > >> exactly the same as the proposed IgniteCache.asSet() method. I do not
> > like
> > >> the IgniteCache.asSet() API because it is inconsistent with Ignite
> data
> > >> structure design. All data structures are provided on Ignite API
> > directly
> > >> and we should not change that.
> > >>
> > >> D.
> > >>
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >   Andrey Kuznetsov.
> >
>

Re: IgniteSet implementation: changes required

Posted by Anton Vinogradov <av...@apache.org>.

+1 to removal in case there is no easy, fast and consistent way to restore
setDataMap on node restart.
I see that we'll gain some performance drop on size() or keys(), but these
methods are rarely used.

пн, 25 июн. 2018 г. в 16:07, Pavel Pereslegin <xx...@gmail.com>:

> Hello, Igniters.
>
> I tried to implement IgniteSet data recovery when persistence enabled
> [1] using trivial cache scanning, however I cannot find optimal way to
> do that because of the following reasons:
> - Performing operations on IgniteSet requires completion of data
> loading (restoring of setDataMap) on all nodes. Do this during
> partition map exchange is too long.
> - The prohibition of operations on IgniteSet before the completion of
> asynchronous cache scanning on all nodes looks rather complicated,
> because It is necessary to support all situations of unstable
> topology.
>
> So I see one option to fix data loss on node restart - remove the
> entire optimization (setDataMap) and rework the iterator
> implementation to perform cache scanning.
>
> Thoughts?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-5553
>
>
> 2018-03-17 8:20 GMT+03:00 Andrey Kuznetsov <st...@gmail.com>:
> > Thanks, Dmitry. I agree ultimately, DS API uniformity is a weighty
> reason.
> >
> > 2018-03-17 3:54 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> >
> >> On Fri, Mar 16, 2018 at 7:39 AM, Andrey Kuznetsov <st...@gmail.com>
> >> wrote:
> >>
> >> > Dmitry, your way allows to reuse existing {{Ignite.set()}} API to
> create
> >> > both set flavors. We can adopt it unless somebody in the community
> >> objects.
> >> > Personally, I like {{IgniteCache.asSet()}} approach proposed by
> Vladimir
> >> O.
> >> > more, since it emphasizes the difference between sets being created,
> but
> >> > this will require API extension.
> >> >
> >>
> >> Andrey, I am suggesting that Ignite.set(...) in non-collocated mode
> behaves
> >> exactly the same as the proposed IgniteCache.asSet() method. I do not
> like
> >> the IgniteCache.asSet() API because it is inconsistent with Ignite data
> >> structure design. All data structures are provided on Ignite API
> directly
> >> and we should not change that.
> >>
> >> D.
> >>
> >
> >
> >
> > --
> > Best regards,
> >   Andrey Kuznetsov.
>

Re: IgniteSet implementation: changes required

Posted by Pavel Pereslegin <xx...@gmail.com>.

Hello, Igniters.

I tried to implement IgniteSet data recovery when persistence enabled
[1] using trivial cache scanning, however I cannot find optimal way to
do that because of the following reasons:
- Performing operations on IgniteSet requires completion of data
loading (restoring of setDataMap) on all nodes. Do this during
partition map exchange is too long.
- The prohibition of operations on IgniteSet before the completion of
asynchronous cache scanning on all nodes looks rather complicated,
because It is necessary to support all situations of unstable
topology.

So I see one option to fix data loss on node restart - remove the
entire optimization (setDataMap) and rework the iterator
implementation to perform cache scanning.

Thoughts?

[1] https://issues.apache.org/jira/browse/IGNITE-5553


2018-03-17 8:20 GMT+03:00 Andrey Kuznetsov <st...@gmail.com>:
> Thanks, Dmitry. I agree ultimately, DS API uniformity is a weighty reason.
>
> 2018-03-17 3:54 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>
>> On Fri, Mar 16, 2018 at 7:39 AM, Andrey Kuznetsov <st...@gmail.com>
>> wrote:
>>
>> > Dmitry, your way allows to reuse existing {{Ignite.set()}} API to create
>> > both set flavors. We can adopt it unless somebody in the community
>> objects.
>> > Personally, I like {{IgniteCache.asSet()}} approach proposed by Vladimir
>> O.
>> > more, since it emphasizes the difference between sets being created, but
>> > this will require API extension.
>> >
>>
>> Andrey, I am suggesting that Ignite.set(...) in non-collocated mode behaves
>> exactly the same as the proposed IgniteCache.asSet() method. I do not like
>> the IgniteCache.asSet() API because it is inconsistent with Ignite data
>> structure design. All data structures are provided on Ignite API directly
>> and we should not change that.
>>
>> D.
>>
>
>
>
> --
> Best regards,
>   Andrey Kuznetsov.

Re: IgniteSet implementation: changes required

Posted by Andrey Kuznetsov <st...@gmail.com>.

Thanks, Dmitry. I agree ultimately, DS API uniformity is a weighty reason.

2018-03-17 3:54 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:

> On Fri, Mar 16, 2018 at 7:39 AM, Andrey Kuznetsov <st...@gmail.com>
> wrote:
>
> > Dmitry, your way allows to reuse existing {{Ignite.set()}} API to create
> > both set flavors. We can adopt it unless somebody in the community
> objects.
> > Personally, I like {{IgniteCache.asSet()}} approach proposed by Vladimir
> O.
> > more, since it emphasizes the difference between sets being created, but
> > this will require API extension.
> >
>
> Andrey, I am suggesting that Ignite.set(...) in non-collocated mode behaves
> exactly the same as the proposed IgniteCache.asSet() method. I do not like
> the IgniteCache.asSet() API because it is inconsistent with Ignite data
> structure design. All data structures are provided on Ignite API directly
> and we should not change that.
>
> D.
>



-- 
Best regards,
  Andrey Kuznetsov.

Re: IgniteSet implementation: changes required

Posted by Dmitriy Setrakyan <ds...@apache.org>.

On Fri, Mar 16, 2018 at 7:39 AM, Andrey Kuznetsov <st...@gmail.com> wrote:

> Dmitry, your way allows to reuse existing {{Ignite.set()}} API to create
> both set flavors. We can adopt it unless somebody in the community objects.
> Personally, I like {{IgniteCache.asSet()}} approach proposed by Vladimir O.
> more, since it emphasizes the difference between sets being created, but
> this will require API extension.
>

Andrey, I am suggesting that Ignite.set(...) in non-collocated mode behaves
exactly the same as the proposed IgniteCache.asSet() method. I do not like
the IgniteCache.asSet() API because it is inconsistent with Ignite data
structure design. All data structures are provided on Ignite API directly
and we should not change that.

D.

Re: IgniteSet implementation: changes required

Posted by Andrey Kuznetsov <st...@gmail.com>.

Dmitry, your way allows to reuse existing {{Ignite.set()}} API to create
both set flavors. We can adopt it unless somebody in the community objects.
Personally, I like {{IgniteCache.asSet()}} approach proposed by Vladimir O.
more, since it emphasizes the difference between sets being created, but
this will require API extension.

2018-03-16 8:30 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:

> On Thu, Mar 15, 2018 at 12:24 AM, Andrey Kuznetsov <st...@gmail.com>
> wrote:
>
> > Dmitriy,
> >
> > It's technically possible to produce both kinds of sets with
> > {{Ignite.set()}} call, but this will require to one more argument
> ('small'
> > vs 'large'). Doesn't it look less inuitive than separate
> > {{IgniteCache.asSet()}} ?
> >
> > And of course, we don't want to leave existing implementation broken.
> Pavel
> > P. has prepared the fix as part of [1].
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-5553
>
>
> Andrey, I am suggesting that we change all non-collocated sets to be based
> on IgniteCache. In this case you do not need any additional parameters.
>
> Makes sense?
>
> D.
>

-- 
Best regards,
  Andrey Kuznetsov.

Re: IgniteSet implementation: changes required

Posted by Dmitriy Setrakyan <ds...@apache.org>.

On Thu, Mar 15, 2018 at 12:24 AM, Andrey Kuznetsov <st...@gmail.com>
wrote:

> Dmitriy,
>
> It's technically possible to produce both kinds of sets with
> {{Ignite.set()}} call, but this will require to one more argument ('small'
> vs 'large'). Doesn't it look less inuitive than separate
> {{IgniteCache.asSet()}} ?
>
> And of course, we don't want to leave existing implementation broken. Pavel
> P. has prepared the fix as part of [1].
>
> [1] https://issues.apache.org/jira/browse/IGNITE-5553

Andrey, I am suggesting that we change all non-collocated sets to be based
on IgniteCache. In this case you do not need any additional parameters.

Makes sense?

D.

Re: IgniteSet implementation: changes required

Posted by Andrey Kuznetsov <st...@gmail.com>.

Dmitriy,

It's technically possible to produce both kinds of sets with
{{Ignite.set()}} call, but this will require to one more argument ('small'
vs 'large'). Doesn't it look less inuitive than separate
{{IgniteCache.asSet()}} ?

And of course, we don't want to leave existing implementation broken. Pavel
P. has prepared the fix as part of [1].

[1] https://issues.apache.org/jira/browse/IGNITE-5553

2018-03-15 5:40 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:

> I am not sure I like the "asSet()" method. We already have Ignite.set(...)
> method and now introducing yet another one. The new design should fix the
> existing implementation. We cannot keep the broken implementation around
> and introduce yet another one. To be consistent, we should also stick to
> the IgniteSet abstraction.
>
> Just to be clear, I am in agreement that a non-collocated set should be
> based on a cache, but I do not see why the existing API cannot be re-used.
>
> D.
>
>
-- 
Best regards,
  Andrey Kuznetsov.

Re: IgniteSet implementation: changes required

Posted by Dmitriy Setrakyan <ds...@apache.org>.

I am not sure I like the "asSet()" method. We already have Ignite.set(...)
method and now introducing yet another one. The new design should fix the
existing implementation. We cannot keep the broken implementation around
and introduce yet another one. To be consistent, we should also stick to
the IgniteSet abstraction.

Just to be clear, I am in agreement that a non-collocated set should be
based on a cache, but I do not see why the existing API cannot be re-used.

D.


On Wed, Mar 14, 2018 at 1:26 PM, Andrey Kuznetsov <st...@gmail.com> wrote:

> Hi, Dmitry.
>
> The primary goal of the ticket is to implement {{IgniteCache::asSet}} view.
> The rationale is introduced earlier in this talk by Vladimir O. I've also
> mentioned the need to document that new method properly: it should be used
> to create large sets. Existing {{IgniteSets}} are good to represent small
> sets.
>
> 2018-03-14 20:05 GMT+03:00 Dmitry Pavlov <dp...@gmail.com>:
>
> > Hi Pavel,
> >
> > I did not quite understand the purpose of the new collection and the
> > purpose of introduced new set.
> >
> > Is ticket descriptoin states, that it is for better documentation?
> >
> > Can we improve the task description
> > https://issues.apache.org/jira/browse/IGNITE-7823 ?
> >
> > Sincerely,
> > Dmitriy Pavlov
> >
> > ср, 14 мар. 2018 г. в 12:31, Pavel Pereslegin <xx...@gmail.com>:
> >
> > > Hello Igniters.
> > >
> > > I'm working on the implementation of the IgniteCache#asSet method [1]
> > > and I think it should return Set (not IgniteSet). Because IgniteSet
> > > was introduced mainly to add methods for the collocated version of
> > > IgniteSet.
> > >
> > > Any thoughts?
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-7823
> > >
> > >
> > > 2018-02-27 14:59 GMT+03:00 Andrey Kuznetsov <st...@gmail.com>:
> > > > As far as I know, Pavel P. is working on fixing existing sets
> > currently.
> > > >
> > > > As for {{asSet}} cache adapter, I filed the ticket [1].
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-7823
> > > >
> > > > 2018-02-27 11:20 GMT+03:00 Vladimir Ozerov <vo...@gridgain.com>:
> > > >
> > > >> I think the root issue is that we are trying to mix different cases
> > in a
> > > >> single solution. What is the common usage patterns of sets?
> > > >> 1) Small mostly-read sets - current implementation is ideal for
> them -
> > > >> everything is available locally, on-heap and in deserialized form
> > > >> 2) Big data sets - IgniteCache API is the best candidate here; we
> can
> > > >> simply add a method "Set<K> IgniteCache.asSet()" which will return
> > thin
> > > >> wrapper for Set interface around cache. An you will have recovery
> and
> > > >> heap-resistance with almost no additional efforts.
> > > >> There is no need to choose between one or another solution. We can
> > > support
> > > >> both.
> > > >>
> > > >> As far as current igniteSet implementation - all issues described
> > above
> > > >> will go away if you use approach suggested by Dmitry. I do not see a
> > > need
> > > >> for any major changes.
> > > >>
> > > >> Vladimir.
> > > >>
> > > >
> > > > --
> > > > Best regards,
> > > >   Andrey Kuznetsov.
> > >
> >
>
>
>
> --
> Best regards,
>   Andrey Kuznetsov.
>

Re: IgniteSet implementation: changes required

Posted by Andrey Kuznetsov <st...@gmail.com>.

Hi, Dmitry.

The primary goal of the ticket is to implement {{IgniteCache::asSet}} view.
The rationale is introduced earlier in this talk by Vladimir O. I've also
mentioned the need to document that new method properly: it should be used
to create large sets. Existing {{IgniteSets}} are good to represent small
sets.

2018-03-14 20:05 GMT+03:00 Dmitry Pavlov <dp...@gmail.com>:

> Hi Pavel,
>
> I did not quite understand the purpose of the new collection and the
> purpose of introduced new set.
>
> Is ticket descriptoin states, that it is for better documentation?
>
> Can we improve the task description
> https://issues.apache.org/jira/browse/IGNITE-7823 ?
>
> Sincerely,
> Dmitriy Pavlov
>
> ср, 14 мар. 2018 г. в 12:31, Pavel Pereslegin <xx...@gmail.com>:
>
> > Hello Igniters.
> >
> > I'm working on the implementation of the IgniteCache#asSet method [1]
> > and I think it should return Set (not IgniteSet). Because IgniteSet
> > was introduced mainly to add methods for the collocated version of
> > IgniteSet.
> >
> > Any thoughts?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-7823
> >
> >
> > 2018-02-27 14:59 GMT+03:00 Andrey Kuznetsov <st...@gmail.com>:
> > > As far as I know, Pavel P. is working on fixing existing sets
> currently.
> > >
> > > As for {{asSet}} cache adapter, I filed the ticket [1].
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-7823
> > >
> > > 2018-02-27 11:20 GMT+03:00 Vladimir Ozerov <vo...@gridgain.com>:
> > >
> > >> I think the root issue is that we are trying to mix different cases
> in a
> > >> single solution. What is the common usage patterns of sets?
> > >> 1) Small mostly-read sets - current implementation is ideal for them -
> > >> everything is available locally, on-heap and in deserialized form
> > >> 2) Big data sets - IgniteCache API is the best candidate here; we can
> > >> simply add a method "Set<K> IgniteCache.asSet()" which will return
> thin
> > >> wrapper for Set interface around cache. An you will have recovery and
> > >> heap-resistance with almost no additional efforts.
> > >> There is no need to choose between one or another solution. We can
> > support
> > >> both.
> > >>
> > >> As far as current igniteSet implementation - all issues described
> above
> > >> will go away if you use approach suggested by Dmitry. I do not see a
> > need
> > >> for any major changes.
> > >>
> > >> Vladimir.
> > >>
> > >
> > > --
> > > Best regards,
> > >   Andrey Kuznetsov.
> >
>



-- 
Best regards,
  Andrey Kuznetsov.

Re: IgniteSet implementation: changes required

Posted by Dmitry Pavlov <dp...@gmail.com>.

Hi Pavel,

I did not quite understand the purpose of the new collection and the
purpose of introduced new set.

Is ticket descriptoin states, that it is for better documentation?

Can we improve the task description
https://issues.apache.org/jira/browse/IGNITE-7823 ?

Sincerely,
Dmitriy Pavlov

ср, 14 мар. 2018 г. в 12:31, Pavel Pereslegin <xx...@gmail.com>:

> Hello Igniters.
>
> I'm working on the implementation of the IgniteCache#asSet method [1]
> and I think it should return Set (not IgniteSet). Because IgniteSet
> was introduced mainly to add methods for the collocated version of
> IgniteSet.
>
> Any thoughts?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-7823
>
>
> 2018-02-27 14:59 GMT+03:00 Andrey Kuznetsov <st...@gmail.com>:
> > As far as I know, Pavel P. is working on fixing existing sets currently.
> >
> > As for {{asSet}} cache adapter, I filed the ticket [1].
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-7823
> >
> > 2018-02-27 11:20 GMT+03:00 Vladimir Ozerov <vo...@gridgain.com>:
> >
> >> I think the root issue is that we are trying to mix different cases in a
> >> single solution. What is the common usage patterns of sets?
> >> 1) Small mostly-read sets - current implementation is ideal for them -
> >> everything is available locally, on-heap and in deserialized form
> >> 2) Big data sets - IgniteCache API is the best candidate here; we can
> >> simply add a method "Set<K> IgniteCache.asSet()" which will return thin
> >> wrapper for Set interface around cache. An you will have recovery and
> >> heap-resistance with almost no additional efforts.
> >> There is no need to choose between one or another solution. We can
> support
> >> both.
> >>
> >> As far as current igniteSet implementation - all issues described above
> >> will go away if you use approach suggested by Dmitry. I do not see a
> need
> >> for any major changes.
> >>
> >> Vladimir.
> >>
> >
> > --
> > Best regards,
> >   Andrey Kuznetsov.
>

Re: IgniteSet implementation: changes required

Posted by Pavel Pereslegin <xx...@gmail.com>.

Hello Igniters.

I'm working on the implementation of the IgniteCache#asSet method [1]
and I think it should return Set (not IgniteSet). Because IgniteSet
was introduced mainly to add methods for the collocated version of
IgniteSet.

Any thoughts?

[1] https://issues.apache.org/jira/browse/IGNITE-7823


2018-02-27 14:59 GMT+03:00 Andrey Kuznetsov <st...@gmail.com>:
> As far as I know, Pavel P. is working on fixing existing sets currently.
>
> As for {{asSet}} cache adapter, I filed the ticket [1].
>
> [1] https://issues.apache.org/jira/browse/IGNITE-7823
>
> 2018-02-27 11:20 GMT+03:00 Vladimir Ozerov <vo...@gridgain.com>:
>
>> I think the root issue is that we are trying to mix different cases in a
>> single solution. What is the common usage patterns of sets?
>> 1) Small mostly-read sets - current implementation is ideal for them -
>> everything is available locally, on-heap and in deserialized form
>> 2) Big data sets - IgniteCache API is the best candidate here; we can
>> simply add a method "Set<K> IgniteCache.asSet()" which will return thin
>> wrapper for Set interface around cache. An you will have recovery and
>> heap-resistance with almost no additional efforts.
>> There is no need to choose between one or another solution. We can support
>> both.
>>
>> As far as current igniteSet implementation - all issues described above
>> will go away if you use approach suggested by Dmitry. I do not see a need
>> for any major changes.
>>
>> Vladimir.
>>
>
> --
> Best regards,
>   Andrey Kuznetsov.

Re: IgniteSet implementation: changes required

Posted by Andrey Kuznetsov <st...@gmail.com>.

As far as I know, Pavel P. is working on fixing existing sets currently.

As for {{asSet}} cache adapter, I filed the ticket [1].

[1] https://issues.apache.org/jira/browse/IGNITE-7823

2018-02-27 11:20 GMT+03:00 Vladimir Ozerov <vo...@gridgain.com>:

> I think the root issue is that we are trying to mix different cases in a
> single solution. What is the common usage patterns of sets?
> 1) Small mostly-read sets - current implementation is ideal for them -
> everything is available locally, on-heap and in deserialized form
> 2) Big data sets - IgniteCache API is the best candidate here; we can
> simply add a method "Set<K> IgniteCache.asSet()" which will return thin
> wrapper for Set interface around cache. An you will have recovery and
> heap-resistance with almost no additional efforts.
> There is no need to choose between one or another solution. We can support
> both.
>
> As far as current igniteSet implementation - all issues described above
> will go away if you use approach suggested by Dmitry. I do not see a need
> for any major changes.
>
> Vladimir.
>

-- 
Best regards,
  Andrey Kuznetsov.

Re: IgniteSet implementation: changes required

Posted by Vladimir Ozerov <vo...@gridgain.com>.

I think the root issue is that we are trying to mix different cases in a
single solution. What is the common usage patterns of sets?
1) Small mostly-read sets - current implementation is ideal for them -
everything is available locally, on-heap and in deserialized form
2) Big data sets - IgniteCache API is the best candidate here; we can
simply add a method "Set<K> IgniteCache.asSet()" which will return thin
wrapper for Set interface around cache. An you will have recovery and
heap-resistance with almost no additional efforts.
There is no need to choose between one or another solution. We can support
both.

As far as current igniteSet implementation - all issues described above
will go away if you use approach suggested by Dmitry. I do not see a need
for any major changes.

Vladimir.

On Mon, Feb 19, 2018 at 10:44 AM, Pavel Pereslegin <xx...@gmail.com> wrote:

> Hello Vladimir,
>
> > What we can do is to optionally disable on-heap caching for specific set
> at the cost of lower performance if user wants so.
>
> I want to make sure that we are speaking about the same thing. By
> "on-heap caching" I mean custom datastructure and not standard on-heap
> cache, we can't disable it now, because it is the part of current
> IgniteSet implementation. Is it acceptable to use GridOffHeapMap
> optionally instead of ConcurrentHashMap for "off-heap mode"?
>
>
> 2018-02-15 23:20 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> > On Thu, Feb 15, 2018 at 6:08 AM, Vladimir Ozerov <vo...@gridgain.com>
> > wrote:
> >
> >> I do not think indexes is the right approach - set do not have indexes,
> and
> >> you will have to maintain additional counter for it in order to know
> when
> >> to stop.
> >>
> >> From what I see there are two distinct problems:
> >> 1) Broken recovery - this is just a bug which needs to be fixed. As
> soon as
> >> data is stored in real persistent cache, recovery of data structure
> state
> >> should be trivial task.
> >> 2) Heap items - this should not be a problem in common case when set
> >> contains moderate number of elements. If set is excessively large, then
> >> this is not the right structure for your use case and you should use
> >> standard IgniteCache API instead. What we can do is to optionally
> disable
> >> on-heap caching for specific set at the cost of lower performance if
> user
> >> wants so.
> >>
> >
> > Vladimir, I am not sure I agree. In my view, set should be similar to
> > cache, just a different API. I am not sure why we should make an
> > assumptions that set data should be smaller than cache, especially given
> > that it is a trivial task to implement a set based on Ignite cache API
> (we
> > could just store key-key mappings in cache instead of key-value mappings
> > internally).
> >
> > Can you clarify why you believe that IgniteSet should need to have
> on-heap
> > entries?
> >
> > D.
>

Re: IgniteSet implementation: changes required

Posted by Pavel Pereslegin <xx...@gmail.com>.

Hello Vladimir,

> What we can do is to optionally disable on-heap caching for specific set at the cost of lower performance if user wants so.

I want to make sure that we are speaking about the same thing. By
"on-heap caching" I mean custom datastructure and not standard on-heap
cache, we can't disable it now, because it is the part of current
IgniteSet implementation. Is it acceptable to use GridOffHeapMap
optionally instead of ConcurrentHashMap for "off-heap mode"?


2018-02-15 23:20 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> On Thu, Feb 15, 2018 at 6:08 AM, Vladimir Ozerov <vo...@gridgain.com>
> wrote:
>
>> I do not think indexes is the right approach - set do not have indexes, and
>> you will have to maintain additional counter for it in order to know when
>> to stop.
>>
>> From what I see there are two distinct problems:
>> 1) Broken recovery - this is just a bug which needs to be fixed. As soon as
>> data is stored in real persistent cache, recovery of data structure state
>> should be trivial task.
>> 2) Heap items - this should not be a problem in common case when set
>> contains moderate number of elements. If set is excessively large, then
>> this is not the right structure for your use case and you should use
>> standard IgniteCache API instead. What we can do is to optionally disable
>> on-heap caching for specific set at the cost of lower performance if user
>> wants so.
>>
>
> Vladimir, I am not sure I agree. In my view, set should be similar to
> cache, just a different API. I am not sure why we should make an
> assumptions that set data should be smaller than cache, especially given
> that it is a trivial task to implement a set based on Ignite cache API (we
> could just store key-key mappings in cache instead of key-value mappings
> internally).
>
> Can you clarify why you believe that IgniteSet should need to have on-heap
> entries?
>
> D.

Re: IgniteSet implementation: changes required

Posted by Dmitriy Setrakyan <ds...@apache.org>.

On Thu, Feb 15, 2018 at 6:08 AM, Vladimir Ozerov <vo...@gridgain.com>
wrote:

> I do not think indexes is the right approach - set do not have indexes, and
> you will have to maintain additional counter for it in order to know when
> to stop.
>
> From what I see there are two distinct problems:
> 1) Broken recovery - this is just a bug which needs to be fixed. As soon as
> data is stored in real persistent cache, recovery of data structure state
> should be trivial task.
> 2) Heap items - this should not be a problem in common case when set
> contains moderate number of elements. If set is excessively large, then
> this is not the right structure for your use case and you should use
> standard IgniteCache API instead. What we can do is to optionally disable
> on-heap caching for specific set at the cost of lower performance if user
> wants so.
>

Vladimir, I am not sure I agree. In my view, set should be similar to
cache, just a different API. I am not sure why we should make an
assumptions that set data should be smaller than cache, especially given
that it is a trivial task to implement a set based on Ignite cache API (we
could just store key-key mappings in cache instead of key-value mappings
internally).

Can you clarify why you believe that IgniteSet should need to have on-heap
entries?

D.

Re: IgniteSet implementation: changes required

Posted by Vladimir Ozerov <vo...@gridgain.com>.

I do not think indexes is the right approach - set do not have indexes, and
you will have to maintain additional counter for it in order to know when
to stop.

From what I see there are two distinct problems:
1) Broken recovery - this is just a bug which needs to be fixed. As soon as
data is stored in real persistent cache, recovery of data structure state
should be trivial task.
2) Heap items - this should not be a problem in common case when set
contains moderate number of elements. If set is excessively large, then
this is not the right structure for your use case and you should use
standard IgniteCache API instead. What we can do is to optionally disable
on-heap caching for specific set at the cost of lower performance if user
wants so.

On Wed, Feb 14, 2018 at 4:51 PM, Pavel Pereslegin <xx...@gmail.com> wrote:

> Hello, Igniters!
>
> I agree that solution with separate caches is not acceptable for a
> large number of sets.
>
> So, I want to suggest one more way to implement IgniteSet that will
> introduce element indexes (similar to IgniteQueue). To implement this
> we can add head/tail indexes to IgniteSet header and for each
> IgniteSet element store two key-value pairs:
>  setKey, index
>  index, setKey
>
> Indexes are required to support iterator and they should be continuous.
>
> Please, see detailed description in JIRA comment [1].
>
> With such approach add/remove/contains operations will have O(1) time
> complexity, iterator should work similar to current IgniteQueue
> iterator, issues [2], [3] will be resolved, because PDS recovery will
> work "out of the box" and we will not use JVM heap for duplicated
> values.
>
> Btw, we can use this implementation only for collocated mode (map
> keys/indexes to IgniteSet name) and use separate caches for
> non-collocated mode.
>
> What do you think about this?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-5553#comment-16364043
> [2] https://issues.apache.org/jira/browse/IGNITE-5553
> [3] https://issues.apache.org/jira/browse/IGNITE-7565
>
>
> 2018-02-13 9:33 GMT+03:00 Andrey Kuznetsov <st...@gmail.com>:
> > Indeed, all sets, regardless of whether they collocated or not, share
> > single cache, and also use onheap data structures irresistable to
> > checkpointing/recovery.
> >
> > 2018-02-13 2:14 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> >
> >> On Fri, Feb 9, 2018 at 6:26 PM, Andrey Kuznetsov <st...@gmail.com>
> >> wrote:
> >>
> >> > Hi all,
> >> >
> >> > Current set implementation has significant flaw: all set data are
> >> > duplicated in onheap maps on _every_ node in order to make iterator()
> and
> >> > size(). For me it looks like simple yet ineffective implementation.
> >> > Currently, these maps are damaged by checkpointing/recovery, and we
> could
> >> > patch them somehow. Another future change to Ignite caches can damage
> >> them
> >> > again. This looks fragile when datastructure is not entirely backed by
> >> > caches. Pavel's proposal seems to be a reliable solution for
> >> non-collocated
> >> > sets.
> >> >
> >>
> >> I would agree, but I was under an impression that non-collocated sets
> are
> >> already implemented this way. Am I wrong?
> >>
> >
> >
> >
> > --
> > Best regards,
> >   Andrey Kuznetsov.
>

Re: IgniteSet implementation: changes required

Posted by Pavel Pereslegin <xx...@gmail.com>.

Hello, Igniters!

I agree that solution with separate caches is not acceptable for a
large number of sets.

So, I want to suggest one more way to implement IgniteSet that will
introduce element indexes (similar to IgniteQueue). To implement this
we can add head/tail indexes to IgniteSet header and for each
IgniteSet element store two key-value pairs:
 setKey, index
 index, setKey

Indexes are required to support iterator and they should be continuous.

Please, see detailed description in JIRA comment [1].

With such approach add/remove/contains operations will have O(1) time
complexity, iterator should work similar to current IgniteQueue
iterator, issues [2], [3] will be resolved, because PDS recovery will
work "out of the box" and we will not use JVM heap for duplicated
values.

Btw, we can use this implementation only for collocated mode (map
keys/indexes to IgniteSet name) and use separate caches for
non-collocated mode.

What do you think about this?

[1] https://issues.apache.org/jira/browse/IGNITE-5553#comment-16364043
[2] https://issues.apache.org/jira/browse/IGNITE-5553
[3] https://issues.apache.org/jira/browse/IGNITE-7565

2018-02-13 9:33 GMT+03:00 Andrey Kuznetsov <st...@gmail.com>:
> Indeed, all sets, regardless of whether they collocated or not, share
> single cache, and also use onheap data structures irresistable to
> checkpointing/recovery.
>
> 2018-02-13 2:14 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>
>> On Fri, Feb 9, 2018 at 6:26 PM, Andrey Kuznetsov <st...@gmail.com>
>> wrote:
>>
>> > Hi all,
>> >
>> > Current set implementation has significant flaw: all set data are
>> > duplicated in onheap maps on _every_ node in order to make iterator() and
>> > size(). For me it looks like simple yet ineffective implementation.
>> > Currently, these maps are damaged by checkpointing/recovery, and we could
>> > patch them somehow. Another future change to Ignite caches can damage
>> them
>> > again. This looks fragile when datastructure is not entirely backed by
>> > caches. Pavel's proposal seems to be a reliable solution for
>> non-collocated
>> > sets.
>> >
>>
>> I would agree, but I was under an impression that non-collocated sets are
>> already implemented this way. Am I wrong?
>>
>
>
>
> --
> Best regards,
>   Andrey Kuznetsov.

Re: IgniteSet implementation: changes required

Posted by Andrey Kuznetsov <st...@gmail.com>.

Indeed, all sets, regardless of whether they collocated or not, share
single cache, and also use onheap data structures irresistable to
checkpointing/recovery.

2018-02-13 2:14 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:

> On Fri, Feb 9, 2018 at 6:26 PM, Andrey Kuznetsov <st...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > Current set implementation has significant flaw: all set data are
> > duplicated in onheap maps on _every_ node in order to make iterator() and
> > size(). For me it looks like simple yet ineffective implementation.
> > Currently, these maps are damaged by checkpointing/recovery, and we could
> > patch them somehow. Another future change to Ignite caches can damage
> them
> > again. This looks fragile when datastructure is not entirely backed by
> > caches. Pavel's proposal seems to be a reliable solution for
> non-collocated
> > sets.
> >
>
> I would agree, but I was under an impression that non-collocated sets are
> already implemented this way. Am I wrong?
>



-- 
Best regards,
  Andrey Kuznetsov.

Re: IgniteSet implementation: changes required

Posted by Dmitriy Setrakyan <ds...@apache.org>.

On Fri, Feb 9, 2018 at 6:26 PM, Andrey Kuznetsov <st...@gmail.com> wrote:

> Hi all,
>
> Current set implementation has significant flaw: all set data are
> duplicated in onheap maps on _every_ node in order to make iterator() and
> size(). For me it looks like simple yet ineffective implementation.
> Currently, these maps are damaged by checkpointing/recovery, and we could
> patch them somehow. Another future change to Ignite caches can damage them
> again. This looks fragile when datastructure is not entirely backed by
> caches. Pavel's proposal seems to be a reliable solution for non-collocated
> sets.
>

I would agree, but I was under an impression that non-collocated sets are
already implemented this way. Am I wrong?

Re: IgniteSet implementation: changes required

Posted by Andrey Kuznetsov <st...@gmail.com>.

Hi all,

Current set implementation has significant flaw: all set data are
duplicated in onheap maps on _every_ node in order to make iterator() and
size(). For me it looks like simple yet ineffective implementation.
Currently, these maps are damaged by checkpointing/recovery, and we could
patch them somehow. Another future change to Ignite caches can damage them
again. This looks fragile when datastructure is not entirely backed by
caches. Pavel's proposal seems to be a reliable solution for non-collocated
sets.

9 февр. 2018 г. 22:46 пользователь "Valentin Kulichenko" <
valentin.kulichenko@gmail.com> написал:

Pavel,

I'm a bit confused. In my understanding, issue exists because we have local
in-memory maps which are used as the main source of truth about which
structures currently exist. During restart, we lose all this data even if
data structures cache(s) are persisted. Once we fix this, issue goes away,
regardless of weather we store data structure per cache or everything in
single cache. Am I missing something?

I also agree with Dmitry. While cache per set approach can make sense for
non-collocated sets, for collocated ones it definitely doesn't. So I would
fix the original issue first, and then change the architecture if it's
really needed.

-Val

Re: IgniteSet implementation: changes required

Posted by Valentin Kulichenko <va...@gmail.com>.

Pavel,

I'm a bit confused. In my understanding, issue exists because we have local
in-memory maps which are used as the main source of truth about which
structures currently exist. During restart, we lose all this data even if
data structures cache(s) are persisted. Once we fix this, issue goes away,
regardless of weather we store data structure per cache or everything in
single cache. Am I missing something?

I also agree with Dmitry. While cache per set approach can make sense for
non-collocated sets, for collocated ones it definitely doesn't. So I would
fix the original issue first, and then change the architecture if it's
really needed.

-Val

On Fri, Feb 9, 2018 at 10:39 AM, Dmitriy Setrakyan <ds...@apache.org>
wrote:

> Hi Pavel,
>
> We have 2 types of data structures, collocated and non-collocated. The
> difference between them is that the collocated set is generally smaller and
> will always end up on the same node. Users generally will have many
> colllocated sets. On the other hand, a non-collocated set can span multiple
> nodes and therefore is able to store a lot more data.
>
> I can see how cache-per-set strategy can be applied to the non-collocated
> set. As a matter of fact, I would be surprised if it is not implemented
> that way already.
>
> However, I do not see this strategy applied to the collocated sets. Users
> can have 1000s of collocated sets or more. Are you suggesting that this
> will translate into 1000s of caches?
>
> D.
>
> On Fri, Feb 9, 2018 at 8:10 AM, Pavel Pereslegin <xx...@gmail.com> wrote:
>
> > Hello, Valentin.
> >
> > Thank you for the reply.
> >
> > As mentioned in this conversation, for now we have at least two issues
> > with IgniteSet:
> > 1. Incorrect behavior after recovery from PDS [1].
> > 2. The data in the cache is duplicated on-heap [2], which is not
> > documented and lead to heap/GC overhead when using large Sets.
> >
> > Without significant changes, it is possible to solve [1] with the
> > workaround, proposed by Andrey Kuznetsov - iterate over all
> > datastructure-backing caches entries during recover from checkpoint
> > procedure, filter set-related entries and refill setDataMap's.
> > As a workaround for [2] we can add configuration option which data
> > structure to use for "local caching" (on-heap or off-heap).
> > If we go this way then cache data duplication will remain and some
> > kind of off-heap ConcurrentHashMap should be implemented in Ignite
> > (probably, already exists in some form, need to investigate this topic
> > properly).
> >
> > On the other hand, if we use separate cache for each IgniteSet instance:
> > 1. It will be not necessary to maintain redundant data stored
> > somewhere other than the cache.
> > 2. It will be not necessary to implement workaround for recovery from
> PDS.
> > For the collocated mode we can, for example, enforce REPLICATED cache
> mode.
> >
> > Why don't you like the idea with separate cache?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-7565
> > [2] https://issues.apache.org/jira/browse/IGNITE-5553
> >
> >
> > 2018-02-09 0:44 GMT+03:00 Valentin Kulichenko <
> > valentin.kulichenko@gmail.com>:
> > > Pavel,
> > >
> > > I don't like an idea of creating separate cache for each data
> structure,
> > > especially for collocated ones. Can actually, I'm not sure I understand
> > how
> > > that would help. It sounds like that we just need to properly persist
> the
> > > data structures cache and then reload on restart.
> > >
> > > -Val
> > >
> > > On Thu, Feb 8, 2018 at 6:12 AM, Pavel Pereslegin <xx...@gmail.com>
> > wrote:
> > >
> > >> Hello, Igniters!
> > >>
> > >> We have some issues with current IgniteSet implementation ([1], [2],
> > [3],
> > >> [4]).
> > >>
> > >> As was already described in this conversation, the main problem is
> > >> that current IgniteSet implementation maintains plain Java sets on
> > >> every node (see CacheDataStructuresManager.setDataMap). These sets
> > >> duplicate backing-cache entries, both primary and backup. size() and
> > >> iterator() calls issue distributed queries to collect/filter data from
> > >> all setDataMap's.
> > >>
> > >> I believe we can solve specified issues if each instance of IgniteSet
> > >> will have separate internal cache that will be destroyed on close.
> > >>
> > >> What do you think about such major change? Do you have any thoughts or
> > >> objections?
> > >>
> > >> [1] https://issues.apache.org/jira/browse/IGNITE-7565
> > >> [2] https://issues.apache.org/jira/browse/IGNITE-5370
> > >> [3] https://issues.apache.org/jira/browse/IGNITE-5553
> > >> [4] https://issues.apache.org/jira/browse/IGNITE-6474
> > >>
> > >>
> > >> 2017-10-31 5:53 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> > >> > Hi Andrey,
> > >> >
> > >> > Thanks for a detailed email. I think your suggestions do make sense.
> > >> Ignite
> > >> > cannot afford to have a distributed set that is not fail-safe. Can
> you
> > >> > please focus only on solutions that provide consistent behavior in
> > case
> > >> of
> > >> > topology changes and failures and document them in the ticket?
> > >> >
> > >> > https://issues.apache.org/jira/browse/IGNITE-5553
> > >> >
> > >> > D.
> > >> >
> > >> > On Mon, Oct 30, 2017 at 3:07 AM, Andrey Kuznetsov <
> stkuzma@gmail.com>
> > >> wrote:
> > >> >
> > >> >> Hi, Igniters!
> > >> >>
> > >> >> Current implementation of IgniteSet is fragile with respect to
> > cluster
> > >> >> recovery from a checkpoint. We have an issue (IGNITE-5553) that
> > >> addresses
> > >> >> set's size() behavior, but the problem is slightly broader. The
> text
> > >> below
> > >> >> is my comment from Jira issue. I encourage you to discuss it.
> > >> >>
> > >> >> We can put current set size into set header cache entry. This will
> > fix
> > >> >> size(), but we have broken iterator() implementation as well.
> > >> >>
> > >> >> Currently, set implementation maintains plain Java sets on every
> > node,
> > >> see
> > >> >> CacheDataStructuresManager.setDataMap. These sets duplicate
> > >> backing-cache
> > >> >> entries, both primary and backup. size() and iterator() calls issue
> > >> >> distributed queries to collect/filter data from all setDataMap's.
> And
> > >> >> setDataMaps remain empty after cluster is recovered from
> checkpoint.
> > >> >>
> > >> >> Now I see the following options to fix the issue.
> > >> >>
> > >> >> #1 - Naive. Iterate over all datastructure-backing caches entries
> > during
> > >> >> recover from checkpoint procedure, filter set-related entries and
> > refill
> > >> >> setDataMap's.
> > >> >> Pros: easy to implement
> > >> >> Cons: inpredictable time/memory overhead.
> > >> >>
> > >> >> #2 - More realistic. Avoid node-local copies of cache data.
> Maintain
> > >> linked
> > >> >> list in datastructure-backing cache: key is set item, value is next
> > set
> > >> >> item. List head is stored in set header cache entry (this set item
> is
> > >> >> youngest one). Iterators build on top of this structure are
> > fail-fast.
> > >> >> Pros: less memory overhead, no need to maintain node-local mirrors
> of
> > >> cache
> > >> >> data
> > >> >> Cons: iterators are not fail-safe.
> > >> >>
> > >> >> #3 - Option #2 modified. We can store reference counter and
> 'removed'
> > >> flag
> > >> >> along with next item reference. This allows to make iterators fail
> > safe.
> > >> >> Pros: iterators are fail-safe
> > >> >> Cons: slightly more complicated implementation, may affect
> > performance,
> > >> >> also I see no way to handle active iterators on remote nodes
> > failures.
> > >> >>
> > >> >>
> > >> >> Best regards,
> > >> >>
> > >> >> Andrey.
> > >> >>
> > >>
> >
>

Re: IgniteSet implementation: changes required

Posted by Dmitriy Setrakyan <ds...@apache.org>.

Hi Pavel,

We have 2 types of data structures, collocated and non-collocated. The
difference between them is that the collocated set is generally smaller and
will always end up on the same node. Users generally will have many
colllocated sets. On the other hand, a non-collocated set can span multiple
nodes and therefore is able to store a lot more data.

I can see how cache-per-set strategy can be applied to the non-collocated
set. As a matter of fact, I would be surprised if it is not implemented
that way already.

However, I do not see this strategy applied to the collocated sets. Users
can have 1000s of collocated sets or more. Are you suggesting that this
will translate into 1000s of caches?

D.

On Fri, Feb 9, 2018 at 8:10 AM, Pavel Pereslegin <xx...@gmail.com> wrote:

> Hello, Valentin.
>
> Thank you for the reply.
>
> As mentioned in this conversation, for now we have at least two issues
> with IgniteSet:
> 1. Incorrect behavior after recovery from PDS [1].
> 2. The data in the cache is duplicated on-heap [2], which is not
> documented and lead to heap/GC overhead when using large Sets.
>
> Without significant changes, it is possible to solve [1] with the
> workaround, proposed by Andrey Kuznetsov - iterate over all
> datastructure-backing caches entries during recover from checkpoint
> procedure, filter set-related entries and refill setDataMap's.
> As a workaround for [2] we can add configuration option which data
> structure to use for "local caching" (on-heap or off-heap).
> If we go this way then cache data duplication will remain and some
> kind of off-heap ConcurrentHashMap should be implemented in Ignite
> (probably, already exists in some form, need to investigate this topic
> properly).
>
> On the other hand, if we use separate cache for each IgniteSet instance:
> 1. It will be not necessary to maintain redundant data stored
> somewhere other than the cache.
> 2. It will be not necessary to implement workaround for recovery from PDS.
> For the collocated mode we can, for example, enforce REPLICATED cache mode.
>
> Why don't you like the idea with separate cache?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-7565
> [2] https://issues.apache.org/jira/browse/IGNITE-5553
>
>
> 2018-02-09 0:44 GMT+03:00 Valentin Kulichenko <
> valentin.kulichenko@gmail.com>:
> > Pavel,
> >
> > I don't like an idea of creating separate cache for each data structure,
> > especially for collocated ones. Can actually, I'm not sure I understand
> how
> > that would help. It sounds like that we just need to properly persist the
> > data structures cache and then reload on restart.
> >
> > -Val
> >
> > On Thu, Feb 8, 2018 at 6:12 AM, Pavel Pereslegin <xx...@gmail.com>
> wrote:
> >
> >> Hello, Igniters!
> >>
> >> We have some issues with current IgniteSet implementation ([1], [2],
> [3],
> >> [4]).
> >>
> >> As was already described in this conversation, the main problem is
> >> that current IgniteSet implementation maintains plain Java sets on
> >> every node (see CacheDataStructuresManager.setDataMap). These sets
> >> duplicate backing-cache entries, both primary and backup. size() and
> >> iterator() calls issue distributed queries to collect/filter data from
> >> all setDataMap's.
> >>
> >> I believe we can solve specified issues if each instance of IgniteSet
> >> will have separate internal cache that will be destroyed on close.
> >>
> >> What do you think about such major change? Do you have any thoughts or
> >> objections?
> >>
> >> [1] https://issues.apache.org/jira/browse/IGNITE-7565
> >> [2] https://issues.apache.org/jira/browse/IGNITE-5370
> >> [3] https://issues.apache.org/jira/browse/IGNITE-5553
> >> [4] https://issues.apache.org/jira/browse/IGNITE-6474
> >>
> >>
> >> 2017-10-31 5:53 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> >> > Hi Andrey,
> >> >
> >> > Thanks for a detailed email. I think your suggestions do make sense.
> >> Ignite
> >> > cannot afford to have a distributed set that is not fail-safe. Can you
> >> > please focus only on solutions that provide consistent behavior in
> case
> >> of
> >> > topology changes and failures and document them in the ticket?
> >> >
> >> > https://issues.apache.org/jira/browse/IGNITE-5553
> >> >
> >> > D.
> >> >
> >> > On Mon, Oct 30, 2017 at 3:07 AM, Andrey Kuznetsov <st...@gmail.com>
> >> wrote:
> >> >
> >> >> Hi, Igniters!
> >> >>
> >> >> Current implementation of IgniteSet is fragile with respect to
> cluster
> >> >> recovery from a checkpoint. We have an issue (IGNITE-5553) that
> >> addresses
> >> >> set's size() behavior, but the problem is slightly broader. The text
> >> below
> >> >> is my comment from Jira issue. I encourage you to discuss it.
> >> >>
> >> >> We can put current set size into set header cache entry. This will
> fix
> >> >> size(), but we have broken iterator() implementation as well.
> >> >>
> >> >> Currently, set implementation maintains plain Java sets on every
> node,
> >> see
> >> >> CacheDataStructuresManager.setDataMap. These sets duplicate
> >> backing-cache
> >> >> entries, both primary and backup. size() and iterator() calls issue
> >> >> distributed queries to collect/filter data from all setDataMap's. And
> >> >> setDataMaps remain empty after cluster is recovered from checkpoint.
> >> >>
> >> >> Now I see the following options to fix the issue.
> >> >>
> >> >> #1 - Naive. Iterate over all datastructure-backing caches entries
> during
> >> >> recover from checkpoint procedure, filter set-related entries and
> refill
> >> >> setDataMap's.
> >> >> Pros: easy to implement
> >> >> Cons: inpredictable time/memory overhead.
> >> >>
> >> >> #2 - More realistic. Avoid node-local copies of cache data. Maintain
> >> linked
> >> >> list in datastructure-backing cache: key is set item, value is next
> set
> >> >> item. List head is stored in set header cache entry (this set item is
> >> >> youngest one). Iterators build on top of this structure are
> fail-fast.
> >> >> Pros: less memory overhead, no need to maintain node-local mirrors of
> >> cache
> >> >> data
> >> >> Cons: iterators are not fail-safe.
> >> >>
> >> >> #3 - Option #2 modified. We can store reference counter and 'removed'
> >> flag
> >> >> along with next item reference. This allows to make iterators fail
> safe.
> >> >> Pros: iterators are fail-safe
> >> >> Cons: slightly more complicated implementation, may affect
> performance,
> >> >> also I see no way to handle active iterators on remote nodes
> failures.
> >> >>
> >> >>
> >> >> Best regards,
> >> >>
> >> >> Andrey.
> >> >>
> >>
>

Re: IgniteSet implementation: changes required

Posted by Pavel Pereslegin <xx...@gmail.com>.

Hello, Valentin.

Thank you for the reply.

As mentioned in this conversation, for now we have at least two issues
with IgniteSet:
1. Incorrect behavior after recovery from PDS [1].
2. The data in the cache is duplicated on-heap [2], which is not
documented and lead to heap/GC overhead when using large Sets.

Without significant changes, it is possible to solve [1] with the
workaround, proposed by Andrey Kuznetsov - iterate over all
datastructure-backing caches entries during recover from checkpoint
procedure, filter set-related entries and refill setDataMap's.
As a workaround for [2] we can add configuration option which data
structure to use for "local caching" (on-heap or off-heap).
If we go this way then cache data duplication will remain and some
kind of off-heap ConcurrentHashMap should be implemented in Ignite
(probably, already exists in some form, need to investigate this topic
properly).

On the other hand, if we use separate cache for each IgniteSet instance:
1. It will be not necessary to maintain redundant data stored
somewhere other than the cache.
2. It will be not necessary to implement workaround for recovery from PDS.
For the collocated mode we can, for example, enforce REPLICATED cache mode.

Why don't you like the idea with separate cache?

[1] https://issues.apache.org/jira/browse/IGNITE-7565
[2] https://issues.apache.org/jira/browse/IGNITE-5553


2018-02-09 0:44 GMT+03:00 Valentin Kulichenko <va...@gmail.com>:
> Pavel,
>
> I don't like an idea of creating separate cache for each data structure,
> especially for collocated ones. Can actually, I'm not sure I understand how
> that would help. It sounds like that we just need to properly persist the
> data structures cache and then reload on restart.
>
> -Val
>
> On Thu, Feb 8, 2018 at 6:12 AM, Pavel Pereslegin <xx...@gmail.com> wrote:
>
>> Hello, Igniters!
>>
>> We have some issues with current IgniteSet implementation ([1], [2], [3],
>> [4]).
>>
>> As was already described in this conversation, the main problem is
>> that current IgniteSet implementation maintains plain Java sets on
>> every node (see CacheDataStructuresManager.setDataMap). These sets
>> duplicate backing-cache entries, both primary and backup. size() and
>> iterator() calls issue distributed queries to collect/filter data from
>> all setDataMap's.
>>
>> I believe we can solve specified issues if each instance of IgniteSet
>> will have separate internal cache that will be destroyed on close.
>>
>> What do you think about such major change? Do you have any thoughts or
>> objections?
>>
>> [1] https://issues.apache.org/jira/browse/IGNITE-7565
>> [2] https://issues.apache.org/jira/browse/IGNITE-5370
>> [3] https://issues.apache.org/jira/browse/IGNITE-5553
>> [4] https://issues.apache.org/jira/browse/IGNITE-6474
>>
>>
>> 2017-10-31 5:53 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>> > Hi Andrey,
>> >
>> > Thanks for a detailed email. I think your suggestions do make sense.
>> Ignite
>> > cannot afford to have a distributed set that is not fail-safe. Can you
>> > please focus only on solutions that provide consistent behavior in case
>> of
>> > topology changes and failures and document them in the ticket?
>> >
>> > https://issues.apache.org/jira/browse/IGNITE-5553
>> >
>> > D.
>> >
>> > On Mon, Oct 30, 2017 at 3:07 AM, Andrey Kuznetsov <st...@gmail.com>
>> wrote:
>> >
>> >> Hi, Igniters!
>> >>
>> >> Current implementation of IgniteSet is fragile with respect to cluster
>> >> recovery from a checkpoint. We have an issue (IGNITE-5553) that
>> addresses
>> >> set's size() behavior, but the problem is slightly broader. The text
>> below
>> >> is my comment from Jira issue. I encourage you to discuss it.
>> >>
>> >> We can put current set size into set header cache entry. This will fix
>> >> size(), but we have broken iterator() implementation as well.
>> >>
>> >> Currently, set implementation maintains plain Java sets on every node,
>> see
>> >> CacheDataStructuresManager.setDataMap. These sets duplicate
>> backing-cache
>> >> entries, both primary and backup. size() and iterator() calls issue
>> >> distributed queries to collect/filter data from all setDataMap's. And
>> >> setDataMaps remain empty after cluster is recovered from checkpoint.
>> >>
>> >> Now I see the following options to fix the issue.
>> >>
>> >> #1 - Naive. Iterate over all datastructure-backing caches entries during
>> >> recover from checkpoint procedure, filter set-related entries and refill
>> >> setDataMap's.
>> >> Pros: easy to implement
>> >> Cons: inpredictable time/memory overhead.
>> >>
>> >> #2 - More realistic. Avoid node-local copies of cache data. Maintain
>> linked
>> >> list in datastructure-backing cache: key is set item, value is next set
>> >> item. List head is stored in set header cache entry (this set item is
>> >> youngest one). Iterators build on top of this structure are fail-fast.
>> >> Pros: less memory overhead, no need to maintain node-local mirrors of
>> cache
>> >> data
>> >> Cons: iterators are not fail-safe.
>> >>
>> >> #3 - Option #2 modified. We can store reference counter and 'removed'
>> flag
>> >> along with next item reference. This allows to make iterators fail safe.
>> >> Pros: iterators are fail-safe
>> >> Cons: slightly more complicated implementation, may affect performance,
>> >> also I see no way to handle active iterators on remote nodes failures.
>> >>
>> >>
>> >> Best regards,
>> >>
>> >> Andrey.
>> >>
>>

Re: IgniteSet implementation: changes required

Posted by Valentin Kulichenko <va...@gmail.com>.

Pavel,

I don't like an idea of creating separate cache for each data structure,
especially for collocated ones. Can actually, I'm not sure I understand how
that would help. It sounds like that we just need to properly persist the
data structures cache and then reload on restart.

-Val

On Thu, Feb 8, 2018 at 6:12 AM, Pavel Pereslegin <xx...@gmail.com> wrote:

> Hello, Igniters!
>
> We have some issues with current IgniteSet implementation ([1], [2], [3],
> [4]).
>
> As was already described in this conversation, the main problem is
> that current IgniteSet implementation maintains plain Java sets on
> every node (see CacheDataStructuresManager.setDataMap). These sets
> duplicate backing-cache entries, both primary and backup. size() and
> iterator() calls issue distributed queries to collect/filter data from
> all setDataMap's.
>
> I believe we can solve specified issues if each instance of IgniteSet
> will have separate internal cache that will be destroyed on close.
>
> What do you think about such major change? Do you have any thoughts or
> objections?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-7565
> [2] https://issues.apache.org/jira/browse/IGNITE-5370
> [3] https://issues.apache.org/jira/browse/IGNITE-5553
> [4] https://issues.apache.org/jira/browse/IGNITE-6474
>
>
> 2017-10-31 5:53 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> > Hi Andrey,
> >
> > Thanks for a detailed email. I think your suggestions do make sense.
> Ignite
> > cannot afford to have a distributed set that is not fail-safe. Can you
> > please focus only on solutions that provide consistent behavior in case
> of
> > topology changes and failures and document them in the ticket?
> >
> > https://issues.apache.org/jira/browse/IGNITE-5553
> >
> > D.
> >
> > On Mon, Oct 30, 2017 at 3:07 AM, Andrey Kuznetsov <st...@gmail.com>
> wrote:
> >
> >> Hi, Igniters!
> >>
> >> Current implementation of IgniteSet is fragile with respect to cluster
> >> recovery from a checkpoint. We have an issue (IGNITE-5553) that
> addresses
> >> set's size() behavior, but the problem is slightly broader. The text
> below
> >> is my comment from Jira issue. I encourage you to discuss it.
> >>
> >> We can put current set size into set header cache entry. This will fix
> >> size(), but we have broken iterator() implementation as well.
> >>
> >> Currently, set implementation maintains plain Java sets on every node,
> see
> >> CacheDataStructuresManager.setDataMap. These sets duplicate
> backing-cache
> >> entries, both primary and backup. size() and iterator() calls issue
> >> distributed queries to collect/filter data from all setDataMap's. And
> >> setDataMaps remain empty after cluster is recovered from checkpoint.
> >>
> >> Now I see the following options to fix the issue.
> >>
> >> #1 - Naive. Iterate over all datastructure-backing caches entries during
> >> recover from checkpoint procedure, filter set-related entries and refill
> >> setDataMap's.
> >> Pros: easy to implement
> >> Cons: inpredictable time/memory overhead.
> >>
> >> #2 - More realistic. Avoid node-local copies of cache data. Maintain
> linked
> >> list in datastructure-backing cache: key is set item, value is next set
> >> item. List head is stored in set header cache entry (this set item is
> >> youngest one). Iterators build on top of this structure are fail-fast.
> >> Pros: less memory overhead, no need to maintain node-local mirrors of
> cache
> >> data
> >> Cons: iterators are not fail-safe.
> >>
> >> #3 - Option #2 modified. We can store reference counter and 'removed'
> flag
> >> along with next item reference. This allows to make iterators fail safe.
> >> Pros: iterators are fail-safe
> >> Cons: slightly more complicated implementation, may affect performance,
> >> also I see no way to handle active iterators on remote nodes failures.
> >>
> >>
> >> Best regards,
> >>
> >> Andrey.
> >>
>