Posted to dev@ignite.apache.org by Vladimir Ozerov <vo...@gridgain.com> on 2016/04/27 16:27:56 UTC

POJO store usability issues

Igniters,

We receive more and more questions about the same problem: "I have a big
database. How should I load it into Ignite?"

Obviously, users try to use the POJO store as the most convenient approach,
but it cannot handle this case properly.
1) If a user invokes *IgniteCache.loadCache()*, then the same request - and
usually this is a full table scan - will be executed on every node, leading
to very poor performance. For instance, we have a report of a load of 47M
entries into a cache on 16 nodes which took ... 8 hours!!!
2) If a user invokes IgniteCache.localLoadCache(), then our internal cache
logic will filter out non-primary and non-backup entries, so this approach
doesn't work either.
3) A user could try *IgniteDataStreamer*, but in this case they have to
deal with all the JDBC-related plumbing on their own - not convenient.
4) Another approach I have heard several times: "the user should have an
affinity attribute in the table ...", the idea being that the user can then
divide the whole data set into several disjoint sets with specific
affinity. This doesn't work. Consider a user with some legacy database - the
most common use case. How are they going to work with affinity?

Bottom line: Ignite has *no convenient way* to load millions of entries
from a database.

We need to start thinking of possible solutions. Several ideas from my side:

1) The POJO store must be much more flexible. We should be able to pass
different queries to different nodes when calling loadCache().

2) The cache store could have an additional mode in which it does not ignore
non-primary / non-backup entries, but rather *distributes* them to other
nodes, e.g. with the help of a data streamer.
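Idea 1 could be prototyped entirely outside the store by deriving a disjoint
SQL statement per node from a numeric primary key via modulo partitioning.
A minimal sketch (the helper class, table, and column names are all
hypothetical - this is not an existing Ignite API - and it assumes a numeric
PK and a database with a MOD function):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a table load into N disjoint SQL statements
// by taking the primary key modulo the node count. Each node would then
// receive exactly one of these statements, so no row is loaded twice.
public class DisjointQueries {
    public static List<String> perNodeQueries(String table, String pkCol, int nodeCount) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            // Node i loads only the rows where pk % nodeCount == i.
            queries.add("SELECT * FROM " + table +
                " WHERE MOD(" + pkCol + ", " + nodeCount + ") = " + i);
        }
        return queries;
    }

    public static void main(String[] args) {
        for (String q : perNodeQueries("PERSON", "id", 4))
            System.out.println(q);
    }
}
```

With a richer loadCache() API, such a list could be produced once and each
statement shipped to a different node instead of running the same full scan
everywhere.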

Thoughts?

Vladimir.

Re: POJO store usability issues

Posted by Dmitriy Setrakyan <ds...@apache.org>.
On Wed, May 4, 2016 at 9:12 AM, Alexey Kuznetsov <ak...@gridgain.com>
wrote:

>
> Maybe, but sending closures seems like a workaround to me.
> It would require the user to write quite a lot of code.
> A simple call to loadCache(...some descriptors...) would be much more
> user friendly.
>

I think documenting this properly will take us a long way and will help our
users. Once it is documented, we can come up with a better design and
include it in one of the future releases.


> Also, I'm not sure what will happen with a key loaded from the DB if it is
> not an affinity key for the node where localLoadCache is executed.
> Will it be ignored or put into the cache?
>

They will be ignored.


>
> Semen, Alexey Goncharuk - could you tell us what happens with such keys
> inside localLoadCache?
>
>
> --
> Alexey Kuznetsov
> GridGain Systems
> www.gridgain.com
>

Re: POJO store usability issues

Posted by Alexey Kuznetsov <ak...@gridgain.com>.
Dmitriy,

Maybe, but sending closures seems like a workaround to me.
It would require the user to write quite a lot of code.
A simple call to loadCache(...some descriptors...) would be much more user
friendly.


Also, I'm not sure what will happen with a key loaded from the DB if it is
not an affinity key for the node where localLoadCache is executed.
Will it be ignored or put into the cache?

Semen, Alexey Goncharuk - could you tell us what happens with such keys
inside localLoadCache?

On Tue, May 3, 2016 at 11:10 PM, Dmitriy Setrakyan <ds...@apache.org>
wrote:

> Alexey,
>
> What you are saying should already be possible by sending compute closures
> to nodes and calling localLoadCache(“myCustomSqlStatement”).
>
> Isn’t it just a case of providing a proper example and documentation?
>
> D.
>



-- 
Alexey Kuznetsov
GridGain Systems
www.gridgain.com

Re: POJO store usability issues

Posted by Dmitriy Setrakyan <ds...@apache.org>.
Alexey,

What you are saying should already be possible by sending compute closures
to nodes and calling localLoadCache(“myCustomSqlStatement”).

Isn’t it just a case of providing a proper example and documentation?

D.

On Tue, May 3, 2016 at 7:22 AM, Alexey Kuznetsov <ak...@gridgain.com>
wrote:

> I totally agree with Vladimir.
>
> From the JdbcPojo store side we could introduce support for some kind of
> load descriptor that will contain the SQL to execute and a node filter.
>
> On each node the store will check the node filter and execute the SQL if
> the node matches the filter.
>
> This will solve the first problem - "do not load the full database on each
> node".
>
> As for the second problem - "do not ignore non-primary non-backup entries" -
> I think this should be solved at the cache level, because the store does
> not know anything about primary / backup.
>
> Thoughts?
>
> --
> Alexey Kuznetsov
> GridGain Systems
> www.gridgain.com
>

Re: POJO store usability issues

Posted by Alexey Kuznetsov <ak...@gridgain.com>.
I totally agree with Vladimir.

From the JdbcPojo store side we could introduce support for some kind of
load descriptor that will contain the SQL to execute and a node filter.

On each node the store will check the node filter and execute the SQL if
the node matches the filter.

This will solve the first problem - "do not load the full database on each
node".

As for the second problem - "do not ignore non-primary non-backup entries" -
I think this should be solved at the cache level, because the store does
not know anything about primary / backup.

Thoughts?
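To make the descriptor idea concrete, it could be little more than a SQL
string paired with a node filter. A minimal sketch of one possible shape -
neither this class nor its methods exist in Ignite today, and the plain
String nodeId stands in for Ignite's ClusterNode:

```java
import java.util.function.Predicate;

// Hypothetical load descriptor: pairs a SQL statement with a node filter.
// The store would evaluate the filter against the local node and run the
// SQL only when the node matches.
public class LoadDescriptor {
    private final String sql;
    private final Predicate<String> nodeFilter;

    public LoadDescriptor(String sql, Predicate<String> nodeFilter) {
        this.sql = sql;
        this.nodeFilter = nodeFilter;
    }

    /** Returns the SQL to execute on the given node, or null if the node is filtered out. */
    public String sqlFor(String nodeId) {
        return nodeFilter.test(nodeId) ? sql : null;
    }

    public static void main(String[] args) {
        LoadDescriptor d = new LoadDescriptor(
            "SELECT * FROM PERSON WHERE id < 1000",
            id -> id.startsWith("node-1"));
        System.out.println(d.sqlFor("node-1"));  // node matches the filter
        System.out.println(d.sqlFor("node-2"));  // filtered out, returns null
    }
}
```

A loadCache(...descriptors...) overload could then accept a collection of
these, letting each node pick out and execute only its own statements.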





-- 
Alexey Kuznetsov
GridGain Systems
www.gridgain.com