Posted to dev@ignite.apache.org by Dmitriy Setrakyan <ds...@apache.org> on 2016/07/09 00:04:36 UTC

Re: kick off a discussion

Thanks Sasha!

Resending to the dev list.

D.

On Fri, Jul 8, 2016 at 2:02 PM, Alexandre Boudnik <al...@boudnik.org>
wrote:

> Apache Ignite is a great platform, but it lacks certain capabilities
> that are common in the RDBMS world, such as:
> - Consistent online backup of data on the entire cluster (or for a
> specified set of caches)
> - Hierarchical snapshots for a specified set of caches
> - Transaction log
> - Restoring cluster state as of a certain point in time
> - Rolling forward from a snapshot with the ability to filter/modify
> transactions
> - Asynchronous replication based on either log shipment or snapshot
> shipment
> -- Between clusters
> -- Continuous data export to, let's say, an RDBMS
> It is also necessary to reduce cold start time for huge clusters
> with strict SLAs.
>
> I'll put some implementation ideas in JIRA later on. I believe that
> this list is far from complete, but I want the community to
> discuss the above-mentioned use cases.
>
> --Sasha
>

Re: kick off a discussion

Posted by Dmitriy Setrakyan <ds...@apache.org>.

On Tue, Jul 12, 2016 at 6:44 PM, Konstantin Boudnik <co...@apache.org> wrote:

> On Mon, Jul 11, 2016 at 08:15AM, Dmitriy Setrakyan wrote:
> > I think you mean data center replication here. It is not an easy feature to
> > implement, and so far has been handled by commercial vendors of Ignite,
> > e.g. GridGain.
>
> Apache Geode (incubating) has added the DC replication a couple months ago, so
> I don't see why Ignite shouldn't?
>
> Cos


Thanks, Cos! I didn’t know that. I will look into it.

Re: kick off a discussion

Posted by Konstantin Boudnik <co...@apache.org>.

On Mon, Jul 11, 2016 at 08:15AM, Dmitriy Setrakyan wrote:
> My answers are inline…
> 
> On Sat, Jul 9, 2016 at 3:04 AM, Dmitriy Setrakyan <ds...@apache.org>
> wrote:
> 
> > Thanks Sasha!
> >
> > Resending to the dev list.
> >
> > D.
> >
> > On Fri, Jul 8, 2016 at 2:02 PM, Alexandre Boudnik <al...@boudnik.org>
> > wrote:
> >
> >> Apache Ignite a great platform but it lacks of certain capabilities,
> >> which are common in RDMS world, such as:
> >> - Consistent on-line backup for data on entire cluster (or for
> >> specified set of caches)
> >>
> >
> I think you mean data center replication here. It is not an easy feature to
> implement, and so far has been handled by commercial vendors of Ignite,
> e.g. GridGain.

Apache Geode (incubating) has added the DC replication a couple months ago, so
I don't see why Ignite shouldn't?

Cos

> > - Hierarchal snapshots for specified set caches
> >>
> >
> What do you mean by hierarchical?
> 
> 
> > - Transaction log
> >>
> >
> Why does Ignite need it for in-memory transactions?
> 
> 
> > - Restore cluster state as of certain point in time
> >>
> >
> Given that such restorability may introduce lots of memory overhead, does
> it really make sense  for an in-memory cache?
> 
> 
> > - Rolling forward from snapshot with ability to filter/modify transactions
> >>
> >
> Same as above
> 
> 
> > - Asynchronous replication based either on log shipment or snapshot
> >> shipment
> >> -- Between clusters
> >>
> >
> This is the same as data center replication, no?
> 
> 
> > -- Continues data export to let’s say RDMS
> >>
> >
> Don’t we already support it with our write-through feature to a database?
> 
> 
> > It is also a necessity to reduce cold start time for huge clusters
> >> with strict SLAs.
> >>
> >
> What part are you trying to speed up here? Are you talking about loading
> data from databases?
> 
> 
> >
> >> I'll put some implementation ideas in JIRA later on. I believe that
> >> this list is far from being complete, but I want the community to
> >> discuss these abovementioned use cases.
> >>
> >> --Sasha
> >>
> >
> >

Re: kick off a discussion

Posted by Igor Rudyak <ir...@gmail.com>.

Dmitriy,

It looks like Konstantin is talking about a specific case, where you have
specified readThrough/writeThrough mode for your caches. In such a mode, all
your WRITE operations and some portion of your READ operations are inevitably
disk-based.

Thus all the suggested enhancements apply to readThrough/writeThrough mode
only.
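
A minimal sketch of why that mode is disk-bound, assuming a toy in-memory
cache in front of a slower store. The names DictStore and ThroughCache are
hypothetical and are not Ignite APIs; the counters only show how often the
backing store is touched.

```python
class DictStore:
    """Hypothetical stand-in for a disk-based database; counts accesses."""

    def __init__(self):
        self.rows = {}
        self.reads = 0
        self.writes = 0

    def load(self, key):
        self.reads += 1
        return self.rows.get(key)

    def write(self, key, value):
        self.writes += 1
        self.rows[key] = value


class ThroughCache:
    """Toy cache with read-through and write-through behaviour."""

    def __init__(self, store):
        self.store = store
        self.memory = {}

    def put(self, key, value):
        # Write-through: every put reaches the backing store.
        self.store.write(key, value)
        self.memory[key] = value

    def get(self, key):
        # Read-through: only a cache miss reaches the backing store.
        if key in self.memory:
            return self.memory[key]
        value = self.store.load(key)
        if value is not None:
            self.memory[key] = value
        return value
```

In this sketch every put() costs one store write, while get() costs a store
read only on a miss, which matches the "all writes, some reads" observation.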

Igor Rudyak

On Thu, Jul 14, 2016 at 3:00 PM, Dmitriy Setrakyan <ds...@apache.org>
wrote:

> On Thu, Jul 14, 2016 at 9:07 PM, Konstantin Boudnik <co...@apache.org>
> wrote:
>
> > On Wed, Jul 13, 2016 at 05:30AM, Dmitriy Setrakyan wrote:
> > > Hi Alex,
> > >
> > > I believe most of your comments have to do with disk-based
> functionality,
> > > especially in regard to backups, snapshots, etc. However, Ignite is
> > > currently an in-memory system, at least for the nearest future. Let me
> > know
> > > if I misunderstood something.
> >
> > And the nearest future is defined by....? This is a collaborative
> project,
> > as
> > you all learned during the incubation, and the statements like "the X
> only
> > does bar for now" should be consensual. If there's a will to work on the
> > new
> > functionality which is demanded by the users, and the said functionality
> is
> > expected to expand the applicability of the technology - I don't really
> see
> > why and how it could be put to hold.
> >
> > Fortunately, there are a number of ways this development could be put
> > through,
> > and it doesn't really require much of the moving parts (in fact it is
> done
> > all
> > the time in the same way right now): let's put the new development on a
> > branch, and start moving. There's JIRA and there's the CI to help to
> > validate
> > and coordinate the work. Sounds like an easy decision to me.
> >
>
> Cos, the nearest future is defined by the community, of course. Take a look
> at the Ignite 2.0 discussion which is taking place on another thread [1].
>
> Any disk-based functionality will require some significant memory-model
> rearchitecture, which is already planned for Ignite 2.0 as part of
> IGNITE-3477 [2] and IGNITE-3478 [3]. I believe Alexey G. has already
> started making significant progress on it. Note that in-memory snapshots
> are already defined as a part of this work.
>
> If the community decides to add disk based features, I am all for it. We
> can start a discussion on it now, but the implementation should come after
> the Ignite 2.0, to avoid any conflicts in architecture, design, or code.
> Just my 0.02 cents.
>
> [1] -
>
> http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-2-0-tasks-roadmap-td9585.html
> [2] - https://issues.apache.org/jira/browse/IGNITE-3477
> [3] - https://issues.apache.org/jira/browse/IGNITE-3478
>
>
> > Cos
> >
> > > On Tue, Jul 12, 2016 at 9:44 PM, Alexandre Boudnik <
> > > alexander.boudnik@gmail.com> wrote:
> > >
> > > > Dmitriy, thank you for your time and questions, which helped me to
> > > > realize what I forget to mentioned!
> > > > See my answers inline; later I'll combine everything together to help
> > > > to the next readers :)
> > > >
> > > > I put together some implementation ideas in Apache Ignite JIRA, as
> > > > promised: https://issues.apache.org/jira/browse/IGNITE-3457. I see
> > > > this facility as another CacheStore implementation, so it wouldn't
> > > > interfere with base principals of Ignite platform.
> > > >
> > > >
> > > > On Mon, Jul 11, 2016 at 1:15 AM, Dmitriy Setrakyan
> > > > <ds...@apache.org> wrote:
> > > > > My answers are inline…
> > > > >
> > > > > On Sat, Jul 9, 2016 at 3:04 AM, Dmitriy Setrakyan <
> > dsetrakyan@apache.org
> > > > >
> > > > > wrote:
> > > > >
> > > > >> Thanks Sasha!
> > > > >>
> > > > >> Resending to the dev list.
> > > > >>
> > > > >> D.
> > > > >>
> > > > >> On Fri, Jul 8, 2016 at 2:02 PM, Alexandre Boudnik <
> > > > alexandre@boudnik.org>
> > > > >> wrote:
> > > > >>
> > > > >>> Apache Ignite a great platform but it lacks of certain
> > capabilities,
> > > > >>> which are common in RDMS world, such as:
> > > > >>> - Consistent on-line backup for data on entire cluster (or for
> > > > >>> specified set of caches)
> > > > >>>
> > > > >>
> > > > > I think you mean data center replication here. It is not an easy
> > feature
> > > > to
> > > > > implement, and so far has been handled by commercial vendors of
> > Ignite,
> > > > > e.g. GridGain.
> > > > >
> > > > Actually not. Right here I meant exactly what I said: full or
> > > > incremental backup of all/selected caches in consistent state so it
> > > > can be used for the purpose of being able to restore them in case of
> > > > data loss or data corruption. One of important use cases is the OLAP
> > > > systems (let's say for banking), which has been built on Apache
> Ignite
> > > > platform.
> > > >
> > > > And you right, data center replication can be easily implemented
> based
> > > > on log/snapshot shipment.
> > > >
> > > > >
> > > > >> - Hierarchal snapshots for specified set caches
> > > > >>>
> > > > >>
> > > > > What do you mean by hierarchical?
> > > > >
> > > > In this particular case the notion of hierarchical snapshots is very
> > > > similar to the same notion used in SAN appliances or by Virtual Box
> or
> > > > vmware. Using concept of snapshots we can do all this amazing things:
> > > > - full and incremental backup
> > > > - restore
> > > > - rollback to checkpoint
> > > > - roll forward
> > > > much easier, with minimal memory and I/O overhead.
> > > >
> > > > >
> > > > >> - Transaction log
> > > > >>>
> > > > >>
> > > > > Why does Ignite need it for in-memory transactions?
> > > > >
> > > > At least it is required to provide roll-forward functionality, when
> > > > you restores the state of the cache from checkpoint (the cache state
> > > > before snapshot has been made) and then reapply transactions one by
> > > > one.
> > > >
> > > > >
> > > > >> - Restore cluster state as of certain point in time
> > > > >>>
> > > > >>
> > > > > Given that such restorability may introduce lots of memory
> overhead,
> > does
> > > > > it really make sense  for an in-memory cache?
> > > > >
> > > > Actually, it will not consume any memory. It will use external
> memory,
> > > > such as HDD/SSD space instead. And yes, I think that this
> > > > functionality makes complete sense for our users IRL, who will love
> > > > it.
> > > >
> > > > >
> > > > >> - Rolling forward from snapshot with ability to filter/modify
> > > > transactions
> > > > >>>
> > > > >>
> > > > > Same as above
> > > > >
> > > > The same as above: my customers in trenches are begging for that
> > feature.
> > > >
> > > > >
> > > > >> - Asynchronous replication based either on log shipment or
> snapshot
> > > > >>> shipment
> > > > >>> -- Between clusters
> > > > >>>
> > > > >>
> > > > > This is the same as data center replication, no?
> > > > Including but not limited to: log shipment or snapshot shipment also
> > > > could be used to implement so called
> "better-than-lambda-architecture"
> > > > for BI and OLAP, when data replicated to a query-able datasource
> let's
> > > > say Oracle as soon as they are produced by OLTP system. We can use
> > > > RDBMS API such as Oracle Streams (going to be discontinued - sad) or
> > > > Golden Gate to filter changes from logs/snapshots and then apply
> them.
> > > > That approach allows to save a tons of legacy reports and BI
> > > > dashboards.
> > > >
> > > > >
> > > > >
> > > > >> -- Continues data export to let’s say RDMS
> > > > >>>
> > > > >>
> > > > > Don’t we already support it with our write-through feature to a
> > database?
> > > > >
> > > > When write-through used for non-local caches it may cause the data
> > > > corruption in RDBMS: I have opened this issue a few weeks ago:
> > > > https://issues.apache.org/jira/browse/IGNITE-3321
> > > >
> > > > >
> > > > >> It is also a necessity to reduce cold start time for huge clusters
> > > > >>> with strict SLAs.
> > > > >>>
> > > > >>
> > > > > What part are you trying to speed up here? Are you talking about
> > loading
> > > > > data from databases?
> > > > >
> > > > I'm talking about the initial load from Persistent Store when cluster
> > > > has been cold-started (like from GridGain's Local Recoverable Store).
> > > >
> > > > >
> > > > >>
> > > > >>> I'll put some implementation ideas in JIRA later on. I believe
> that
> > > > >>> this list is far from being complete, but I want the community to
> > > > >>> discuss these abovementioned use cases.
> > > > >>>
> > > > >>> --Sasha
> > > > >>>
> > > > >>
> > > > >>
> > > >
> >
>

Re: kick off a discussion

Posted by Dmitriy Setrakyan <ds...@apache.org>.

On Thu, Jul 14, 2016 at 9:07 PM, Konstantin Boudnik <co...@apache.org> wrote:

> On Wed, Jul 13, 2016 at 05:30AM, Dmitriy Setrakyan wrote:
> > Hi Alex,
> >
> > I believe most of your comments have to do with disk-based functionality,
> > especially in regard to backups, snapshots, etc. However, Ignite is
> > currently an in-memory system, at least for the nearest future. Let me
> know
> > if I misunderstood something.
>
> And the nearest future is defined by....? This is a collaborative project,
> as
> you all learned during the incubation, and the statements like "the X only
> does bar for now" should be consensual. If there's a will to work on the
> new
> functionality which is demanded by the users, and the said functionality is
> expected to expand the applicability of the technology - I don't really see
> why and how it could be put to hold.
>
> Fortunately, there are a number of ways this development could be put
> through,
> and it doesn't really require much of the moving parts (in fact it is done
> all
> the time in the same way right now): let's put the new development on a
> branch, and start moving. There's JIRA and there's the CI to help to
> validate
> and coordinate the work. Sounds like an easy decision to me.
>

Cos, the nearest future is defined by the community, of course. Take a look
at the Ignite 2.0 discussion which is taking place on another thread [1].

Any disk-based functionality will require some significant memory-model
rearchitecture, which is already planned for Ignite 2.0 as part of
IGNITE-3477 [2] and IGNITE-3478 [3]. I believe Alexey G. has already
started making significant progress on it. Note that in-memory snapshots
are already defined as a part of this work.

If the community decides to add disk-based features, I am all for it. We
can start a discussion on it now, but the implementation should come after
Ignite 2.0, to avoid any conflicts in architecture, design, or code.
Just my 2 cents.

[1] -
http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-2-0-tasks-roadmap-td9585.html
[2] - https://issues.apache.org/jira/browse/IGNITE-3477
[3] - https://issues.apache.org/jira/browse/IGNITE-3478


> Cos
>
> > On Tue, Jul 12, 2016 at 9:44 PM, Alexandre Boudnik <
> > alexander.boudnik@gmail.com> wrote:
> >
> > > Dmitriy, thank you for your time and questions, which helped me to
> > > realize what I forget to mentioned!
> > > See my answers inline; later I'll combine everything together to help
> > > to the next readers :)
> > >
> > > I put together some implementation ideas in Apache Ignite JIRA, as
> > > promised: https://issues.apache.org/jira/browse/IGNITE-3457. I see
> > > this facility as another CacheStore implementation, so it wouldn't
> > > interfere with base principals of Ignite platform.
> > >
> > >
> > > On Mon, Jul 11, 2016 at 1:15 AM, Dmitriy Setrakyan
> > > <ds...@apache.org> wrote:
> > > > My answers are inline…
> > > >
> > > > On Sat, Jul 9, 2016 at 3:04 AM, Dmitriy Setrakyan <
> dsetrakyan@apache.org
> > > >
> > > > wrote:
> > > >
> > > >> Thanks Sasha!
> > > >>
> > > >> Resending to the dev list.
> > > >>
> > > >> D.
> > > >>
> > > >> On Fri, Jul 8, 2016 at 2:02 PM, Alexandre Boudnik <
> > > alexandre@boudnik.org>
> > > >> wrote:
> > > >>
> > > >>> Apache Ignite a great platform but it lacks of certain
> capabilities,
> > > >>> which are common in RDMS world, such as:
> > > >>> - Consistent on-line backup for data on entire cluster (or for
> > > >>> specified set of caches)
> > > >>>
> > > >>
> > > > I think you mean data center replication here. It is not an easy
> feature
> > > to
> > > > implement, and so far has been handled by commercial vendors of
> Ignite,
> > > > e.g. GridGain.
> > > >
> > > Actually not. Right here I meant exactly what I said: full or
> > > incremental backup of all/selected caches in consistent state so it
> > > can be used for the purpose of being able to restore them in case of
> > > data loss or data corruption. One of important use cases is the OLAP
> > > systems (let's say for banking), which has been built on Apache Ignite
> > > platform.
> > >
> > > And you right, data center replication can be easily implemented based
> > > on log/snapshot shipment.
> > >
> > > >
> > > >> - Hierarchal snapshots for specified set caches
> > > >>>
> > > >>
> > > > What do you mean by hierarchical?
> > > >
> > > In this particular case the notion of hierarchical snapshots is very
> > > similar to the same notion used in SAN appliances or by Virtual Box or
> > > vmware. Using concept of snapshots we can do all this amazing things:
> > > - full and incremental backup
> > > - restore
> > > - rollback to checkpoint
> > > - roll forward
> > > much easier, with minimal memory and I/O overhead.
> > >
> > > >
> > > >> - Transaction log
> > > >>>
> > > >>
> > > > Why does Ignite need it for in-memory transactions?
> > > >
> > > At least it is required to provide roll-forward functionality, when
> > > you restores the state of the cache from checkpoint (the cache state
> > > before snapshot has been made) and then reapply transactions one by
> > > one.
> > >
> > > >
> > > >> - Restore cluster state as of certain point in time
> > > >>>
> > > >>
> > > > Given that such restorability may introduce lots of memory overhead,
> does
> > > > it really make sense  for an in-memory cache?
> > > >
> > > Actually, it will not consume any memory. It will use external memory,
> > > such as HDD/SSD space instead. And yes, I think that this
> > > functionality makes complete sense for our users IRL, who will love
> > > it.
> > >
> > > >
> > > >> - Rolling forward from snapshot with ability to filter/modify
> > > transactions
> > > >>>
> > > >>
> > > > Same as above
> > > >
> > > The same as above: my customers in trenches are begging for that
> feature.
> > >
> > > >
> > > >> - Asynchronous replication based either on log shipment or snapshot
> > > >>> shipment
> > > >>> -- Between clusters
> > > >>>
> > > >>
> > > > This is the same as data center replication, no?
> > > Including but not limited to: log shipment or snapshot shipment also
> > > could be used to implement so called "better-than-lambda-architecture"
> > > for BI and OLAP, when data replicated to a query-able datasource let's
> > > say Oracle as soon as they are produced by OLTP system. We can use
> > > RDBMS API such as Oracle Streams (going to be discontinued - sad) or
> > > Golden Gate to filter changes from logs/snapshots and then apply them.
> > > That approach allows to save a tons of legacy reports and BI
> > > dashboards.
> > >
> > > >
> > > >
> > > >> -- Continues data export to let’s say RDMS
> > > >>>
> > > >>
> > > > Don’t we already support it with our write-through feature to a
> database?
> > > >
> > > When write-through used for non-local caches it may cause the data
> > > corruption in RDBMS: I have opened this issue a few weeks ago:
> > > https://issues.apache.org/jira/browse/IGNITE-3321
> > >
> > > >
> > > >> It is also a necessity to reduce cold start time for huge clusters
> > > >>> with strict SLAs.
> > > >>>
> > > >>
> > > > What part are you trying to speed up here? Are you talking about
> loading
> > > > data from databases?
> > > >
> > > I'm talking about the initial load from Persistent Store when cluster
> > > has been cold-started (like from GridGain's Local Recoverable Store).
> > >
> > > >
> > > >>
> > > >>> I'll put some implementation ideas in JIRA later on. I believe that
> > > >>> this list is far from being complete, but I want the community to
> > > >>> discuss these abovementioned use cases.
> > > >>>
> > > >>> --Sasha
> > > >>>
> > > >>
> > > >>
> > >
>

Re: kick off a discussion

Posted by Konstantin Boudnik <co...@apache.org>.

On Wed, Jul 13, 2016 at 05:30AM, Dmitriy Setrakyan wrote:
> Hi Alex,
> 
> I believe most of your comments have to do with disk-based functionality,
> especially in regard to backups, snapshots, etc. However, Ignite is
> currently an in-memory system, at least for the nearest future. Let me know
> if I misunderstood something.

And the nearest future is defined by....? This is a collaborative project, as
you all learned during the incubation, and statements like "the X only
does bar for now" should be consensual. If there's a will to work on
new functionality that is demanded by the users, and that functionality is
expected to expand the applicability of the technology, I don't really see
why and how it could be put on hold.

Fortunately, there are a number of ways this development could be carried
out, and it doesn't really require many moving parts (in fact it is done all
the time in the same way right now): let's put the new development on a
branch, and start moving. There's JIRA and there's the CI to help validate
and coordinate the work. Sounds like an easy decision to me.

Cos

> On Tue, Jul 12, 2016 at 9:44 PM, Alexandre Boudnik <
> alexander.boudnik@gmail.com> wrote:
> 
> > Dmitriy, thank you for your time and questions, which helped me to
> > realize what I forget to mentioned!
> > See my answers inline; later I'll combine everything together to help
> > to the next readers :)
> >
> > I put together some implementation ideas in Apache Ignite JIRA, as
> > promised: https://issues.apache.org/jira/browse/IGNITE-3457. I see
> > this facility as another CacheStore implementation, so it wouldn't
> > interfere with base principals of Ignite platform.
> >
> >
> > On Mon, Jul 11, 2016 at 1:15 AM, Dmitriy Setrakyan
> > <ds...@apache.org> wrote:
> > > My answers are inline…
> > >
> > > On Sat, Jul 9, 2016 at 3:04 AM, Dmitriy Setrakyan <dsetrakyan@apache.org
> > >
> > > wrote:
> > >
> > >> Thanks Sasha!
> > >>
> > >> Resending to the dev list.
> > >>
> > >> D.
> > >>
> > >> On Fri, Jul 8, 2016 at 2:02 PM, Alexandre Boudnik <
> > alexandre@boudnik.org>
> > >> wrote:
> > >>
> > >>> Apache Ignite a great platform but it lacks of certain capabilities,
> > >>> which are common in RDMS world, such as:
> > >>> - Consistent on-line backup for data on entire cluster (or for
> > >>> specified set of caches)
> > >>>
> > >>
> > > I think you mean data center replication here. It is not an easy feature
> > to
> > > implement, and so far has been handled by commercial vendors of Ignite,
> > > e.g. GridGain.
> > >
> > Actually not. Right here I meant exactly what I said: full or
> > incremental backup of all/selected caches in consistent state so it
> > can be used for the purpose of being able to restore them in case of
> > data loss or data corruption. One of important use cases is the OLAP
> > systems (let's say for banking), which has been built on Apache Ignite
> > platform.
> >
> > And you right, data center replication can be easily implemented based
> > on log/snapshot shipment.
> >
> > >
> > >> - Hierarchal snapshots for specified set caches
> > >>>
> > >>
> > > What do you mean by hierarchical?
> > >
> > In this particular case the notion of hierarchical snapshots is very
> > similar to the same notion used in SAN appliances or by Virtual Box or
> > vmware. Using concept of snapshots we can do all this amazing things:
> > - full and incremental backup
> > - restore
> > - rollback to checkpoint
> > - roll forward
> > much easier, with minimal memory and I/O overhead.
> >
> > >
> > >> - Transaction log
> > >>>
> > >>
> > > Why does Ignite need it for in-memory transactions?
> > >
> > At least it is required to provide roll-forward functionality, when
> > you restores the state of the cache from checkpoint (the cache state
> > before snapshot has been made) and then reapply transactions one by
> > one.
> >
> > >
> > >> - Restore cluster state as of certain point in time
> > >>>
> > >>
> > > Given that such restorability may introduce lots of memory overhead, does
> > > it really make sense  for an in-memory cache?
> > >
> > Actually, it will not consume any memory. It will use external memory,
> > such as HDD/SSD space instead. And yes, I think that this
> > functionality makes complete sense for our users IRL, who will love
> > it.
> >
> > >
> > >> - Rolling forward from snapshot with ability to filter/modify
> > transactions
> > >>>
> > >>
> > > Same as above
> > >
> > The same as above: my customers in trenches are begging for that feature.
> >
> > >
> > >> - Asynchronous replication based either on log shipment or snapshot
> > >>> shipment
> > >>> -- Between clusters
> > >>>
> > >>
> > > This is the same as data center replication, no?
> > Including but not limited to: log shipment or snapshot shipment also
> > could be used to implement so called "better-than-lambda-architecture"
> > for BI and OLAP, when data replicated to a query-able datasource let's
> > say Oracle as soon as they are produced by OLTP system. We can use
> > RDBMS API such as Oracle Streams (going to be discontinued - sad) or
> > Golden Gate to filter changes from logs/snapshots and then apply them.
> > That approach allows to save a tons of legacy reports and BI
> > dashboards.
> >
> > >
> > >
> > >> -- Continues data export to let’s say RDMS
> > >>>
> > >>
> > > Don’t we already support it with our write-through feature to a database?
> > >
> > When write-through used for non-local caches it may cause the data
> > corruption in RDBMS: I have opened this issue a few weeks ago:
> > https://issues.apache.org/jira/browse/IGNITE-3321
> >
> > >
> > >> It is also a necessity to reduce cold start time for huge clusters
> > >>> with strict SLAs.
> > >>>
> > >>
> > > What part are you trying to speed up here? Are you talking about loading
> > > data from databases?
> > >
> > I'm talking about the initial load from Persistent Store when cluster
> > has been cold-started (like from GridGain's Local Recoverable Store).
> >
> > >
> > >>
> > >>> I'll put some implementation ideas in JIRA later on. I believe that
> > >>> this list is far from being complete, but I want the community to
> > >>> discuss these abovementioned use cases.
> > >>>
> > >>> --Sasha
> > >>>
> > >>
> > >>
> >

Re: kick off a discussion

Posted by Dmitriy Setrakyan <ds...@apache.org>.

Hi Alex,

I believe most of your comments have to do with disk-based functionality,
especially in regard to backups, snapshots, etc. However, Ignite is
currently an in-memory system, at least for the nearest future. Let me know
if I misunderstood something.

D.

On Tue, Jul 12, 2016 at 9:44 PM, Alexandre Boudnik <
alexander.boudnik@gmail.com> wrote:

> Dmitriy, thank you for your time and questions, which helped me to
> realize what I forget to mentioned!
> See my answers inline; later I'll combine everything together to help
> to the next readers :)
>
> I put together some implementation ideas in Apache Ignite JIRA, as
> promised: https://issues.apache.org/jira/browse/IGNITE-3457. I see
> this facility as another CacheStore implementation, so it wouldn't
> interfere with base principals of Ignite platform.
>
>
> On Mon, Jul 11, 2016 at 1:15 AM, Dmitriy Setrakyan
> <ds...@apache.org> wrote:
> > My answers are inline…
> >
> > On Sat, Jul 9, 2016 at 3:04 AM, Dmitriy Setrakyan <dsetrakyan@apache.org
> >
> > wrote:
> >
> >> Thanks Sasha!
> >>
> >> Resending to the dev list.
> >>
> >> D.
> >>
> >> On Fri, Jul 8, 2016 at 2:02 PM, Alexandre Boudnik <
> alexandre@boudnik.org>
> >> wrote:
> >>
> >>> Apache Ignite a great platform but it lacks of certain capabilities,
> >>> which are common in RDMS world, such as:
> >>> - Consistent on-line backup for data on entire cluster (or for
> >>> specified set of caches)
> >>>
> >>
> > I think you mean data center replication here. It is not an easy feature
> to
> > implement, and so far has been handled by commercial vendors of Ignite,
> > e.g. GridGain.
> >
> Actually not. Right here I meant exactly what I said: full or
> incremental backup of all/selected caches in consistent state so it
> can be used for the purpose of being able to restore them in case of
> data loss or data corruption. One of important use cases is the OLAP
> systems (let's say for banking), which has been built on Apache Ignite
> platform.
>
> And you right, data center replication can be easily implemented based
> on log/snapshot shipment.
>
> >
> >> - Hierarchal snapshots for specified set caches
> >>>
> >>
> > What do you mean by hierarchical?
> >
> In this particular case the notion of hierarchical snapshots is very
> similar to the same notion used in SAN appliances or by Virtual Box or
> vmware. Using concept of snapshots we can do all this amazing things:
> - full and incremental backup
> - restore
> - rollback to checkpoint
> - roll forward
> much easier, with minimal memory and I/O overhead.
>
> >
> >> - Transaction log
> >>>
> >>
> > Why does Ignite need it for in-memory transactions?
> >
> At least it is required to provide roll-forward functionality, when
> you restores the state of the cache from checkpoint (the cache state
> before snapshot has been made) and then reapply transactions one by
> one.
>
> >
> >> - Restore cluster state as of certain point in time
> >>>
> >>
> > Given that such restorability may introduce lots of memory overhead, does
> > it really make sense  for an in-memory cache?
> >
> Actually, it will not consume any memory. It will use external memory,
> such as HDD/SSD space instead. And yes, I think that this
> functionality makes complete sense for our users IRL, who will love
> it.
>
> >
> >> - Rolling forward from snapshot with ability to filter/modify
> transactions
> >>>
> >>
> > Same as above
> >
> The same as above: my customers in trenches are begging for that feature.
>
> >
> >> - Asynchronous replication based either on log shipment or snapshot
> >>> shipment
> >>> -- Between clusters
> >>>
> >>
> > This is the same as data center replication, no?
> Including but not limited to: log shipment or snapshot shipment also
> could be used to implement so called "better-than-lambda-architecture"
> for BI and OLAP, when data replicated to a query-able datasource let's
> say Oracle as soon as they are produced by OLTP system. We can use
> RDBMS API such as Oracle Streams (going to be discontinued - sad) or
> Golden Gate to filter changes from logs/snapshots and then apply them.
> That approach allows to save a tons of legacy reports and BI
> dashboards.
>
> >
> >
> >> -- Continues data export to let’s say RDMS
> >>>
> >>
> > Don’t we already support it with our write-through feature to a database?
> >
> When write-through used for non-local caches it may cause the data
> corruption in RDBMS: I have opened this issue a few weeks ago:
> https://issues.apache.org/jira/browse/IGNITE-3321
>
> >
> >> It is also a necessity to reduce cold start time for huge clusters
> >>> with strict SLAs.
> >>>
> >>
> > What part are you trying to speed up here? Are you talking about loading
> > data from databases?
> >
> I'm talking about the initial load from Persistent Store when cluster
> has been cold-started (like from GridGain's Local Recoverable Store).
>
> >
> >>
> >>> I'll put some implementation ideas in JIRA later on. I believe that
> >>> this list is far from being complete, but I want the community to
> >>> discuss these abovementioned use cases.
> >>>
> >>> --Sasha
> >>>
> >>
> >>
>

Re: kick off a discussion

Posted by Alexandre Boudnik <al...@gmail.com>.

Dmitriy, thank you for your time and questions, which helped me
realize what I forgot to mention!
See my answers inline; later I'll combine everything to help
future readers :)

I have put together some implementation ideas in the Apache Ignite JIRA,
as promised: https://issues.apache.org/jira/browse/IGNITE-3457. I see
this facility as another CacheStore implementation, so it wouldn't
interfere with the basic principles of the Ignite platform.


On Mon, Jul 11, 2016 at 1:15 AM, Dmitriy Setrakyan
<ds...@apache.org> wrote:
> My answers are inline…
>
> On Sat, Jul 9, 2016 at 3:04 AM, Dmitriy Setrakyan <ds...@apache.org>
> wrote:
>
>> Thanks Sasha!
>>
>> Resending to the dev list.
>>
>> D.
>>
>> On Fri, Jul 8, 2016 at 2:02 PM, Alexandre Boudnik <al...@boudnik.org>
>> wrote:
>>
>>> Apache Ignite a great platform but it lacks of certain capabilities,
>>> which are common in RDMS world, such as:
>>> - Consistent on-line backup for data on entire cluster (or for
>>> specified set of caches)
>>>
>>
> I think you mean data center replication here. It is not an easy feature to
> implement, and so far has been handled by commercial vendors of Ignite,
> e.g. GridGain.
>
Actually, no. Here I meant exactly what I said: a full or
incremental backup of all or selected caches in a consistent state, so
that they can be restored in case of data loss or data corruption.
One important use case is OLAP systems (let's say for banking) that
have been built on the Apache Ignite platform.

And you're right: data center replication can be easily implemented
based on log/snapshot shipment.

>
>> - Hierarchal snapshots for specified set caches
>>>
>>
> What do you mean by hierarchical?
>
In this particular case the notion of hierarchical snapshots is very
similar to the same notion used in SAN appliances or by VirtualBox or
VMware. Using the concept of snapshots we can do all these amazing things:
- full and incremental backup
- restore
- rollback to checkpoint
- roll forward
much more easily, with minimal memory and I/O overhead.
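
To make the idea concrete, here is a minimal sketch of hierarchical
(layered) snapshots. It is an illustration under assumptions, not Ignite
code: SnapshotSketch, Snapshot and restore are hypothetical names, keys
and values are plain strings, and each layer is assumed to record only
the net changes made since its parent snapshot.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; none of these types exist in Ignite. */
final class SnapshotSketch {
    /** One snapshot layer: only the entries changed since the parent snapshot. */
    static final class Snapshot {
        final Snapshot parent;                             // null for a full (base) snapshot
        final Map<String, String> puts = new HashMap<>();  // keys written since the parent
        final Set<String> removes = new HashSet<>();       // keys deleted since the parent

        Snapshot(Snapshot parent) {
            this.parent = parent;
        }
    }

    /** Rebuilds the cache content as of the given snapshot, applying layers from base to leaf. */
    static Map<String, String> restore(Snapshot s) {
        if (s == null)
            return new HashMap<>();

        Map<String, String> state = restore(s.parent);  // state as of the parent layer
        state.keySet().removeAll(s.removes);            // drop keys deleted in this layer
        state.putAll(s.puts);                           // overlay keys written in this layer
        return state;
    }
}
```

An incremental backup is then just a new layer whose parent is the
previous snapshot, and rolling back to a checkpoint means restoring an
earlier layer in the chain.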

>
>> - Transaction log
>>>
>>
> Why does Ignite need it for in-memory transactions?
>
At the very least, it is required to provide roll-forward
functionality, where you restore the state of the cache from a
checkpoint (the cache state as of the moment the snapshot was made)
and then reapply transactions one by one.
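
As a rough illustration of roll-forward, the sketch below restores a
checkpoint and replays the log on top of it. Everything here is an
assumption made for the example rather than an Ignite API:
RollForwardSketch, LoggedTx and rollForward are hypothetical names, the
log is assumed to be ordered by commit sequence number, and each logged
transaction is reduced to the key/value pairs it wrote.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical illustration only; none of these types exist in Ignite. */
final class RollForwardSketch {
    /** One committed transaction in the log: a sequence number plus the values it wrote. */
    static final class LoggedTx {
        final long seq;
        final Map<String, String> writes;

        LoggedTx(long seq, Map<String, String> writes) {
            this.seq = seq;
            this.writes = writes;
        }
    }

    /**
     * Starts from a copy of the checkpoint state and reapplies, in log order, every
     * transaction committed after the checkpoint, up to and including upToSeq,
     * skipping those rejected by the filter.
     */
    static Map<String, String> rollForward(Map<String, String> checkpoint, long checkpointSeq,
                                           List<LoggedTx> log, long upToSeq,
                                           Predicate<LoggedTx> filter) {
        Map<String, String> state = new HashMap<>(checkpoint);  // the checkpoint itself stays untouched

        for (LoggedTx tx : log) {
            if (tx.seq > checkpointSeq && tx.seq <= upToSeq && filter.test(tx))
                state.putAll(tx.writes);
        }

        return state;
    }
}
```

Choosing upToSeq gives a restore as of a certain point in time, and the
filter is where unwanted transactions could be dropped during replay.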

>
>> - Restore cluster state as of certain point in time
>>>
>>
> Given that such restorability may introduce lots of memory overhead, does
> it really make sense for an in-memory cache?
>
Actually, it will not consume any RAM. It will use external storage,
such as HDD/SSD space, instead. And yes, I think that this
functionality makes complete sense for our users IRL, who will love
it.

>
>> - Rolling forward from snapshot with ability to filter/modify transactions
>>>
>>
> Same as above
>
The same as above: my customers in the trenches are begging for that feature.

>
>> - Asynchronous replication based either on log shipment or snapshot
>>> shipment
>>> -- Between clusters
>>>
>>
> This is the same as data center replication, no?
Including but not limited to that: log shipment or snapshot shipment
could also be used to implement a so-called
"better-than-lambda-architecture" for BI and OLAP, in which data is
replicated to a queryable data source (let's say Oracle) as soon as it
is produced by the OLTP system. We can use an RDBMS API such as Oracle
Streams (going to be discontinued, sadly) or GoldenGate to filter
changes from logs/snapshots and then apply them. That approach lets us
preserve tons of legacy reports and BI dashboards.

>
>
>> -- Continues data export to let’s say RDMS
>>>
>>
> Don’t we already support it with our write-through feature to a database?
>
When write-through is used for non-local caches, it may cause data
corruption in the RDBMS. I opened an issue for this a few weeks ago:
https://issues.apache.org/jira/browse/IGNITE-3321

>
>> It is also a necessity to reduce cold start time for huge clusters
>>> with strict SLAs.
>>>
>>
> What part are you trying to speed up here? Are you talking about loading
> data from databases?
>
I'm talking about the initial load from a persistent store when the
cluster has been cold-started (as with GridGain's Local Recoverable Store).

>
>>
>>> I'll put some implementation ideas in JIRA later on. I believe that
>>> this list is far from being complete, but I want the community to
>>> discuss these abovementioned use cases.
>>>
>>> --Sasha
>>>
>>
>>

Re: kick off a discussion

Posted by Dmitriy Setrakyan <ds...@apache.org>.

My answers are inline…

On Sat, Jul 9, 2016 at 3:04 AM, Dmitriy Setrakyan <ds...@apache.org>
wrote:

> Thanks Sasha!
>
> Resending to the dev list.
>
> D.
>
> On Fri, Jul 8, 2016 at 2:02 PM, Alexandre Boudnik <al...@boudnik.org>
> wrote:
>
>> Apache Ignite a great platform but it lacks of certain capabilities,
>> which are common in RDMS world, such as:
>> - Consistent on-line backup for data on entire cluster (or for
>> specified set of caches)
>>
>
I think you mean data center replication here. It is not an easy feature to
implement, and so far has been handled by commercial vendors of Ignite,
e.g. GridGain.


> - Hierarchal snapshots for specified set caches
>>
>
What do you mean by hierarchical?


> - Transaction log
>>
>
Why does Ignite need it for in-memory transactions?


> - Restore cluster state as of certain point in time
>>
>
Given that such restorability may introduce lots of memory overhead, does
it really make sense for an in-memory cache?


> - Rolling forward from snapshot with ability to filter/modify transactions
>>
>
Same as above


> - Asynchronous replication based either on log shipment or snapshot
>> shipment
>> -- Between clusters
>>
>
This is the same as data center replication, no?


> -- Continues data export to let’s say RDMS
>>
>
Don’t we already support it with our write-through feature to a database?


> It is also a necessity to reduce cold start time for huge clusters
>> with strict SLAs.
>>
>
What part are you trying to speed up here? Are you talking about loading
data from databases?


>
>> I'll put some implementation ideas in JIRA later on. I believe that
>> this list is far from being complete, but I want the community to
>> discuss these abovementioned use cases.
>>
>> --Sasha
>>
>
>