Posted to dev@ignite.apache.org by Konstantin Boudnik <co...@apache.org> on 2015/12/22 09:32:12 UTC

Dynamic caches creation

Guys,

is it possible to configure caches dynamically and persist their configuration
in some shape or form? Here's the use case:
 - we want to create some caches in an already running cluster for a data set
 - once that is done, we'll run some SQL queries on top of them
 - ideally, we'd like to be able to save the cache configurations so that next
   time we don't need to rediscover the structures and field types, but
   can instead load them on (re)start

Is this a supported use case, or does everything have to be defined statically
before the nodes start? Looks like the latter, but perhaps we are missing something.

Thanks in advance for any info,
  Cos

Re: Dynamic caches creation

Posted by Alexey Kuznetsov <ak...@gridgain.com>.

>>Basically, I am trying to think of how to make this whole thing more
>>user-friendly and less middleware developers oriented. Wouldn't it be great if
>>a user can load some external data and immediately start playing with it and
>>doing some OLAP or even - oh horror :) - OLTP on it?

I will describe how this could be done with Visor GUI from GridGain (based
on Ignite).
In Visor GUI you can start any cache from the UI by specifying an XML
description of the cache.
The user can specify a JDBC POJO store factory in the XML cache description
and then, in the special "Load from store" dialog, load any data into the cache
from an RDBMS using "select .. from ... where ..." queries.
After that the user can go to the SQL tab and run any SQL queries on the loaded
data.
But this requires that the data source beans are already described in the
nodes' XML configs, so that the data source bean name can be specified in the
store factory.

You may implement something like what I described in your app; all the needed
functionality is available in Ignite.

In your app you may load data from another source instead of an RDBMS.

-- 
Alexey Kuznetsov
GridGain Systems
www.gridgain.com

Re: Dynamic caches creation

Posted by Konstantin Boudnik <co...@apache.org>.

Yup, that makes a lot of sense. I've seen the conversation about the
marshaller, but somehow I tuned it out. Lemme check more on this.

Appreciate the pointer!
  Cos

On Thu, Dec 24, 2015 at 07:28AM, Alexey Kuznetsov wrote:
> Cos,
> 
> In Ignite 1.5 we introduced a new binary marshaller,
> and now you can load data into a cache without having the POJO classes on
> the server nodes.
> Just describe them as JdbcTypes in CacheJdbcPojoStoreFactory.
> The only thing that has to be predefined on node start is the data source(s),
> but I believe a data source is the kind of thing that is known when you
> design the app.
> 
> Makes sense?

Re: Dynamic caches creation

Posted by Alexey Kuznetsov <ak...@gridgain.com>.

Cos,

In Ignite 1.5 we introduced a new binary marshaller,
and now you can load data into a cache without having the POJO classes on
the server nodes.
Just describe them as JdbcTypes in CacheJdbcPojoStoreFactory.
The only thing that has to be predefined on node start is the data source(s),
but I believe a data source is the kind of thing that is known when you
design the app.

Makes sense?

Re: Dynamic caches creation

Posted by Konstantin Boudnik <co...@apache.org>.

Let me try to restate what I've said earlier.

- I have a running cluster
- I want to create a new cache whose configuration is totally unknown
  beforehand. It would be nice if I could do it by executing some custom Java
  code from a client node
- I want to be able to load POJO model classes _without_ restarting the nodes,
  simply from, say, a network location (URL class loading?)
- once the cache is created and provisioned with some data (i.e. from a stream?) I
  want to be able to store its configuration, so next time I don't need to
  write and execute the client Java code (as in the step above)
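
For reference, the "URL class loading" idea in the list above can be sketched with the plain JDK. This is only a minimal illustration of the JVM mechanism; the class name ModelClassLoading and its load method are made up for this sketch, and nothing in this thread establishes that Ignite nodes would pick up model classes loaded this way.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical helper (not part of Ignite): loads a class from remote or local URLs. */
final class ModelClassLoading {
    private ModelClassLoading() {}

    /**
     * Loads className through a URLClassLoader that searches the given URLs
     * (jar files or class directories) after delegating to the parent loader.
     * The loader is deliberately left open: closing it would stop it from
     * resolving further classes that the loaded class depends on.
     */
    static Class<?> load(URL[] locations, String className) throws ClassNotFoundException {
        ClassLoader parent = ModelClassLoading.class.getClassLoader();
        URLClassLoader loader = new URLClassLoader(locations, parent);
        // 'true' asks the JVM to initialize the class once it is found
        return Class.forName(className, true, loader);
    }
}
```

A caller would pass the URL of a jar on some network location together with the fully qualified name of the model class; both are whatever the application chooses.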

Am I making myself more clear now?

I believe the main snag preventing us from getting on the same page is this:
Ignite, as it stands right now, is very much oriented towards static cluster and
cache configurations. Even the auto-discovery from an external RDBMS is
semi-static, as it requires some user input and pre-existing configuration
files. And that's the main issue IMO: the configuration fundamentally
defines not only the caches, but the whole cluster. If I want to use a slightly
different config to join a node to an existing cluster, it will be rejected.
In other words, the nodes are very homogeneous.

I am not saying it is a bad thing, but it is clearly a limiting factor for
cases like the one I described above.

Cheers,
  Cos

On Wed, Dec 23, 2015 at 12:47AM, Dmitriy Setrakyan wrote:
> Cos, I am confused. What is the behavior you would like to see?
> 

Re: Dynamic caches creation

Posted by Dmitriy Setrakyan <ds...@apache.org>.

Cos, I am confused. What is the behavior you would like to see?

On Wed, Dec 23, 2015 at 12:01 AM, Konstantin Boudnik <co...@apache.org> wrote:

> What if I don't know the configuration in advance? Doesn't it mean that I
> would have to restart the nodes whenever a new cache is configured and needs
> to be added to the cluster?
>
> Cos
>

Re: Dynamic caches creation

Posted by Konstantin Boudnik <co...@apache.org>.

What if I don't know the configuration in advance? Doesn't it mean that I
would have to restart the nodes whenever a new cache is configured and needs
to be added to the cluster?

Cos

On Tue, Dec 22, 2015 at 10:37PM, Dmitriy Setrakyan wrote:
> Cos,
> 
> As far as schema-on-read, you can set all your caches in XML configuration,
> and they will be pre-created for you. Will this do the trick?
> 
> D.
> 

Re: Dynamic caches creation

Posted by Dmitriy Setrakyan <ds...@apache.org>.

Cos,

As far as schema-on-read, you can set all your caches in XML configuration,
and they will be pre-created for you. Will this do the trick?

D.

On Tue, Dec 22, 2015 at 7:30 PM, Konstantin Boudnik <co...@apache.org> wrote:

> On Tue, Dec 22, 2015 at 03:45PM, Alexey Kuznetsov wrote:
> > Cos,
> >
> > How are you going to create caches in an already started cluster?
> > I think you will create a cache configuration and after that get the cache
> > via Ignite.getOrCreateCache(ccfg).
>
> I would imagine that I can write some Java code to describe the configuration
> if needed. After all, Spring is all over the place. And while I am not a big
> fan of overusing it, it's already there, so perhaps it can do something useful ;)
>
> > So if your server cluster was completely restarted at some point, then
> > executing getOrCreateCache(ccfg) will create the cache again (if needed) or
> > return the existing cache.
>
> I am trying to draw an analogy with either an RDBMS or a data processing
> framework. I know neither of those is exact, but bear with me for a
> second. In the RDBMS world the UX is to be able either to query existing tables or
> to create, populate and query new ones. No special auxiliary configurations
> are needed. In data processing frameworks like Spark, Flink, etc. the data
> originates the schema (aka schema on read), thus no special preparation
> steps are needed before the data can be read from storage and processed.
>
> Now, in the case of Ignite the data needs to be transferred to RAM; however,
> the same schema-on-read (parsing code) or externalized metadata (a stored
> config or something) could be used to structure it on the fly. Hence, my
> cluster would have a higher level of runtime dynamism, as I can now create new
> caches as I go, without restarting the cluster nodes on every sneeze.
>
> > Also AFAIK the CacheConfiguration class is serializable - you can save it
> > somewhere and load it later if needed.
> > Or you may define some XML files with cache beans and load them with
> > IgniteSpringHelper.
> >
> > Thoughts?
>
> Basically, I am trying to think of how to make this whole thing more
> user-friendly and less oriented towards middleware developers. Wouldn't it be great if
> a user could load some external data and immediately start playing with it and
> doing some OLAP or even - oh horror :) - OLTP on it?
>
> Does it make sense?
>   Cos
>

Re: Dynamic caches creation

Posted by Konstantin Boudnik <co...@apache.org>.

On Tue, Dec 22, 2015 at 03:45PM, Alexey Kuznetsov wrote:
> Cos,
> 
> How are you going to create caches in an already started cluster?
> I think you will create a cache configuration and after that get the cache
> via Ignite.getOrCreateCache(ccfg).

I would imagine that I can write some Java code to describe the configuration
if needed. After all, Spring is all over the place. And while I am not a big
fan of overusing it, it's already there, so perhaps it can do something useful ;)

> So if your server cluster was completely restarted at some point, then
> executing getOrCreateCache(ccfg) will create the cache again (if needed) or
> return the existing cache.

I am trying to draw an analogy with either an RDBMS or a data processing
framework. I know neither of those is exact, but bear with me for a
second. In the RDBMS world the UX is to be able either to query existing tables or
to create, populate and query new ones. No special auxiliary configurations
are needed. In data processing frameworks like Spark, Flink, etc. the data
originates the schema (aka schema on read), thus no special preparation
steps are needed before the data can be read from storage and processed.

Now, in the case of Ignite the data needs to be transferred to RAM; however,
the same schema-on-read (parsing code) or externalized metadata (a stored
config or something) could be used to structure it on the fly. Hence, my
cluster would have a higher level of runtime dynamism, as I can now create new
caches as I go, without restarting the cluster nodes on every sneeze.
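
To make the schema-on-read idea above a bit more concrete, here is a minimal JDK-only sketch of "field types discovery" from the data itself. The class SchemaOnRead, its method names and the CSV-style input are assumptions made for illustration; inferring types from a single sample row is deliberately naive. As far as I know, an ordered field-name to type-name map like this has the same shape as what QueryEntity.setFields takes in Ignite 1.5+, but wiring it into a cache configuration is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: derives field names and Java type names from sample data. */
final class SchemaOnRead {
    private SchemaOnRead() {}

    /** Guesses a Java type name for one raw text value: Long first, then Double, else String. */
    static String inferType(String raw) {
        String v = raw.trim();
        try {
            Long.parseLong(v);
            return Long.class.getName();
        } catch (NumberFormatException ignored) {
            // not an integer, try floating point next
        }
        try {
            Double.parseDouble(v);
            return Double.class.getName();
        } catch (NumberFormatException ignored) {
            // not a number at all, fall through to String
        }
        return String.class.getName();
    }

    /**
     * Builds an ordered field-name -> type-name map from a comma-separated
     * header line and one sample data line.
     */
    static Map<String, String> inferFields(String headerLine, String sampleLine) {
        String[] names = headerLine.split(",", -1);
        String[] values = sampleLine.split(",", -1);
        if (names.length != values.length)
            throw new IllegalArgumentException("header and sample have different column counts");

        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++)
            fields.put(names[i].trim(), inferType(values[i]));
        return fields;
    }
}
```

A real loader would sample more than one row and handle dates, nulls and quoted values; the point is only that the structure can come from the data rather than from a pre-written config.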

> Also AFAIK the CacheConfiguration class is serializable - you can save it
> somewhere and load it later if needed.
> Or you may define some XML files with cache beans and load them with
> IgniteSpringHelper.
> 
> Thoughts?

Basically, I am trying to think of how to make this whole thing more
user-friendly and less oriented towards middleware developers. Wouldn't it be great if
a user could load some external data and immediately start playing with it and
doing some OLAP or even - oh horror :) - OLTP on it?

Does it make sense?
  Cos

Re: Dynamic caches creation

Posted by Alexey Kuznetsov <ak...@gridgain.com>.

Cos,

How are you going to create caches in an already started cluster?
I think you will create a cache configuration and after that get the cache
via Ignite.getOrCreateCache(ccfg).

So if your server cluster was completely restarted at some point, then
executing getOrCreateCache(ccfg) will create the cache again (if needed) or
return the existing cache.

Also AFAIK the CacheConfiguration class is serializable - you can save it
somewhere and load it later if needed.
Or you may define some XML files with cache beans and load them with
IgniteSpringHelper.
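
The "save it somewhere and load it later" part can be sketched with plain Java serialization. This is a minimal sketch, not an Ignite API: ConfigFiles and its save/load methods are hypothetical names, and the helper works for any Serializable object. It relies only on the statement above that CacheConfiguration is serializable.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: writes a Serializable configuration object to a file and reads it back. */
final class ConfigFiles {
    private ConfigFiles() {}

    /** Serializes cfg to the given file, replacing any previous content. */
    static void save(Serializable cfg, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(cfg);
        }
    }

    /** Reads back an object previously written by save() and checks its type. */
    static <T extends Serializable> T load(Path file, Class<T> type)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return type.cast(in.readObject());
        }
    }
}
```

With Ignite on the classpath, the idea would be to save the cache configuration after the cache has been created, and on the next start to load it and pass it to getOrCreateCache(ccfg) as described above. Java serialization is sensitive to class changes between versions, so a file written with one Ignite release may not load with another.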

Thoughts?

On Tue, Dec 22, 2015 at 3:32 PM, Konstantin Boudnik <co...@apache.org> wrote:

> Guys,
>
> is it possible to configure caches dynamically and persist their configuration
> in some shape or form? Here's the use case:
>  - we want to create some caches in an already running cluster for a data set
>  - once that is done, we'll run some SQL queries on top of them
>  - ideally, we'd like to be able to save the cache configurations so that next
>    time we don't need to rediscover the structures and field types, but
>    can instead load them on (re)start
>
> Is this a supported use case, or does everything have to be defined statically
> before the nodes start? Looks like the latter, but perhaps we are missing something.
>
> Thanks in advance for any info,
>   Cos
>

-- 
Alexey Kuznetsov
GridGain Systems
www.gridgain.com