Posted to user@ignite.apache.org by Andrey Kornev <an...@hotmail.com> on 2017/10/11 18:40:49 UTC

Indexing fields of non-POJO cache values

Hello,

Consider the following use case: my cache values are a serialized tree-like structure (as opposed to a POJO). The leaf nodes of the tree are Java primitives. Some of the leaf nodes are used in queries and should be indexed.

What are my options for indexing such data?

Thanks
Andrey
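For concreteness, here is a hypothetical sketch of such a value, along with the straightforward workaround mentioned later in the thread: copying the leaves that queries need into flat, top-level fields. `TreeNode` and `LeafFlattener` are invented names, and a plain `Map` stands in for the top-level fields of a BinaryObject. No Ignite API is used.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tree-shaped cache value: inner nodes have named children,
// leaves hold a primitive (boxed here for simplicity).
final class TreeNode {
    final Object leafValue;                // non-null only for leaves
    final Map<String, TreeNode> children;  // empty for leaves

    private TreeNode(Object leafValue, Map<String, TreeNode> children) {
        this.leafValue = leafValue;
        this.children = children;
    }

    static TreeNode leaf(Object value) {
        return new TreeNode(value, Map.of());
    }

    static TreeNode node(Map<String, TreeNode> children) {
        return new TreeNode(null, children);
    }
}

final class LeafFlattener {
    // Copies the leaves at the given dotted paths (e.g. "order.total") into a
    // flat map, mimicking duplicated top-level fields. Paths that are missing
    // or that end at an inner node are skipped.
    static Map<String, Object> flattenLeaves(TreeNode root, List<String> paths) {
        Map<String, Object> flat = new LinkedHashMap<>();
        for (String path : paths) {
            TreeNode cur = root;
            for (String step : path.split("\\.")) {
                cur = (cur == null) ? null : cur.children.get(step);
            }
            if (cur != null && cur.leafValue != null) {
                // Dots are awkward in SQL column names, so replace them.
                flat.put(path.replace('.', '_'), cur.leafValue);
            }
        }
        return flat;
    }
}
```

With this approach, every indexed leaf is stored twice: once inside the tree and once as a flat field. That duplication is the overhead discussed later in the thread.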

Re: Indexing fields of non-POJO cache values

Posted by Alexey Kuznetsov <ak...@apache.org>.

Alexey G.,

AFAIK we are going to migrate to our own parser at some point.

On Tue, Oct 17, 2017 at 3:43 PM, Alexey Goncharuk <
alexey.goncharuk@gmail.com> wrote:

> Alexey K., looks like this will require significant changes in H2 (I cannot
> find anything on partial indexes there).
>
> Vladimir, any ideas?
>

Re: Indexing fields of non-POJO cache values

Posted by Alexey Goncharuk <al...@gmail.com>.

Alexey K., looks like this will require significant changes in H2 (I cannot
find anything on partial indexes there).

Vladimir, any ideas?

2017-10-17 11:35 GMT+03:00 Alexey Kuznetsov <ak...@apache.org>:

> Alexey G.,
>
> Maybe we can go the "JavaScript" way and pass a string with an expression
> that will be parsed and evaluated later on the server side.
> [...]
> AFAIK, RDBMSs support indexes on expressions.
> For example: https://sqlite.org/expridx.html

Re: Indexing fields of non-POJO cache values

Posted by Alexey Kuznetsov <ak...@apache.org>.

Alexey G.,

> How these field extractors will be configured. QueryField and QueryIndex are
> already quite complex classes.
> Adding such a closure to configuration would complicate them even further.
Maybe we can go the "JavaScript" way and pass a string with an expression that
will be parsed and evaluated later on the server side.
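Here is a minimal sketch of the "string expression" variant. It assumes a deliberately tiny expression syntax (dot-separated field names, e.g. `"order.total"`) and represents the value as a nested `Map<String, Object>`. Neither the syntax nor the hypothetical `PathExpression` class exists in Ignite. The expression is parsed once, for example when the configuration is read, and then evaluated for each entry. Presumably, that is part of the appeal: no user class has to be deployed to the server.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical "compiled" form of a dotted field path such as "order.total".
final class PathExpression {
    private final List<String> steps;

    private PathExpression(List<String> steps) {
        this.steps = steps;
    }

    // Parse once; rejects empty expressions and empty path segments.
    static PathExpression parse(String expr) {
        if (expr == null || expr.trim().isEmpty())
            throw new IllegalArgumentException("empty expression");
        // Limit -1 keeps trailing empty segments so "a." is rejected too.
        List<String> steps = Arrays.asList(expr.trim().split("\\.", -1));
        for (String s : steps) {
            if (s.isEmpty())
                throw new IllegalArgumentException("bad expression: " + expr);
        }
        return new PathExpression(steps);
    }

    // Evaluate against one cache value; returns null if the path is absent
    // or runs into a non-map value before the last step.
    Object evaluate(Map<String, Object> value) {
        Object cur = value;
        for (String step : steps) {
            if (!(cur instanceof Map))
                return null;
            cur = ((Map<?, ?>) cur).get(step);
        }
        return cur;
    }
}
```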

> How these extractors will interact with future SQL drivers (my current
> guess - there is no way to define them in SQL)

AFAIK, RDBMSs support indexes on expressions.
For example: https://sqlite.org/expridx.html
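Conceptually, an index on an expression (as described on the SQLite page above) stores the computed result of the expression for each row, rather than a column copied into the row. Below is a minimal in-memory sketch of that idea, assuming a `java.util.function.Function` plays the role of the expression. `ExpressionIndex` and its methods are hypothetical and say nothing about how H2 or Ignite implement their indexes.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical secondary index whose key is computed from the cache value
// by an expression, instead of being read from a stored field.
final class ExpressionIndex<K extends Comparable<K>, V, I extends Comparable<I>> {
    private final Function<V, I> expression;
    private final NavigableMap<I, Set<K>> index = new TreeMap<>();

    ExpressionIndex(Function<V, I> expression) {
        this.expression = expression;
    }

    // On put: evaluate the expression and record the key under its result.
    // For an update, call onRemove with the old value first.
    void onPut(K key, V value) {
        I indexed = expression.apply(value);
        if (indexed != null)
            index.computeIfAbsent(indexed, x -> new TreeSet<>()).add(key);
    }

    // On remove: the old value is needed to recompute the index key.
    void onRemove(K key, V oldValue) {
        I indexed = expression.apply(oldValue);
        if (indexed == null)
            return;
        Set<K> keys = index.get(indexed);
        if (keys != null) {
            keys.remove(key);
            if (keys.isEmpty())
                index.remove(indexed);
        }
    }

    // Range lookup: cache keys whose computed value lies in [from, to].
    Set<K> range(I from, I to) {
        Set<K> result = new TreeSet<>();
        for (Set<K> keys : index.subMap(from, true, to, true).values())
            result.addAll(keys);
        return result;
    }
}
```

Only the computed values live in the index. The cache value itself does not need an extra top-level field for them.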

Makes sense?

On Tue, Oct 17, 2017 at 3:26 PM, Alexey Goncharuk <
alexey.goncharuk@gmail.com> wrote:

> I like this idea. In the general case, this will not even require
> deserializing the cache value. Consider a binary tree implementation with a
> binary object node {val, left, right}. In this case, it is currently
> impossible to have an index on the min or max value, but with Andrey's
> suggestion, these index values are trivially extracted.
>
> Two things to consider:
>  * How these field extractors will be configured. QueryField and QueryIndex
> are already quite complex classes. Adding such a closure to configuration
> would complicate them even further.
>  * How these extractors will interact with future SQL drivers (my current
> guess - there is no way to define them in SQL)
>
> Andrey, can you create a ticket and suggest an API design so we can review
> it?
>
> Thanks,
> AG
>
> 2017-10-17 5:44 GMT+03:00 Andrey Kornev <an...@hotmail.com>:
>
> > Of course it does, Dmitriy! However, as I suggested below, the feature
> > should be optional. The current behavior (not requiring user classes on
> > the server, etc.) would remain the default one.
> >
> > Also, please realize that not everyone stores their data as POJOs or uses
> > Ignite as a JDBC source; those use cases appear to have been the main
> > focus of the Ignite community lately.
> >
> > Payloads with dynamic structures require more advanced indexing
> > mechanisms, for example, to avoid the overhead of duplicating the indexable
> > fields as top-level fields of the BinaryObjects. When caches hold tens of
> > millions of entries, the ability to generate index values on the fly
> > rather than store them would go a long way toward reducing memory
> > utilization.
> >
> > If the Ignite community finds this feature generally useful, I'd be more
> > than happy to contribute its implementation.
> >
> > Regards
> > Andrey
> >
> > ________________________________
> > From: Dmitriy Setrakyan <ds...@apache.org>
> > Sent: Monday, October 16, 2017 6:14 PM
> > To: dev@ignite.apache.org
> > Subject: Re: Indexing fields of non-POJO cache values
> >
> > On Mon, Oct 16, 2017 at 12:35 PM, Andrey Kornev <andrewkornev@hotmail.com>
> > wrote:
> >
> > > [Crossposting to the dev list]
> > >
> > > Alexey,
> > >
> > > Yes, something like that, where the "reference"/"alias" is expressed as a
> > > piece of Java code (as part of the QueryEntity definition, perhaps) that
> > > is invoked by Ignite at cache entry indexing time.
> > >
> > > My point is that rather than limiting indexable fields to predefined
> > > POJO attributes (or BinaryObject fields), Ignite could adopt a more
> > > general approach by allowing users to designate an arbitrary piece of
> > > code (a lambda/closure) to be used as an index value extractor. In such
> > > a case, the current functionality (extracting index values from POJO
> > > attributes) becomes just a special case that Ignite supports out of the
> > > box.
> > >
> >
> > Andrey, this would require deserialization on the server side. It would
> > also require that user classes be present on the server side. Ignite
> > tries to avoid both of these scenarios.
> >
> > Makes sense?
> >
>



-- 
Alexey Kuznetsov
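To make Andrey's extractor proposal and Alexey G.'s {val, left, right} example concrete, here is a hypothetical sketch of what an index value extractor could look like. `IndexValueExtractor`, `Node` and `TreeExtractors` are invented for illustration. Ignite has no such interface, and the question of how an extractor would be registered (for example, through QueryEntity, as Andrey suggests) is left open. This version operates on a deserialized Java object, which is exactly what Dmitriy objects to. Alexey G.'s point is that an extractor working on the binary form might avoid deserialization altogether.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical extractor contract: computes an index value from a cache value.
@FunctionalInterface
interface IndexValueExtractor<V, R> {
    R extract(V value);
}

// Binary tree node shaped like the {val, left, right} example.
final class Node {
    final long val;
    final Node left, right;

    Node(long val, Node left, Node right) {
        this.val = val;
        this.left = left;
        this.right = right;
    }
}

final class TreeExtractors {
    // Extractors producing the minimum / maximum node value as the index value.
    static final IndexValueExtractor<Node, Long> MIN = root -> fold(root, true);
    static final IndexValueExtractor<Node, Long> MAX = root -> fold(root, false);

    // Visits every node with an explicit stack, so it makes no ordering
    // assumption; returns null (no index value) for an empty tree.
    private static Long fold(Node root, boolean min) {
        if (root == null)
            return null;
        long best = root.val;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            best = min ? Math.min(best, n.val) : Math.max(best, n.val);
            if (n.left != null) stack.push(n.left);
            if (n.right != null) stack.push(n.right);
        }
        return best;
    }
}
```

If the tree is known to be a binary search tree, MIN could simply follow left links and MAX right links. The full traversal shown here trades that shortcut for correctness on any binary tree.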