Posted to user@ignite.apache.org by Andrey Kornev <an...@hotmail.com> on 2017/10/11 18:40:49 UTC

Indexing fields of non-POJO cache values

Hello,

Consider the following use case: my cache values are a serialized tree-like structure (as opposed to a POJO). The leaf nodes of the tree are Java primitives. Some of the leaf nodes are used by the queries and should be indexed.

What are my options for indexing such data?

Thanks
Andrey

Re: Indexing fields of non-POJO cache values

Posted by Alexey Kuznetsov <ak...@apache.org>.
Alexey G.,

AFAIK we are going to migrate to our own parser at some point.

On Tue, Oct 17, 2017 at 3:43 PM, Alexey Goncharuk <
alexey.goncharuk@gmail.com> wrote:

> Alexey K., looks like this will require significant changes in H2 (I cannot
> find anything on partial indexes there).
>
> Vladimir, any ideas?
>
> 2017-10-17 11:35 GMT+03:00 Alexey Kuznetsov <ak...@apache.org>:
>
> > Alexey G.,
> >
> > >  How these field extractors will be configured. QueryField and
> > QueryIndex are
> > already quite complex classes.
> > > Adding such a closure to configuration would complicate them even
> > further.
> > May be we can go in "JavaScript" way and pass a string with expression
> that
> > will be parsed and evaluated later on server side.
> >
> > > How these extractors will interact with future SQL drivers (my current
> > guess
> > - there is no way to define them in SQL)
> >
> > AFAIK RDBMS support index on expression.
> > For example: https://sqlite.org/expridx.html
> >
> > Make sense?
> >
> > On Tue, Oct 17, 2017 at 3:26 PM, Alexey Goncharuk <
> > alexey.goncharuk@gmail.com> wrote:
> >
> > > I like this idea. In general case, this will not even require
> > > deserializing the cache value. Consider a binary tree implementation
> > with a
> > > binary object node {val, left, right}. In this case, it is impossible
> to
> > > have an index of min or max, but with Andrey's suggestion, these
> indexes
> > > are trivially extracted.
> > >
> > > Two things to consider:
> > >  * How these field extractors will be configured. QueryField and
> > QueryIndex
> > > are already quite complex classes. Adding such a closure to
> configuration
> > > would complicate them even further.
> > >  * How these extractors will interact with future SQL drivers (my
> current
> > > guess - there is no way to define them in SQL)
> > >
> > > Andrey, can you create a ticket and suggest an API design so we can
> > review
> > > it?
> > >
> > > Thanks,
> > > AG
> > >
> > > 2017-10-17 5:44 GMT+03:00 Andrey Kornev <an...@hotmail.com>:
> > >
> > > > Of course it does, Dmitriy! However as I suggested below, the feature
> > > > should be optional. The current behavior (not requiring user classes
> on
> > > the
> > > > server, etc.) would remain the default one.
> > > >
> > > > Also, please realize that not everyone stores their data as POJOs or
> > uses
> > > > Ignite as a JDBC source -- the use cases that appear to have been the
> > > main
> > > > focus of Ignite community lately.
> > > >
> > > > Payloads with dynamic structures require more advanced mechanisms for
> > > > indexing, for example, to avoid the overhead of duplicating the
> > indexable
> > > > fields as top level fields of the BinaryObjects. In cases where the
> > cache
> > > > sizes are in tens of millions of entries, the ability to generate
> index
> > > > values on the fly rather than store them, would go a long way in
> terms
> > of
> > > > reducing memory utilization.
> > > >
> > > > In Ignite community finds this feature generally useful, I'd be more
> > than
> > > > happy to contribute its implementation.
> > > >
> > > > Regards
> > > > Andrey
> > > >
> > > > ________________________________
> > > > From: Dmitriy Setrakyan <ds...@apache.org>
> > > > Sent: Monday, October 16, 2017 6:14 PM
> > > > To: dev@ignite.apache.org
> > > > Subject: Re: Indexing fields of non-POJO cache values
> > > >
> > > > On Mon, Oct 16, 2017 at 12:35 PM, Andrey Kornev <
> > > andrewkornev@hotmail.com>
> > > > wrote:
> > > >
> > > > > [Crossposting to the dev list]
> > > > >
> > > > > Alexey,
> > > > >
> > > > > Yes, something like that, where the "reference"/"alias" is
> expressed
> > > as a
> > > > > piece of Java code (as part of QueryEntity definition, perhaps)
> that
> > is
> > > > > invoked by Ignite at the cache entry indexing time.
> > > > >
> > > > > My point is that rather than limiting indexable fields only to
> > > predefined
> > > > > POJO attributes (or BinaryObject fields) Ignite could adopt a more
> > > > general
> > > > > approach by allowing users designate an arbitrary piece of code (a
> > > > > lambda/closure) to be used as an index value extractor. In such
> case,
> > > the
> > > > > current functionality (extracting index values from POJO
> attributes)
> > > > > becomes just a special case that's supported by Ignite out of the
> > box.
> > > > >
> > > >
> > > > Andrey, this would require deserialization on the server side. It
> would
> > > > also require that user classes are present on the server side. Both
> of
> > > this
> > > > scenarios Ignite tries to avoid.
> > > >
> > > > Makes sense?
> > > >
> > >
> >
> >
> >
> > --
> > Alexey Kuznetsov
> >
>
> --
> Alexey Kuznetsov
>
>

Re: Indexing fields of non-POJO cache values

Posted by Alexey Goncharuk <al...@gmail.com>.
Alexey K., looks like this will require significant changes in H2 (I cannot
find anything on partial indexes there).

Vladimir, any ideas?

2017-10-17 11:35 GMT+03:00 Alexey Kuznetsov <ak...@apache.org>:

> Alexey G.,
>
> >  How these field extractors will be configured. QueryField and
> QueryIndex are
> already quite complex classes.
> > Adding such a closure to configuration would complicate them even
> further.
> May be we can go in "JavaScript" way and pass a string with expression that
> will be parsed and evaluated later on server side.
>
> > How these extractors will interact with future SQL drivers (my current
> guess
> - there is no way to define them in SQL)
>
> AFAIK RDBMS support index on expression.
> For example: https://sqlite.org/expridx.html
>
> Make sense?
>
> On Tue, Oct 17, 2017 at 3:26 PM, Alexey Goncharuk <
> alexey.goncharuk@gmail.com> wrote:
>
> > I like this idea. In general case, this will not even require
> > deserializing the cache value. Consider a binary tree implementation
> with a
> > binary object node {val, left, right}. In this case, it is impossible to
> > have an index of min or max, but with Andrey's suggestion, these indexes
> > are trivially extracted.
> >
> > Two things to consider:
> >  * How these field extractors will be configured. QueryField and
> QueryIndex
> > are already quite complex classes. Adding such a closure to configuration
> > would complicate them even further.
> >  * How these extractors will interact with future SQL drivers (my current
> > guess - there is no way to define them in SQL)
> >
> > Andrey, can you create a ticket and suggest an API design so we can
> review
> > it?
> >
> > Thanks,
> > AG
> >
> > 2017-10-17 5:44 GMT+03:00 Andrey Kornev <an...@hotmail.com>:
> >
> > > Of course it does, Dmitriy! However as I suggested below, the feature
> > > should be optional. The current behavior (not requiring user classes on
> > the
> > > server, etc.) would remain the default one.
> > >
> > > Also, please realize that not everyone stores their data as POJOs or
> uses
> > > Ignite as a JDBC source -- the use cases that appear to have been the
> > main
> > > focus of Ignite community lately.
> > >
> > > Payloads with dynamic structures require more advanced mechanisms for
> > > indexing, for example, to avoid the overhead of duplicating the
> indexable
> > > fields as top level fields of the BinaryObjects. In cases where the
> cache
> > > sizes are in tens of millions of entries, the ability to generate index
> > > values on the fly rather than store them, would go a long way in terms
> of
> > > reducing memory utilization.
> > >
> > > In Ignite community finds this feature generally useful, I'd be more
> than
> > > happy to contribute its implementation.
> > >
> > > Regards
> > > Andrey
> > >
> > > ________________________________
> > > From: Dmitriy Setrakyan <ds...@apache.org>
> > > Sent: Monday, October 16, 2017 6:14 PM
> > > To: dev@ignite.apache.org
> > > Subject: Re: Indexing fields of non-POJO cache values
> > >
> > > On Mon, Oct 16, 2017 at 12:35 PM, Andrey Kornev <
> > andrewkornev@hotmail.com>
> > > wrote:
> > >
> > > > [Crossposting to the dev list]
> > > >
> > > > Alexey,
> > > >
> > > > Yes, something like that, where the "reference"/"alias" is expressed
> > as a
> > > > piece of Java code (as part of QueryEntity definition, perhaps) that
> is
> > > > invoked by Ignite at the cache entry indexing time.
> > > >
> > > > My point is that rather than limiting indexable fields only to
> > predefined
> > > > POJO attributes (or BinaryObject fields) Ignite could adopt a more
> > > general
> > > > approach by allowing users designate an arbitrary piece of code (a
> > > > lambda/closure) to be used as an index value extractor. In such case,
> > the
> > > > current functionality (extracting index values from POJO attributes)
> > > > becomes just a special case that's supported by Ignite out of the
> box.
> > > >
> > >
> > > Andrey, this would require deserialization on the server side. It would
> > > also require that user classes are present on the server side. Both of
> > this
> > > scenarios Ignite tries to avoid.
> > >
> > > Makes sense?
> > >
> >
>
>
>
> --
> Alexey Kuznetsov
>

Re: Indexing fields of non-POJO cache values

Posted by Alexey Kuznetsov <ak...@apache.org>.
Alexey G.,

> How these field extractors will be configured. QueryField and QueryIndex are
> already quite complex classes.
> Adding such a closure to configuration would complicate them even further.

Maybe we can go the "JavaScript" way and pass a string with an expression that
will be parsed and evaluated later on the server side.
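
A minimal sketch of what such a string expression could look like, assuming
the cache value can be viewed as nested maps and the expression is a plain
dotted path. The class name, the evaluate method and the path syntax are all
hypothetical; nothing here is Ignite API.

```java
import java.util.Map;

// Hypothetical sketch: evaluates a dotted path expression against a value
// modelled as nested maps. Not Ignite API; names are illustrative only.
final class PathExpressionSketch {
    // Walks one map level per path step; returns null if any step is missing.
    static Object evaluate(String expression, Map<String, ?> root) {
        Object current = root;
        for (String step : expression.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(step);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> tree = Map.of("val", 8, "left", Map.of("val", 3));
        System.out.println(evaluate("left.val", tree)); // prints 3
    }
}
```

Because the expression is only a string, it could be shipped with the cache
configuration without requiring user classes on the server.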

> How these extractors will interact with future SQL drivers (my current guess
> - there is no way to define them in SQL)

AFAIK RDBMSs support indexes on expressions.
For example: https://sqlite.org/expridx.html

Does that make sense?

On Tue, Oct 17, 2017 at 3:26 PM, Alexey Goncharuk <
alexey.goncharuk@gmail.com> wrote:

> I like this idea. In general case, this will not even require
> deserializing the cache value. Consider a binary tree implementation with a
> binary object node {val, left, right}. In this case, it is impossible to
> have an index of min or max, but with Andrey's suggestion, these indexes
> are trivially extracted.
>
> Two things to consider:
>  * How these field extractors will be configured. QueryField and QueryIndex
> are already quite complex classes. Adding such a closure to configuration
> would complicate them even further.
>  * How these extractors will interact with future SQL drivers (my current
> guess - there is no way to define them in SQL)
>
> Andrey, can you create a ticket and suggest an API design so we can review
> it?
>
> Thanks,
> AG
>
> 2017-10-17 5:44 GMT+03:00 Andrey Kornev <an...@hotmail.com>:
>
> > Of course it does, Dmitriy! However as I suggested below, the feature
> > should be optional. The current behavior (not requiring user classes on
> the
> > server, etc.) would remain the default one.
> >
> > Also, please realize that not everyone stores their data as POJOs or uses
> > Ignite as a JDBC source -- the use cases that appear to have been the
> main
> > focus of Ignite community lately.
> >
> > Payloads with dynamic structures require more advanced mechanisms for
> > indexing, for example, to avoid the overhead of duplicating the indexable
> > fields as top level fields of the BinaryObjects. In cases where the cache
> > sizes are in tens of millions of entries, the ability to generate index
> > values on the fly rather than store them, would go a long way in terms of
> > reducing memory utilization.
> >
> > In Ignite community finds this feature generally useful, I'd be more than
> > happy to contribute its implementation.
> >
> > Regards
> > Andrey
> >
> > ________________________________
> > From: Dmitriy Setrakyan <ds...@apache.org>
> > Sent: Monday, October 16, 2017 6:14 PM
> > To: dev@ignite.apache.org
> > Subject: Re: Indexing fields of non-POJO cache values
> >
> > On Mon, Oct 16, 2017 at 12:35 PM, Andrey Kornev <
> andrewkornev@hotmail.com>
> > wrote:
> >
> > > [Crossposting to the dev list]
> > >
> > > Alexey,
> > >
> > > Yes, something like that, where the "reference"/"alias" is expressed
> as a
> > > piece of Java code (as part of QueryEntity definition, perhaps) that is
> > > invoked by Ignite at the cache entry indexing time.
> > >
> > > My point is that rather than limiting indexable fields only to
> predefined
> > > POJO attributes (or BinaryObject fields) Ignite could adopt a more
> > general
> > > approach by allowing users designate an arbitrary piece of code (a
> > > lambda/closure) to be used as an index value extractor. In such case,
> the
> > > current functionality (extracting index values from POJO attributes)
> > > becomes just a special case that's supported by Ignite out of the box.
> > >
> >
> > Andrey, this would require deserialization on the server side. It would
> > also require that user classes are present on the server side. Both of
> this
> > scenarios Ignite tries to avoid.
> >
> > Makes sense?
> >
>



-- 
Alexey Kuznetsov

Re: Indexing fields of non-POJO cache values

Posted by Alexey Goncharuk <al...@gmail.com>.
I like this idea. In the general case, this will not even require
deserializing the cache value. Consider a binary tree implementation with a
binary object node {val, left, right}. Currently, it is impossible to
have an index on min or max, but with Andrey's suggestion, these index values
are trivially extracted.
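
A minimal sketch of the min/max case, assuming the tree is ordered as a
binary search tree (smaller values to the left). TreeNode is a hypothetical
stand-in for the binary object node {val, left, right}, and the MIN and MAX
extractors are illustrative names, not Ignite API.

```java
import java.util.function.Function;

// Hypothetical stand-in for a binary object node {val, left, right}.
final class TreeNode {
    final int val;
    final TreeNode left;
    final TreeNode right;

    TreeNode(int val, TreeNode left, TreeNode right) {
        this.val = val;
        this.left = left;
        this.right = right;
    }
}

final class MinMaxExtractorSketch {
    // Assuming search-tree ordering, the minimum is the leftmost node.
    static final Function<TreeNode, Integer> MIN = root -> {
        TreeNode n = root;
        while (n.left != null) {
            n = n.left;
        }
        return n.val;
    };

    // Likewise, the maximum is the rightmost node.
    static final Function<TreeNode, Integer> MAX = root -> {
        TreeNode n = root;
        while (n.right != null) {
            n = n.right;
        }
        return n.val;
    };

    public static void main(String[] args) {
        TreeNode root = new TreeNode(8,
            new TreeNode(3, new TreeNode(1, null, null), new TreeNode(6, null, null)),
            new TreeNode(10, null, new TreeNode(14, null, null)));
        System.out.println(MIN.apply(root)); // prints 1
        System.out.println(MAX.apply(root)); // prints 14
    }
}
```

Each extractor only follows one chain of child references, so it never has
to visit the whole tree.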

Two things to consider:
 * How these field extractors will be configured. QueryField and QueryIndex
are already quite complex classes. Adding such a closure to configuration
would complicate them even further.
 * How these extractors will interact with future SQL drivers (my current
guess - there is no way to define them in SQL)

Andrey, can you create a ticket and suggest an API design so we can review
it?

Thanks,
AG

2017-10-17 5:44 GMT+03:00 Andrey Kornev <an...@hotmail.com>:

> Of course it does, Dmitriy! However as I suggested below, the feature
> should be optional. The current behavior (not requiring user classes on the
> server, etc.) would remain the default one.
>
> Also, please realize that not everyone stores their data as POJOs or uses
> Ignite as a JDBC source -- the use cases that appear to have been the main
> focus of Ignite community lately.
>
> Payloads with dynamic structures require more advanced mechanisms for
> indexing, for example, to avoid the overhead of duplicating the indexable
> fields as top level fields of the BinaryObjects. In cases where the cache
> sizes are in tens of millions of entries, the ability to generate index
> values on the fly rather than store them, would go a long way in terms of
> reducing memory utilization.
>
> In Ignite community finds this feature generally useful, I'd be more than
> happy to contribute its implementation.
>
> Regards
> Andrey
>
> ________________________________
> From: Dmitriy Setrakyan <ds...@apache.org>
> Sent: Monday, October 16, 2017 6:14 PM
> To: dev@ignite.apache.org
> Subject: Re: Indexing fields of non-POJO cache values
>
> On Mon, Oct 16, 2017 at 12:35 PM, Andrey Kornev <an...@hotmail.com>
> wrote:
>
> > [Crossposting to the dev list]
> >
> > Alexey,
> >
> > Yes, something like that, where the "reference"/"alias" is expressed as a
> > piece of Java code (as part of QueryEntity definition, perhaps) that is
> > invoked by Ignite at the cache entry indexing time.
> >
> > My point is that rather than limiting indexable fields only to predefined
> > POJO attributes (or BinaryObject fields) Ignite could adopt a more
> general
> > approach by allowing users designate an arbitrary piece of code (a
> > lambda/closure) to be used as an index value extractor. In such case, the
> > current functionality (extracting index values from POJO attributes)
> > becomes just a special case that's supported by Ignite out of the box.
> >
>
> Andrey, this would require deserialization on the server side. It would
> also require that user classes are present on the server side. Both of this
> scenarios Ignite tries to avoid.
>
> Makes sense?
>

Re: Indexing fields of non-POJO cache values

Posted by Andrey Kornev <an...@hotmail.com>.
Of course it does, Dmitriy! However, as I suggested below, the feature should be optional. The current behavior (not requiring user classes on the server, etc.) would remain the default one.

Also, please realize that not everyone stores their data as POJOs or uses Ignite as a JDBC source -- the use cases that appear to have been the main focus of the Ignite community lately.

Payloads with dynamic structures require more advanced mechanisms for indexing, for example, to avoid the overhead of duplicating the indexable fields as top-level fields of the BinaryObjects. In cases where the cache sizes are in the tens of millions of entries, the ability to generate index values on the fly rather than store them would go a long way in terms of reducing memory utilization.

If the Ignite community finds this feature generally useful, I'd be more than happy to contribute its implementation.

Regards
Andrey

________________________________
From: Dmitriy Setrakyan <ds...@apache.org>
Sent: Monday, October 16, 2017 6:14 PM
To: dev@ignite.apache.org
Subject: Re: Indexing fields of non-POJO cache values

On Mon, Oct 16, 2017 at 12:35 PM, Andrey Kornev <an...@hotmail.com>
wrote:

> [Crossposting to the dev list]
>
> Alexey,
>
> Yes, something like that, where the "reference"/"alias" is expressed as a
> piece of Java code (as part of QueryEntity definition, perhaps) that is
> invoked by Ignite at the cache entry indexing time.
>
> My point is that rather than limiting indexable fields only to predefined
> POJO attributes (or BinaryObject fields) Ignite could adopt a more general
> approach by allowing users designate an arbitrary piece of code (a
> lambda/closure) to be used as an index value extractor. In such case, the
> current functionality (extracting index values from POJO attributes)
> becomes just a special case that's supported by Ignite out of the box.
>

Andrey, this would require deserialization on the server side. It would
also require that user classes are present on the server side. Ignite tries
to avoid both of these scenarios.

Makes sense?

Re: Indexing fields of non-POJO cache values

Posted by Dmitriy Setrakyan <ds...@apache.org>.
On Mon, Oct 16, 2017 at 12:35 PM, Andrey Kornev <an...@hotmail.com>
wrote:

> [Crossposting to the dev list]
>
> Alexey,
>
> Yes, something like that, where the "reference"/"alias" is expressed as a
> piece of Java code (as part of QueryEntity definition, perhaps) that is
> invoked by Ignite at the cache entry indexing time.
>
> My point is that rather than limiting indexable fields only to predefined
> POJO attributes (or BinaryObject fields) Ignite could adopt a more general
> approach by allowing users designate an arbitrary piece of code (a
> lambda/closure) to be used as an index value extractor. In such case, the
> current functionality (extracting index values from POJO attributes)
> becomes just a special case that's supported by Ignite out of the box.
>

Andrey, this would require deserialization on the server side. It would
also require that user classes are present on the server side. Ignite tries
to avoid both of these scenarios.

Makes sense?

Indexing fields of non-POJO cache values

Posted by Andrey Kornev <an...@hotmail.com>.
[Crossposting to the dev list]

Alexey,

Yes, something like that, where the "reference"/"alias" is expressed as a piece of Java code (as part of the QueryEntity definition, perhaps) that is invoked by Ignite at cache entry indexing time.

My point is that rather than limiting indexable fields only to predefined POJO attributes (or BinaryObject fields), Ignite could adopt a more general approach by allowing users to designate an arbitrary piece of code (a lambda/closure) to be used as an index value extractor. In that case, the current functionality (extracting index values from POJO attributes) becomes just a special case that's supported by Ignite out of the box.

This would really help in cases (like mine) where the cache values are non-POJO entities.
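
A minimal sketch of the extractor idea, assuming an in-memory map as the
cache and a sorted map as the index. ExtractorIndexSketch and its methods are
hypothetical names for illustration; this is not Ignite API and not a
proposal for the actual design.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch: a sorted index whose entries are computed by a
// user-supplied extractor at put time instead of being stored as extra
// fields of the value.
final class ExtractorIndexSketch<K, V, I extends Comparable<I>> {
    private final Map<K, V> cache = new HashMap<>();
    private final TreeMap<I, List<K>> index = new TreeMap<>();
    private final Function<V, I> extractor;

    ExtractorIndexSketch(Function<V, I> extractor) {
        this.extractor = extractor;
    }

    // Stores the value and (re)indexes it using the extractor.
    void put(K key, V value) {
        V old = cache.put(key, value);
        if (old != null) {
            removeFromIndex(key, extractor.apply(old));
        }
        index.computeIfAbsent(extractor.apply(value), v -> new ArrayList<>()).add(key);
    }

    private void removeFromIndex(K key, I indexValue) {
        List<K> keys = index.get(indexValue);
        if (keys != null) {
            keys.remove(key);
            if (keys.isEmpty()) {
                index.remove(indexValue);
            }
        }
    }

    // Returns the keys whose extracted value lies in [from, to], in index order.
    List<K> range(I from, I to) {
        List<K> result = new ArrayList<>();
        index.subMap(from, true, to, true).values().forEach(result::addAll);
        return result;
    }

    public static void main(String[] args) {
        ExtractorIndexSketch<String, Map<String, Integer>, Integer> idx =
            new ExtractorIndexSketch<>(v -> v.get("price"));
        idx.put("a", Map.of("price", 10));
        idx.put("b", Map.of("price", 25));
        System.out.println(idx.range(0, 20)); // prints [a]
    }
}
```

With an extractor such as v -> v.get("price"), indexing a plain attribute is
just one special case of the same mechanism.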

Thanks
Andrey
________________________________
From: Alexey Kuznetsov <ak...@apache.org>
Sent: Thursday, October 12, 2017 5:53 PM
To: user@ignite.apache.org
Subject: Re: Indexing fields of non-POJO cache values

Just an idea.

What if we could declare a kind of "references" or "aliases" for fields in such cases?
This would help us avoid duplicating the data.

For example, in JavaScript I could (almost on the fly) declare getters and setters that act as aliases for my data.


On Fri, Oct 13, 2017 at 12:39 AM, Andrey Kornev <an...@hotmail.com> wrote:
Hey Andrey,

Thanks for your reply!

We've been using a slightly different approach, where we extract the values of the indexable leaf nodes and store them as individual fields of the binary object along with the serialized tree itself. Then we configure the cache to use those fields as QueryEntities. It works fine, and this way we avoid using joins in our queries.

However, an obvious drawback of this approach is data duplication. We end up with three copies of a field value:

1) the leaf node of the tree,
2) the field of the binary object, and
3) the Ignite index.

I was hoping that there might be a better way to achieve this. In particular, I'd like to avoid storing the value as a field of a binary object (copy #2).

One possible (and elegant) approach to solving this problem would be to introduce a way to specify a method (or a closure) for a QueryEntity in addition to the currently supported BinaryObject field/POJO attribute.

Regards
Andrey

________________________________
From: Andrey Mashenkov <an...@gmail.com>
Sent: Thursday, October 12, 2017 6:25 AM
To: user@ignite.apache.org
Subject: Re: Indexing fields of non-POJO cache values

Hi,

Another way here is to build your own query engine by implementing the IndexingSpi interface, but that looks much more complicated.

On Thu, Oct 12, 2017 at 4:23 PM, Andrey Mashenkov <an...@gmail.com> wrote:
Hi,

There is no way to index such data as is. To index data, you need to have an entry_field<->column mapping configured.
As a workaround, the leaves can be stored in a cache as values.

E.g. you can have a separate cache to index leaf nodes, where entries have 2 fields: an "original tree key" field and an indexed "leaf node value" field.
You will then be able to query the serialized tree-like structures via an SQL query with a JOIN condition on "original tree key" and a WHERE condition on the "leaf node value" field.
Obviously, you will need to implement intermediate logic to keep the data in both caches consistent.
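
A minimal sketch of this workaround, using plain Java collections in place
of the two caches. All names (LeafCacheWorkaroundSketch, LeafEntry, putTree,
findByLeafValue) are hypothetical, and the linear scan stands in for what
would be an indexed SQL lookup in a real cache.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two-cache workaround; not Ignite API.
final class LeafCacheWorkaroundSketch {
    // Entry of the secondary cache with the two fields described above.
    static final class LeafEntry {
        final String treeKey; // "original tree key"
        final int leafValue;  // "leaf node value" (the field that would be indexed)

        LeafEntry(String treeKey, int leafValue) {
            this.treeKey = treeKey;
            this.leafValue = leafValue;
        }
    }

    private final Map<String, byte[]> treeCache = new HashMap<>();
    private final List<LeafEntry> leafCache = new ArrayList<>();

    // Intermediate logic keeping both caches consistent: whenever a tree is
    // written, its old leaf entries are dropped and the new ones are added.
    void putTree(String key, byte[] serializedTree, int... indexedLeaves) {
        treeCache.put(key, serializedTree);
        leafCache.removeIf(e -> e.treeKey.equals(key));
        for (int leaf : indexedLeaves) {
            leafCache.add(new LeafEntry(key, leaf));
        }
    }

    // Plays the role of a JOIN on the tree key with a WHERE on the leaf value.
    List<byte[]> findByLeafValue(int value) {
        List<byte[]> result = new ArrayList<>();
        for (LeafEntry e : leafCache) {
            if (e.leafValue == value) {
                result.add(treeCache.get(e.treeKey));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LeafCacheWorkaroundSketch caches = new LeafCacheWorkaroundSketch();
        caches.putTree("t1", new byte[] {1}, 5, 7);
        caches.putTree("t2", new byte[] {2}, 7);
        System.out.println(caches.findByLeafValue(7).size()); // prints 2
    }
}
```

The putTree method is where the consistency cost shows up: every write to
the tree cache also has to rewrite the matching leaf entries.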


On Wed, Oct 11, 2017 at 9:40 PM, Andrey Kornev <an...@hotmail.com> wrote:
Hello,

Consider the following use case: my cache values are a serialized tree-like structure (as opposed to a POJO). The leaf nodes of the tree are Java primitives. Some of the leaf nodes are used by the queries and should be indexed.

What are my options for indexing such data?

Thanks
Andrey



--
Best regards,
Andrey V. Mashenkov



--
Best regards,
Andrey V. Mashenkov



--
Alexey Kuznetsov

Re: Indexing fields of non-POJO cache values

Posted by Alexey Kuznetsov <ak...@apache.org>.
Just an idea.

What if we could declare a kind of "references" or "aliases" for fields in
such cases?
This would help us avoid duplicating the data.

For example, in JavaScript I could (almost on the fly) declare getters and
setters that act as aliases for my data.


On Fri, Oct 13, 2017 at 12:39 AM, Andrey Kornev <an...@hotmail.com>
wrote:

> Hey Andrey,
>
> Thanks for your reply!
>
> We've been using a slightly different approach, where we extract the
> values of the indexable leaf nodes and store them as individual fields of
> the binary object along with the serialized tree itself. Then we configure
> the cache to use those fields as QueryEntities. It works fine and this way
> we avoid using joins in our queries.
>
> However an obvious drawback of such approach is data duplication. We end
> up with three copies of a field value:
>
> 1) the leaf node of the tree,
> 2) the field of the binary object, and
> 3) Ignite index
>
> I was hoping that there may be a better way to achieve this. In particular
> I'd like to avoid storing the value as a field of a binary object (copy #2).
>
> One possible (and elegant) approach to solving this problem would be to
> introduce a way to specify a method (or a closure) for a QueryEntity in
> addition to currently supported BinaryObject field/POJO attribute.
>
> Regards
> Andrey
>
> ------------------------------
> *From:* Andrey Mashenkov <an...@gmail.com>
> *Sent:* Thursday, October 12, 2017 6:25 AM
> *To:* user@ignite.apache.org
> *Subject:* Re: Indexing fields of non-POJO cache values
>
> Hi,
>
> Another way here is to implement your own query engine by extending
> IndexingSPI interface, which looks much more complicated.
>
> On Thu, Oct 12, 2017 at 4:23 PM, Andrey Mashenkov <
> andrey.mashenkov@gmail.com> wrote:
>
>> Hi,
>>
>> There is no way to index such data as is. To index data you need to have
>> entry_field<->column mapping configured.
>> As a workaround here, leaves can be stored in cache as values.
>>
>> E.g. you can have a separate cache to index leaf nodes, where entries
>> will have 2 fields: "original tree key" field and "leaf node value" indexed
>> field.
>> So, you will be able to query serialized tree-like structures via SQL
>> query with JOIN condition on  "original tree key" and WHERE condition on
>> "leaf node value" field.
>> Obviously, you will need to implement intermediate logic to keep data of
>> both caches consistent.
>>
>>
>> On Wed, Oct 11, 2017 at 9:40 PM, Andrey Kornev <an...@hotmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> Consider the following use case: my cache values are a
>>> serialized tree-like structure (as opposed to a POJO). The leaf nodes of
>>> the tree are Java primitives. Some of the leaf nodes are used by the
>>> queries and should be indexed.
>>>
>>> What are my options for indexing such data?
>>>
>>> Thanks
>>> Andrey
>>>
>>
>>
>>
>> --
>> Best regards,
>> Andrey V. Mashenkov
>>
>
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>



-- 
Alexey Kuznetsov

Re: Indexing fields of non-POJO cache values

Posted by Andrey Kornev <an...@hotmail.com>.
Hey Andrey,

Thanks for your reply!

We've been using a slightly different approach: we extract the values of the indexable leaf nodes and store them as individual fields of the binary object, alongside the serialized tree itself. We then configure the cache's QueryEntities to expose those fields. It works fine, and this way we avoid joins in our queries.

However, an obvious drawback of this approach is data duplication. We end up with three copies of each field value:

1) the leaf node of the tree,
2) the field of the binary object, and
3) the Ignite index.

I was hoping there might be a better way to achieve this. In particular, I'd like to avoid storing the value as a field of a binary object (copy #2).

One possible (and elegant) solution would be to introduce a way to specify a method (or a closure) for a QueryEntity, in addition to the currently supported BinaryObject field/POJO attribute.
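
The idea can be sketched in plain Java. This is only a model built on in-memory maps, not Ignite code: QueryEntity has no such hook, and the names ExtractorIndex, put and find are hypothetical. The index key is computed from the serialized value by a caller-supplied function, so the value itself never has to carry a separate copy of the field.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

/** Hypothetical model: an index whose key is derived from the stored bytes by an extractor. */
final class ExtractorIndex {
    /** The "cache": key -> serialized tree. No extracted field is stored next to it. */
    private final Map<Long, byte[]> cache = new HashMap<>();

    /** The "index": extracted leaf value -> keys of the entries that produced it. */
    private final NavigableMap<Integer, Set<Long>> index = new TreeMap<>();

    /** Caller-supplied closure that reads the indexed leaf out of the serialized tree. */
    private final Function<byte[], Integer> extractor;

    ExtractorIndex(Function<byte[], Integer> extractor) {
        this.extractor = extractor;
    }

    /** Stores the value and re-indexes it, dropping the index entry of any previous value. */
    void put(long key, byte[] serializedTree) {
        byte[] old = cache.put(key, serializedTree);
        if (old != null) {
            Integer oldField = extractor.apply(old);
            Set<Long> keys = index.get(oldField);
            if (keys != null) {
                keys.remove(key);
                if (keys.isEmpty()) {
                    index.remove(oldField);
                }
            }
        }
        index.computeIfAbsent(extractor.apply(serializedTree), f -> new TreeSet<>()).add(key);
    }

    /** Returns the keys whose extracted leaf value equals the given one. */
    Set<Long> find(int field) {
        return Collections.unmodifiableSet(index.getOrDefault(field, Collections.emptySet()));
    }
}
```

With an extractor such as bytes -> java.nio.ByteBuffer.wrap(bytes).getInt(0) (a stand-in for walking a real tree format), only the tree and the index hold the value, which removes copy #2 from the list above.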

Regards
Andrey

________________________________
From: Andrey Mashenkov <an...@gmail.com>
Sent: Thursday, October 12, 2017 6:25 AM
To: user@ignite.apache.org
Subject: Re: Indexing fields of non-POJO cache values

Hi,

Another way here is to implement your own query engine by extending IndexingSPI interface, which looks much more complicated.

On Thu, Oct 12, 2017 at 4:23 PM, Andrey Mashenkov <an...@gmail.com> wrote:
Hi,

There is no way to index such data as is. To index data you need to have entry_field<->column mapping configured.
As a workaround here, leaves can be stored in cache as values.

E.g. you can have a separate cache to index leaf nodes, where entries will have 2 fields: "original tree key" field and "leaf node value" indexed field.
So, you will be able to query serialized tree-like structures via SQL query with JOIN condition on  "original tree key" and WHERE condition on "leaf node value" field.
Obviously, you will need to implement intermediate logic to keep data of both caches consistent.


On Wed, Oct 11, 2017 at 9:40 PM, Andrey Kornev <an...@hotmail.com> wrote:
Hello,

Consider the following use case: my cache values are a serialized tree-like structure (as opposed to a POJO). The leaf nodes of the tree are Java primitives. Some of the leaf nodes are used by the queries and should be indexed.

What are my options for indexing such data?

Thanks
Andrey



--
Best regards,
Andrey V. Mashenkov



--
Best regards,
Andrey V. Mashenkov

Re: Indexing fields of non-POJO cache values

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi,

Another option is to implement your own query engine by implementing the
IndexingSpi interface, though that looks much more complicated.

On Thu, Oct 12, 2017 at 4:23 PM, Andrey Mashenkov <
andrey.mashenkov@gmail.com> wrote:

> Hi,
>
> There is no way to index such data as is. To index data you need to have
> entry_field<->column mapping configured.
> As a workaround here, leaves can be stored in cache as values.
>
> E.g. you can have a separate cache to index leaf nodes, where entries will
> have 2 fields: "original tree key" field and "leaf node value" indexed
> field.
> So, you will be able to query serialized tree-like structures via SQL
> query with JOIN condition on  "original tree key" and WHERE condition on
> "leaf node value" field.
> Obviously, you will need to implement intermediate logic to keep data of
> both caches consistent.
>
>
> On Wed, Oct 11, 2017 at 9:40 PM, Andrey Kornev <an...@hotmail.com>
> wrote:
>
>> Hello,
>>
>> Consider the following use case: my cache values are a
>> serialized tree-like structure (as opposed to a POJO). The leaf nodes of
>> the tree are Java primitives. Some of the leaf nodes are used by the
>> queries and should be indexed.
>>
>> What are my options for indexing such data?
>>
>> Thanks
>> Andrey
>>
>
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>



-- 
Best regards,
Andrey V. Mashenkov

Re: Indexing fields of non-POJO cache values

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi,

There is no way to index such data as is. To index data, you need to have
an entry_field<->column mapping configured.
As a workaround, the leaves can be stored in a cache as values.

E.g., you can have a separate cache for indexing leaf nodes, where each entry
has 2 fields: an "original tree key" field and an indexed "leaf node value"
field.
You will then be able to query the serialized tree-like structures via an SQL
query with a JOIN condition on "original tree key" and a WHERE condition on
"leaf node value".
Obviously, you will need to implement intermediate logic to keep the data in
both caches consistent.
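
The consistency logic mentioned above could look roughly like the following plain-Java sketch. It is a model only: it uses in-memory maps instead of Ignite caches, the names TreeStoreWithLeafIndex and findByLeaf are made up for illustration, and it assumes a single indexed leaf per tree to stay short. In a real deployment the two writes would also need to be atomic (e.g. inside a transaction), which this sketch does not attempt.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical model of a primary "tree" cache plus a side cache of indexed leaf values. */
final class TreeStoreWithLeafIndex {
    /** Primary cache: tree key -> serialized tree (opaque bytes). */
    private final Map<Long, byte[]> trees = new HashMap<>();

    /** Which leaf value each tree currently contributes, so stale index entries can be found. */
    private final Map<Long, Integer> leafByTree = new HashMap<>();

    /** Side cache: leaf value -> keys of the trees that hold it. */
    private final NavigableMap<Integer, Set<Long>> treesByLeaf = new TreeMap<>();

    /** Writes a tree and updates the side cache in the same step. */
    void put(long treeKey, byte[] serializedTree, int leafValue) {
        remove(treeKey); // drop any stale entry for this key first
        trees.put(treeKey, serializedTree);
        leafByTree.put(treeKey, leafValue);
        treesByLeaf.computeIfAbsent(leafValue, v -> new TreeSet<>()).add(treeKey);
    }

    /** Removes a tree together with its side-cache entry. */
    void remove(long treeKey) {
        trees.remove(treeKey);
        Integer old = leafByTree.remove(treeKey);
        if (old == null) {
            return;
        }
        Set<Long> keys = treesByLeaf.get(old);
        keys.remove(treeKey);
        if (keys.isEmpty()) {
            treesByLeaf.remove(old);
        }
    }

    /** Rough equivalent of "JOIN on tree key WHERE leaf value = ?": the matching tree keys. */
    Set<Long> findByLeaf(int leafValue) {
        return Collections.unmodifiableSet(
            treesByLeaf.getOrDefault(leafValue, Collections.emptySet()));
    }
}
```

The point of routing every write through put and remove is that the side cache can never reference a tree that was overwritten or deleted.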


On Wed, Oct 11, 2017 at 9:40 PM, Andrey Kornev <an...@hotmail.com>
wrote:

> Hello,
>
> Consider the following use case: my cache values are a
> serialized tree-like structure (as opposed to a POJO). The leaf nodes of
> the tree are Java primitives. Some of the leaf nodes are used by the
> queries and should be indexed.
>
> What are my options for indexing such data?
>
> Thanks
> Andrey
>



-- 
Best regards,
Andrey V. Mashenkov