You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@geode.apache.org by ankit Soni <an...@gmail.com> on 2021/01/29 17:32:40 UTC

Inputs for efficient querying

Hello Team,

I am loading data into Geode (V 1.12) with the following *Key (of type
String)* and *value (custom java object - ValueObject)*.

*public class ValueObject implements Serializable {*
*     private int id;*

*     private String keyColumn;   <--------- Region.Key *

*     private String column_2;     <---------- Json document*
*     private String column_3;*

*     private String column_4*



*     //few more string type members*
*}*

*Keycolum* is a normal string of around 8 chars, like "12345678",
"23456789" etc...

*In ValueObject, column_2 is of type string and having a values of type
valid JSON doc as bellow; *
{"k1" : "v1", "k3" : "v3", "k6" : "v6", "k7" : "v7", *"k10" : "v7"*, "k12"
: "v12", "k13" : "v13"}
{"k2" : "v2", "k3" : "v3", "k4" : "v4", "k6" : "v6", *"k10" : "v10"*, "k13"
: "v13", "k14" : "v14"}
{"k1" : "v1", "k2" : "v2", "k6" : "v6", "k8" : "v8", "k10" : "v7", "k12" :
"v12", "k13" : "v13", "k14" : "v14"}
.....

after storing the data in Geode i need to run following two queries.
*Query to be supported.*


*Q1. //query with filter on keyColumn*
"select d.keyColumn, d.column_2, d.column_3, d.column_4
from /DATA_REGION.keyset key
where (key IN SET('12345678', '23456789', '34567890'))"


*Q2. //query with filter on column_2 attribute, something like "where
d.column_2.k10 IN SET('v10', 'v15', 'v7'); *
"select d.keyColumn, d.column_2, d.column_3, d.column_4
from /DATA_REGION v
where v.column_2.k10 INSET('v10', 'v15', 'v7')"

I am able to run the Q1 but not sure *how to achieve Q2 (form a OQL for
this case)*...?

Request team to help, how can i efficiently form and execute above kind of
queries with geode OQL...?

Also advise, what kind of index are recommended to get higher query
performance for above queries...?

Thanks
Ankit.

Re: Inputs for efficient querying

Posted by ankit Soni <an...@gmail.com>.
Thanks Json for your guidance, this is certainly helpful in proceeding
further ...

Ankit.

On Thu, Feb 11, 2021, 6:31 AM Jason Huynh <jh...@vmware.com> wrote:

> Hi Ankit,
>
> I haven't had time to try this out but hopefully the answers get you on
> the correct path...
>
>     > 1. How can i form an OQL query (syntax) to fetch the latest row
> based on
>     > MAX(versionId).
> 1.) maybe a nested query or use order by?
> “Select x,y,z from /data-region d where d.versionId > 0 order by
> d.versionId”
> “Select x,y,z from /data-region d where d.versionId in (select
> max(versionId) from /data-region d)”
>
>     > 2. It seems *BETWEEN* support is not available, how can this be
> achieved.
> 2.) to_date might be of use or maybe you can call a method to convert the
> date to millis and do a >, <
> “Select x,y,z from /data-region d where d.date > to_date('01/11/2021’,
> 'MM/dd/yyyy’) and d.date < to_date('01/12/2021’, 'MM/dd/yyyy’)”
>
>
>     > 3. What should be the* recommended index creation here*, for this
> query
>     > to gain fast performance.
> 3.) optimal indexes will probably require knowing more about the entire
> data set.
> Whichever column lookup that can reduce the data down to the smallest
> size, will probably be the field to create the index on
>
> Regards,
> -Jason
>
> On 2/9/21, 9:55 PM, "ankit Soni" <an...@gmail.com> wrote:
>
>     Can some one pls guide how  functionality like "BETWEEN" operator can
> be
>     achieved using geode OQL (for Date fields).
>
>     Thanks
>     Ankit
>
>     On Tue, Feb 9, 2021, 11:53 PM ankit Soni <an...@gmail.com>
> wrote:
>
>     > Thanks Dan for your input. I am able to try this at my end and it's
>     > working as expected.
>     >
>     > As a next steps I need to support somewhat complex queries, so
> updated a
>     > ValueObject, like
>     >
>     > public class ValueObject implements PdxSerializable {
>     >     private static final long serialVersionUID =
> -754645299372860596L;
>     >     private int versionId;   //1 for latest record; 0 for previous
> latest
>     >     private String date;
>     >     private String col_1;
>     >     private String col_2;
>     >     private String col_3;
>     >     private String type;
>     >     private Map<String, String> map;
>     >
>     >     public ValueObject() {
>     >     }
>     >
>     > *Region*<String, ValueObject> : *Key:* random-string and *Value:*
>     > ValueObject
>     >
>     > Need to support queries that fetches columns for *latest record
> (whose
>     > versionId is Max)* with filters and aggregation like
>     > "SELECT date, col_1, col_2, col_3    <-----------Must be fetched
> from a
>     > row where MAX(versionId)
>     >      FROM /data-region d
>     >      WHERE d.type='t1'
>     >      AND d.date BETWEEN '2021-01-11' AND '2021-01-12'
>     >     AND d.vesionId BETWEEN 0 AND 1
>     >     AND d.col_1 IN SET ('111111', '222222')
>     >     GROUP BY d.col_1"
>     >
>     > *Team, Kindly guide on following,*
>     > 1. How can i form an OQL query (syntax) to fetch the latest row
> based on
>     > MAX(versionId).
>     > 2. It seems *BETWEEN* support is not available, how can this be
> achieved.
>     > 3. What should be the* recommended index creation here*, for this
> query
>     > to gain fast performance.
>     > 4. Any recommendation for Key, currently it's a random string.
>     >
>     > Any suggestions on above will be really helpful.
>     >
>     > Thank you
>     > Ankit.
>     >
>     > On Fri, 29 Jan 2021 at 23:23, Dan Smith <da...@vmware.com> wrote:
>     >
>     >> For the best performance, you should store column2 as a java Map
> instead
>     >> of a String which contains a json document. If column2 was
> Map<String,
>     >> String>, you could do a query like this:
>     >>
>     >>
>     >> SELECT * FROM /exampleRegion r WHERE r.column2['k1'] IN SET('v10',
> 'v15',
>     >> 'v7')"
>     >>
>     >> You can create an index on the map to optimize this sort of query
>     >>
>     >> gfsh>create index --name="IndexName" --expression="r.column2[*]"
>     >> --region="/exampleRegion r"
>     >>
>     >> This page might be helpful
>     >>
>     >>
>     >>
> https://geode.apache.org/docs/guide/112/developing/query_index/creating_map_indexes.html
>     >>
>     >> In addition, I noticed that your value implements Serializable. You
> will
>     >> get better performance out of the query engine if you configure PDX
>     >> serialization for your object, either by configuring the auto
> serializer or
>     >> implementing PdxSerializable. That avoids the need to deserialize
> your
>     >> entire value on the server to query/index it.
>     >>
>     >> -Dan
>     >>
>     >>
>     >> ________________________________
>     >> From: ankit Soni <an...@gmail.com>
>     >> Sent: Friday, January 29, 2021 9:32 AM
>     >> To: dev@geode.apache.org <de...@geode.apache.org>
>     >> Subject: Inputs for efficient querying
>     >>
>     >> Hello Team,
>     >>
>     >> I am loading data into Geode (V 1.12) with the following *Key (of
> type
>     >> String)* and *value (custom java object - ValueObject)*.
>     >>
>     >> *public class ValueObject implements Serializable {*
>     >> *     private int id;*
>     >>
>     >> *     private String keyColumn;   <--------- Region.Key *
>     >>
>     >> *     private String column_2;     <---------- Json document*
>     >> *     private String column_3;*
>     >>
>     >> *     private String column_4*
>     >>
>     >>
>     >>
>     >> *     //few more string type members*
>     >> *}*
>     >>
>     >> *Keycolum* is a normal string of around 8 chars, like "12345678",
>     >> "23456789" etc...
>     >>
>     >> *In ValueObject, column_2 is of type string and having a values of
> type
>     >> valid JSON doc as bellow; *
>     >> {"k1" : "v1", "k3" : "v3", "k6" : "v6", "k7" : "v7", *"k10" :
> "v7"*, "k12"
>     >> : "v12", "k13" : "v13"}
>     >> {"k2" : "v2", "k3" : "v3", "k4" : "v4", "k6" : "v6", *"k10" :
> "v10"*,
>     >> "k13"
>     >> : "v13", "k14" : "v14"}
>     >> {"k1" : "v1", "k2" : "v2", "k6" : "v6", "k8" : "v8", "k10" : "v7",
> "k12" :
>     >> "v12", "k13" : "v13", "k14" : "v14"}
>     >> .....
>     >>
>     >> after storing the data in Geode i need to run following two queries.
>     >> *Query to be supported.*
>     >>
>     >>
>     >> *Q1. //query with filter on keyColumn*
>     >> "select d.keyColumn, d.column_2, d.column_3, d.column_4
>     >> from /DATA_REGION.keyset key
>     >> where (key IN SET('12345678', '23456789', '34567890'))"
>     >>
>     >>
>     >> *Q2. //query with filter on column_2 attribute, something like
> "where
>     >> d.column_2.k10 IN SET('v10', 'v15', 'v7'); *
>     >> "select d.keyColumn, d.column_2, d.column_3, d.column_4
>     >> from /DATA_REGION v
>     >> where v.column_2.k10 INSET('v10', 'v15', 'v7')"
>     >>
>     >> I am able to run the Q1 but not sure *how to achieve Q2 (form a OQL
> for
>     >> this case)*...?
>     >>
>     >> Request team to help, how can i efficiently form and execute above
> kind of
>     >> queries with geode OQL...?
>     >>
>     >> Also advise, what kind of index are recommended to get higher query
>     >> performance for above queries...?
>     >>
>     >> Thanks
>     >> Ankit.
>     >>
>     >
>
>

Re: Inputs for efficient querying

Posted by Jason Huynh <jh...@vmware.com>.
Hi Ankit,

I haven't had time to try this out but hopefully the answers get you on the correct path...

    > 1. How can i form an OQL query (syntax) to fetch the latest row based on
    > MAX(versionId).
1.) maybe a nested query or use order by?
“Select x,y,z from /data-region d where d.versionId > 0 order by d.versionId”
“Select x,y,z from /data-region d where d.versionId in (select max(versionId) from /data-region d)”

    > 2. It seems *BETWEEN* support is not available, how can this be achieved.
2.) to_date might be of use or maybe you can call a method to convert the date to millis and do a >, < 
“Select x,y,z from /data-region d where d.date > to_date('01/11/2021’, 'MM/dd/yyyy’) and d.date < to_date('01/12/2021’, 'MM/dd/yyyy’)”


    > 3. What should be the* recommended index creation here*, for this query
    > to gain fast performance.
3.) optimal indexes will probably require knowing more about the entire data set.
Whichever column lookup that can reduce the data down to the smallest size, will probably be the field to create the index on

Regards,
-Jason

On 2/9/21, 9:55 PM, "ankit Soni" <an...@gmail.com> wrote:

    Can some one pls guide how  functionality like "BETWEEN" operator can be
    achieved using geode OQL (for Date fields).

    Thanks
    Ankit

    On Tue, Feb 9, 2021, 11:53 PM ankit Soni <an...@gmail.com> wrote:

    > Thanks Dan for your input. I am able to try this at my end and it's
    > working as expected.
    >
    > As a next steps I need to support somewhat complex queries, so updated a
    > ValueObject, like
    >
    > public class ValueObject implements PdxSerializable {
    >     private static final long serialVersionUID = -754645299372860596L;
    >     private int versionId;   //1 for latest record; 0 for previous latest
    >     private String date;
    >     private String col_1;
    >     private String col_2;
    >     private String col_3;
    >     private String type;
    >     private Map<String, String> map;
    >
    >     public ValueObject() {
    >     }
    >
    > *Region*<String, ValueObject> : *Key:* random-string and *Value:*
    > ValueObject
    >
    > Need to support queries that fetches columns for *latest record (whose
    > versionId is Max)* with filters and aggregation like
    > "SELECT date, col_1, col_2, col_3    <-----------Must be fetched from a
    > row where MAX(versionId)
    >      FROM /data-region d
    >      WHERE d.type='t1'
    >      AND d.date BETWEEN '2021-01-11' AND '2021-01-12'
    >     AND d.vesionId BETWEEN 0 AND 1
    >     AND d.col_1 IN SET ('111111', '222222')
    >     GROUP BY d.col_1"
    >
    > *Team, Kindly guide on following,*
    > 1. How can i form an OQL query (syntax) to fetch the latest row based on
    > MAX(versionId).
    > 2. It seems *BETWEEN* support is not available, how can this be achieved.
    > 3. What should be the* recommended index creation here*, for this query
    > to gain fast performance.
    > 4. Any recommendation for Key, currently it's a random string.
    >
    > Any suggestions on above will be really helpful.
    >
    > Thank you
    > Ankit.
    >
    > On Fri, 29 Jan 2021 at 23:23, Dan Smith <da...@vmware.com> wrote:
    >
    >> For the best performance, you should store column2 as a java Map instead
    >> of a String which contains a json document. If column2 was Map<String,
    >> String>, you could do a query like this:
    >>
    >>
    >> SELECT * FROM /exampleRegion r WHERE r.column2['k1'] IN SET('v10', 'v15',
    >> 'v7')"
    >>
    >> You can create an index on the map to optimize this sort of query
    >>
    >> gfsh>create index --name="IndexName" --expression="r.column2[*]"
    >> --region="/exampleRegion r"
    >>
    >> This page might be helpful
    >>
    >>
    >> https://geode.apache.org/docs/guide/112/developing/query_index/creating_map_indexes.html
    >>
    >> In addition, I noticed that your value implements Serializable. You will
    >> get better performance out of the query engine if you configure PDX
    >> serialization for your object, either by configuring the auto serializer or
    >> implementing PdxSerializable. That avoids the need to deserialize your
    >> entire value on the server to query/index it.
    >>
    >> -Dan
    >>
    >>
    >> ________________________________
    >> From: ankit Soni <an...@gmail.com>
    >> Sent: Friday, January 29, 2021 9:32 AM
    >> To: dev@geode.apache.org <de...@geode.apache.org>
    >> Subject: Inputs for efficient querying
    >>
    >> Hello Team,
    >>
    >> I am loading data into Geode (V 1.12) with the following *Key (of type
    >> String)* and *value (custom java object - ValueObject)*.
    >>
    >> *public class ValueObject implements Serializable {*
    >> *     private int id;*
    >>
    >> *     private String keyColumn;   <--------- Region.Key *
    >>
    >> *     private String column_2;     <---------- Json document*
    >> *     private String column_3;*
    >>
    >> *     private String column_4*
    >>
    >>
    >>
    >> *     //few more string type members*
    >> *}*
    >>
    >> *Keycolum* is a normal string of around 8 chars, like "12345678",
    >> "23456789" etc...
    >>
    >> *In ValueObject, column_2 is of type string and having a values of type
    >> valid JSON doc as bellow; *
    >> {"k1" : "v1", "k3" : "v3", "k6" : "v6", "k7" : "v7", *"k10" : "v7"*, "k12"
    >> : "v12", "k13" : "v13"}
    >> {"k2" : "v2", "k3" : "v3", "k4" : "v4", "k6" : "v6", *"k10" : "v10"*,
    >> "k13"
    >> : "v13", "k14" : "v14"}
    >> {"k1" : "v1", "k2" : "v2", "k6" : "v6", "k8" : "v8", "k10" : "v7", "k12" :
    >> "v12", "k13" : "v13", "k14" : "v14"}
    >> .....
    >>
    >> after storing the data in Geode i need to run following two queries.
    >> *Query to be supported.*
    >>
    >>
    >> *Q1. //query with filter on keyColumn*
    >> "select d.keyColumn, d.column_2, d.column_3, d.column_4
    >> from /DATA_REGION.keyset key
    >> where (key IN SET('12345678', '23456789', '34567890'))"
    >>
    >>
    >> *Q2. //query with filter on column_2 attribute, something like "where
    >> d.column_2.k10 IN SET('v10', 'v15', 'v7'); *
    >> "select d.keyColumn, d.column_2, d.column_3, d.column_4
    >> from /DATA_REGION v
    >> where v.column_2.k10 INSET('v10', 'v15', 'v7')"
    >>
    >> I am able to run the Q1 but not sure *how to achieve Q2 (form a OQL for
    >> this case)*...?
    >>
    >> Request team to help, how can i efficiently form and execute above kind of
    >> queries with geode OQL...?
    >>
    >> Also advise, what kind of index are recommended to get higher query
    >> performance for above queries...?
    >>
    >> Thanks
    >> Ankit.
    >>
    >


Re: Inputs for efficient querying

Posted by ankit Soni <an...@gmail.com>.
Can some one pls guide how  functionality like "BETWEEN" operator can be
achieved using geode OQL (for Date fields).

Thanks
Ankit

On Tue, Feb 9, 2021, 11:53 PM ankit Soni <an...@gmail.com> wrote:

> Thanks Dan for your input. I am able to try this at my end and it's
> working as expected.
>
> As a next steps I need to support somewhat complex queries, so updated a
> ValueObject, like
>
> public class ValueObject implements PdxSerializable {
>     private static final long serialVersionUID = -754645299372860596L;
>     private int versionId;   //1 for latest record; 0 for previous latest
>     private String date;
>     private String col_1;
>     private String col_2;
>     private String col_3;
>     private String type;
>     private Map<String, String> map;
>
>     public ValueObject() {
>     }
>
> *Region*<String, ValueObject> : *Key:* random-string and *Value:*
> ValueObject
>
> Need to support queries that fetches columns for *latest record (whose
> versionId is Max)* with filters and aggregation like
> "SELECT date, col_1, col_2, col_3    <-----------Must be fetched from a
> row where MAX(versionId)
>      FROM /data-region d
>      WHERE d.type='t1'
>      AND d.date BETWEEN '2021-01-11' AND '2021-01-12'
>     AND d.vesionId BETWEEN 0 AND 1
>     AND d.col_1 IN SET ('111111', '222222')
>     GROUP BY d.col_1"
>
> *Team, Kindly guide on following,*
> 1. How can i form an OQL query (syntax) to fetch the latest row based on
> MAX(versionId).
> 2. It seems *BETWEEN* support is not available, how can this be achieved.
> 3. What should be the* recommended index creation here*, for this query
> to gain fast performance.
> 4. Any recommendation for Key, currently it's a random string.
>
> Any suggestions on above will be really helpful.
>
> Thank you
> Ankit.
>
> On Fri, 29 Jan 2021 at 23:23, Dan Smith <da...@vmware.com> wrote:
>
>> For the best performance, you should store column2 as a java Map instead
>> of a String which contains a json document. If column2 was Map<String,
>> String>, you could do a query like this:
>>
>>
>> SELECT * FROM /exampleRegion r WHERE r.column2['k1'] IN SET('v10', 'v15',
>> 'v7')"
>>
>> You can create an index on the map to optimize this sort of query
>>
>> gfsh>create index --name="IndexName" --expression="r.column2[*]"
>> --region="/exampleRegion r"
>>
>> This page might be helpful
>>
>>
>> https://geode.apache.org/docs/guide/112/developing/query_index/creating_map_indexes.html
>>
>> In addition, I noticed that your value implements Serializable. You will
>> get better performance out of the query engine if you configure PDX
>> serialization for your object, either by configuring the auto serializer or
>> implementing PdxSerializable. That avoids the need to deserialize your
>> entire value on the server to query/index it.
>>
>> -Dan
>>
>>
>> ________________________________
>> From: ankit Soni <an...@gmail.com>
>> Sent: Friday, January 29, 2021 9:32 AM
>> To: dev@geode.apache.org <de...@geode.apache.org>
>> Subject: Inputs for efficient querying
>>
>> Hello Team,
>>
>> I am loading data into Geode (V 1.12) with the following *Key (of type
>> String)* and *value (custom java object - ValueObject)*.
>>
>> *public class ValueObject implements Serializable {*
>> *     private int id;*
>>
>> *     private String keyColumn;   <--------- Region.Key *
>>
>> *     private String column_2;     <---------- Json document*
>> *     private String column_3;*
>>
>> *     private String column_4*
>>
>>
>>
>> *     //few more string type members*
>> *}*
>>
>> *Keycolum* is a normal string of around 8 chars, like "12345678",
>> "23456789" etc...
>>
>> *In ValueObject, column_2 is of type string and having a values of type
>> valid JSON doc as bellow; *
>> {"k1" : "v1", "k3" : "v3", "k6" : "v6", "k7" : "v7", *"k10" : "v7"*, "k12"
>> : "v12", "k13" : "v13"}
>> {"k2" : "v2", "k3" : "v3", "k4" : "v4", "k6" : "v6", *"k10" : "v10"*,
>> "k13"
>> : "v13", "k14" : "v14"}
>> {"k1" : "v1", "k2" : "v2", "k6" : "v6", "k8" : "v8", "k10" : "v7", "k12" :
>> "v12", "k13" : "v13", "k14" : "v14"}
>> .....
>>
>> after storing the data in Geode i need to run following two queries.
>> *Query to be supported.*
>>
>>
>> *Q1. //query with filter on keyColumn*
>> "select d.keyColumn, d.column_2, d.column_3, d.column_4
>> from /DATA_REGION.keyset key
>> where (key IN SET('12345678', '23456789', '34567890'))"
>>
>>
>> *Q2. //query with filter on column_2 attribute, something like "where
>> d.column_2.k10 IN SET('v10', 'v15', 'v7'); *
>> "select d.keyColumn, d.column_2, d.column_3, d.column_4
>> from /DATA_REGION v
>> where v.column_2.k10 INSET('v10', 'v15', 'v7')"
>>
>> I am able to run the Q1 but not sure *how to achieve Q2 (form a OQL for
>> this case)*...?
>>
>> Request team to help, how can i efficiently form and execute above kind of
>> queries with geode OQL...?
>>
>> Also advise, what kind of index are recommended to get higher query
>> performance for above queries...?
>>
>> Thanks
>> Ankit.
>>
>

Re: Inputs for efficient querying

Posted by ankit Soni <an...@gmail.com>.
Thanks Dan for your input. I am able to try this at my end and it's working
as expected.

As a next steps I need to support somewhat complex queries, so updated a
ValueObject, like

public class ValueObject implements PdxSerializable {
    private static final long serialVersionUID = -754645299372860596L;
    private int versionId;   //1 for latest record; 0 for previous latest
    private String date;
    private String col_1;
    private String col_2;
    private String col_3;
    private String type;
    private Map<String, String> map;

    public ValueObject() {
    }

*Region*<String, ValueObject> : *Key:* random-string and *Value:*
ValueObject

Need to support queries that fetches columns for *latest record (whose
versionId is Max)* with filters and aggregation like
"SELECT date, col_1, col_2, col_3    <-----------Must be fetched from a row
where MAX(versionId)
     FROM /data-region d
     WHERE d.type='t1'
     AND d.date BETWEEN '2021-01-11' AND '2021-01-12'
    AND d.vesionId BETWEEN 0 AND 1
    AND d.col_1 IN SET ('111111', '222222')
    GROUP BY d.col_1"

*Team, Kindly guide on following,*
1. How can i form an OQL query (syntax) to fetch the latest row based on
MAX(versionId).
2. It seems *BETWEEN* support is not available, how can this be achieved.
3. What should be the* recommended index creation here*, for this query to
gain fast performance.
4. Any recommendation for Key, currently it's a random string.

Any suggestions on above will be really helpful.

Thank you
Ankit.

On Fri, 29 Jan 2021 at 23:23, Dan Smith <da...@vmware.com> wrote:

> For the best performance, you should store column2 as a java Map instead
> of a String which contains a json document. If column2 was Map<String,
> String>, you could do a query like this:
>
>
> SELECT * FROM /exampleRegion r WHERE r.column2['k1'] IN SET('v10', 'v15',
> 'v7')"
>
> You can create an index on the map to optimize this sort of query
>
> gfsh>create index --name="IndexName" --expression="r.column2[*]"
> --region="/exampleRegion r"
>
> This page might be helpful
>
>
> https://geode.apache.org/docs/guide/112/developing/query_index/creating_map_indexes.html
>
> In addition, I noticed that your value implements Serializable. You will
> get better performance out of the query engine if you configure PDX
> serialization for your object, either by configuring the auto serializer or
> implementing PdxSerializable. That avoids the need to deserialize your
> entire value on the server to query/index it.
>
> -Dan
>
>
> ________________________________
> From: ankit Soni <an...@gmail.com>
> Sent: Friday, January 29, 2021 9:32 AM
> To: dev@geode.apache.org <de...@geode.apache.org>
> Subject: Inputs for efficient querying
>
> Hello Team,
>
> I am loading data into Geode (V 1.12) with the following *Key (of type
> String)* and *value (custom java object - ValueObject)*.
>
> *public class ValueObject implements Serializable {*
> *     private int id;*
>
> *     private String keyColumn;   <--------- Region.Key *
>
> *     private String column_2;     <---------- Json document*
> *     private String column_3;*
>
> *     private String column_4*
>
>
>
> *     //few more string type members*
> *}*
>
> *Keycolum* is a normal string of around 8 chars, like "12345678",
> "23456789" etc...
>
> *In ValueObject, column_2 is of type string and having a values of type
> valid JSON doc as bellow; *
> {"k1" : "v1", "k3" : "v3", "k6" : "v6", "k7" : "v7", *"k10" : "v7"*, "k12"
> : "v12", "k13" : "v13"}
> {"k2" : "v2", "k3" : "v3", "k4" : "v4", "k6" : "v6", *"k10" : "v10"*, "k13"
> : "v13", "k14" : "v14"}
> {"k1" : "v1", "k2" : "v2", "k6" : "v6", "k8" : "v8", "k10" : "v7", "k12" :
> "v12", "k13" : "v13", "k14" : "v14"}
> .....
>
> after storing the data in Geode i need to run following two queries.
> *Query to be supported.*
>
>
> *Q1. //query with filter on keyColumn*
> "select d.keyColumn, d.column_2, d.column_3, d.column_4
> from /DATA_REGION.keyset key
> where (key IN SET('12345678', '23456789', '34567890'))"
>
>
> *Q2. //query with filter on column_2 attribute, something like "where
> d.column_2.k10 IN SET('v10', 'v15', 'v7'); *
> "select d.keyColumn, d.column_2, d.column_3, d.column_4
> from /DATA_REGION v
> where v.column_2.k10 INSET('v10', 'v15', 'v7')"
>
> I am able to run the Q1 but not sure *how to achieve Q2 (form a OQL for
> this case)*...?
>
> Request team to help, how can i efficiently form and execute above kind of
> queries with geode OQL...?
>
> Also advise, what kind of index are recommended to get higher query
> performance for above queries...?
>
> Thanks
> Ankit.
>

Re: Inputs for efficient querying

Posted by Dan Smith <da...@vmware.com>.
For the best performance, you should store column2 as a java Map instead of a String which contains a json document. If column2 was Map<String, String>, you could do a query like this:


SELECT * FROM /exampleRegion r WHERE r.column2['k1'] IN SET('v10', 'v15', 'v7')"

You can create an index on the map to optimize this sort of query

gfsh>create index --name="IndexName" --expression="r.column2[*]" --region="/exampleRegion r"

This page might be helpful

https://geode.apache.org/docs/guide/112/developing/query_index/creating_map_indexes.html

In addition, I noticed that your value implements Serializable. You will get better performance out of the query engine if you configure PDX serialization for your object, either by configuring the auto serializer or implementing PdxSerializable. That avoids the need to deserialize your entire value on the server to query/index it.

-Dan


________________________________
From: ankit Soni <an...@gmail.com>
Sent: Friday, January 29, 2021 9:32 AM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: Inputs for efficient querying

Hello Team,

I am loading data into Geode (V 1.12) with the following *Key (of type
String)* and *value (custom java object - ValueObject)*.

*public class ValueObject implements Serializable {*
*     private int id;*

*     private String keyColumn;   <--------- Region.Key *

*     private String column_2;     <---------- Json document*
*     private String column_3;*

*     private String column_4*



*     //few more string type members*
*}*

*Keycolum* is a normal string of around 8 chars, like "12345678",
"23456789" etc...

*In ValueObject, column_2 is of type string and having a values of type
valid JSON doc as bellow; *
{"k1" : "v1", "k3" : "v3", "k6" : "v6", "k7" : "v7", *"k10" : "v7"*, "k12"
: "v12", "k13" : "v13"}
{"k2" : "v2", "k3" : "v3", "k4" : "v4", "k6" : "v6", *"k10" : "v10"*, "k13"
: "v13", "k14" : "v14"}
{"k1" : "v1", "k2" : "v2", "k6" : "v6", "k8" : "v8", "k10" : "v7", "k12" :
"v12", "k13" : "v13", "k14" : "v14"}
.....

after storing the data in Geode i need to run following two queries.
*Query to be supported.*


*Q1. //query with filter on keyColumn*
"select d.keyColumn, d.column_2, d.column_3, d.column_4
from /DATA_REGION.keyset key
where (key IN SET('12345678', '23456789', '34567890'))"


*Q2. //query with filter on column_2 attribute, something like "where
d.column_2.k10 IN SET('v10', 'v15', 'v7'); *
"select d.keyColumn, d.column_2, d.column_3, d.column_4
from /DATA_REGION v
where v.column_2.k10 INSET('v10', 'v15', 'v7')"

I am able to run the Q1 but not sure *how to achieve Q2 (form a OQL for
this case)*...?

Request team to help, how can i efficiently form and execute above kind of
queries with geode OQL...?

Also advise, what kind of index are recommended to get higher query
performance for above queries...?

Thanks
Ankit.