You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@ignite.apache.org by Maksim Timonin <ti...@gmail.com> on 2021/04/06 13:13:46 UTC

[DISCUSS] IEP-71 Public API for secondary index search

Hi, Igniters!

I'd like to propose a new feature - opportunity to query and create indexes
from public API.

It will help in some cases, where:
1. SQL is not applicable by design of user application;
2. Where IndexScan is preferable than ScanQuery for performance reasons;
3. Functional indexes are required.

Also it'll be great to have a transactional support for such queries, like
the "select for update" query provides. But I don't dig there much. It will
be a next step if this API will be implemented.

I've prepared an IEP-71 for that [1] with more details. Please share your
thoughts.


[1]
https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Andrey Mashenkov <an...@gmail.com>.

Maksim,

Am I right, that you mean this ticket [1] *IGNITE-12291 Create controllable
> paged query requests / responses for TextQuery similar to current SQL
> result processing*, when talked about incorrect limit work for
> TextQueries?
>
Yes, sure, that's it.

On Mon, Apr 12, 2021 at 12:15 PM Maksim Timonin <ti...@gmail.com>
wrote:

> Hi, Andrey!
>
> Am I right, that you mean this ticket [1] *IGNITE-12291 Create
> controllable paged query requests / responses for TextQuery similar to
> current SQL result processing*, when talked about incorrect limit work
> for TextQueries?
>
>
> [1] https://issues.apache.org/jira/browse/IGNITE-12291
>
> On Thu, Apr 8, 2021 at 4:32 PM Maksim Timonin <ti...@gmail.com>
> wrote:
>
>> Hi, Andrey!
>>
>> >> ScanQuery, TextQuery and partially SQL query share the same
>> infrastructure
>> I think I understand what you mean. I debug query processing and now
>> agree that it's a nice idea to try to reuse the infrastructure of scan and
>> text queries. Also as I can see there already Reducer functionality exists,
>> so I hope we can use that. I'm not absolutely confident now that it will
>> work fine, but I'm going to start there. Thanks for pointing me this
>> direction!
>>
>> >> I don't like the idea a user code will be executed inside BTree
>> operation
>> On the confluence page I've shown that a predicate passes as
>> TreeRowClosure. In this case you're right, any exception in a predicate
>> will lead to a CorruptedTreeException. But I see another legal way to
>> implement the predicate operation. BPlusTree.find accepts the X param that
>> passed to IO.getRow(). As I understand this param helps to control how much
>> returned row is filled. Then we can use it to return an object that
>> contains only basic info - link, pageAddr, offset. Then predicate operation
>> will be applied on the higher level on a cursor returned by a tree (like
>> H2TreeIndex does). It's safe to run user code there, we can handle
>> exceptions there.
>>
>>
>>
>> On Wed, Apr 7, 2021 at 4:46 PM Andrey Mashenkov <
>> andrey.mashenkov@gmail.com> wrote:
>>
>>> Maksim,
>>>
>>> The ScanQuery API provides a filter as
>>> > param that for case of index query should be splitted on such
>>> conditions.
>>> > It looks like a non-trivial task.
>>> >
>>> ScanQuery, TextQuery and partially SQL query share the same
>>> infrastructure.
>>> I've thought we could extend, improve and reuse some ScanQuery code that
>>> already works fine: map query on topology, IO, batching.
>>> Add IndexCondition alongside the Filter, and abstract query executor from
>>> source (primary and secondary Indexes).
>>> Add a sorted merge algorithm to the query merge stage. It can be very
>>> useful also for TextQueries that suffers from the absence of sorted merge
>>> and a "limit' condition work incorrectly.
>>>
>>> If you think it will be too hard than creating from scratch, I'm ok.
>>>
>>> 3. Ignite creates a proxy object that is filled with objects that are
>>> > inlined. If a user tries to access a field that isn't inlined or not
>>> > indexed, then deserialization will start and Ignite will log.warn()
>>> about
>>> > that.
>>> >
>>> Agree, this can be faster.
>>> I don't like the idea a user code will be executed inside BTree
>>> operation,
>>> any exception can cause FailureHandler triggering and stop the node.
>>>
>>> There is one more thing that could be improved.
>>> ScanQuery now iterates over per-partition PK Hash index trees and has
>>> performance issues on a small grid with a large number of partitions.
>>> So, there are many partitions on every node and many trees should be
>>> scanned.
>>> In this case scan over a secondary index gives significant boots even if
>>> every row is materialized, because we need to traverse over a single tree
>>> per-node.
>>> Having the ability to run a ScanQuery over a secondary index (if one
>>> exists) instead of PK Hash will be great.
>>>
>>>
>>> On Wed, Apr 7, 2021 at 11:18 AM Maksim Timonin <ti...@gmail.com>
>>> wrote:
>>>
>>> > Hi, Andrey!
>>> >
>>> > Thanks for the review and your comments!
>>> >
>>> > >> Is it possible to extend ScanQuery functionality to pass index
>>> condition
>>> > I investigated this way and see some issues:
>>> > 1. Querying of indexes is not a scan actually. It's
>>> > a tree traverse (predicate operation is an exclusion, other operations
>>> like
>>> > gt, lt, min, max have explicit boundaries). An index query consists of
>>> > conditions that match an index structure. In general for a multi-key
>>> index
>>> > there can be multiple conditions. The ScanQuery API provides a filter
>>> as
>>> > param that for case of index query should be splitted on such
>>> conditions.
>>> > It looks like a non-trivial task.
>>> > 2. Querying of an index requires a sorted result, while The ScanQuery
>>> > doesn't matter about that. So there will be a different behavior of the
>>> > iterator for scanning a cache and querying indexes. It's not much to
>>> > implement I think, but it can make ScanQuery unclear for a user.
>>> >
>>> > Maybe it's a point to separate traverse (gt, lt, in, etc...) and scan
>>> > (predicate) index operations to different API. So there still will be
>>> a new
>>> > query type for the traversing.
>>> >
>>> > But we will introduce some inheritors for ScanQuery, like
>>> TableScanQuery
>>> > and IndexScanQuery, for scan and filter. Then the question is about
>>> > ordering, Cache and Table scans aren't ordered, but Index is. Then we
>>> can
>>> > introduce an optional param "order" for ScanQuery too.
>>> >
>>> > WDYT?
>>> >
>>> > >> Functional indices
>>> > >> This task looks like a huge one because the lifecycle of such
>>> classes
>>> > should be described first
>>> > I agree with you. That this part should be investigated deeper than I
>>> did.
>>> > So let's postpone discussion about functional indexes for a while.
>>> IEP-71
>>> > declares some phases, functional indexes are part of the 2nd phase, but
>>> > users will get new functionality already from the 1st phase. Then I'll
>>> dig
>>> > into things you mentioned. Thanks for pointing them out.
>>> >
>>> > >> IndexScan by the predicate is questionable
>>> > Also in comments to the IEP on the Confluence you mentioned about
>>> > deserialization that is required to get an object for predicate
>>> function.
>>> > Now I see it like that:
>>> > 1. The predicate should operate only with indexed fields;
>>> > 2. User win from predicate only if index is inlined properly (even a
>>> part
>>> > of rows aren't inlined due to varlen - it still can be faster then
>>> make a
>>> > ScanQuery);
>>> > 3. Ignite creates a proxy object that is filled with objects that are
>>> > inlined. If a user tries to access a field that isn't inlined or not
>>> > indexed, then deserialization will start and Ignite will log.warn()
>>> about
>>> > that.
>>> >
>>> > So, I think it's a valid use case. Is there smth I'm missing?
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Tue, Apr 6, 2021 at 6:21 PM Andrey Mashenkov <
>>> > andrey.mashenkov@gmail.com>
>>> > wrote:
>>> >
>>> > > Hi Maksim,
>>> > >
>>> > > Nice idea, I'd like to see this feature in Ignite.
>>> > > The motivation is clear to me, it would be nice to have fast scans
>>> and
>>> > omit
>>> > > SQL overhead on planning, parsing and etc in some simple use-cases.
>>> > >
>>> > > I've left few minor comments to the IEP, but I have the next
>>> questions
>>> > > which answer I failed to find in IEP.
>>> > > 1. Is it possible to extend ScanQuery functionality to pass index
>>> > condition
>>> > > as a hint/parameter rather than create a separate query type?
>>> > > This allows a user to run a query over the particular table (for
>>> > > multi-table per cache case) and use an index for some type of
>>> conditions.
>>> > >
>>> > > 2. Functional indices, as you wrote, should use Functions
>>> distributed via
>>> > > peerClassLoading mechanics.
>>> > > This means there will no class with function on server sides and such
>>> > > classes are not persistent. Seems, they can survive grid restart.
>>> > > This task looks like a huge one because the lifecycle of such classes
>>> > > should be described first.
>>> > > Possible pitfalls are:
>>> > > * Durability. Function code MUST be persistent, to survive node
>>> restart
>>> > as
>>> > > there can be no guaranteed classes available on the server-side.
>>> > > * Consistency. Server (and maybe clients) nodes MUST have the same
>>> class
>>> > > code at a time.
>>> > > * Code ownership. Would class code be shared or per-cache? If first,
>>> you
>>> > > can't just change class code by loading a new one, because other
>>> caches
>>> > may
>>> > > use this function.
>>> > > If second, different caches may have different code/behavior, that
>>> may be
>>> > > non-obvious to end-user.
>>> > >
>>> > > 3. IndexScan by the predicate is questionable.
>>> > > Maybe it will can faster if there are multiple tables in a cache, but
>>> > looks
>>> > > similar to ScanQuery with a filter.
>>> > >
>>> > > Also, I believe we can have a common API (configuring, creating,
>>> using)
>>> > for
>>> > > all types of Indices, but
>>> > > some types (e.g. functional) will be ignored in SQL due to limited
>>> > support
>>> > > on H2 side,
>>> > > and other types will be shared and could be used by ScanQuery engine
>>> as
>>> > > well as by SQL engine.
>>> > >
>>> > > On Tue, Apr 6, 2021 at 4:14 PM Maksim Timonin <
>>> timonin.maxim@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > Hi, Igniters!
>>> > > >
>>> > > > I'd like to propose a new feature - opportunity to query and create
>>> > > indexes
>>> > > > from public API.
>>> > > >
>>> > > > It will help in some cases, where:
>>> > > > 1. SQL is not applicable by design of user application;
>>> > > > 2. Where IndexScan is preferable than ScanQuery for performance
>>> > reasons;
>>> > > > 3. Functional indexes are required.
>>> > > >
>>> > > > Also it'll be great to have a transactional support for such
>>> queries,
>>> > > like
>>> > > > the "select for update" query provides. But I don't dig there
>>> much. It
>>> > > will
>>> > > > be a next step if this API will be implemented.
>>> > > >
>>> > > > I've prepared an IEP-71 for that [1] with more details. Please
>>> share
>>> > your
>>> > > > thoughts.
>>> > > >
>>> > > >
>>> > > > [1]
>>> > > >
>>> > > >
>>> > >
>>> >
>>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
>>> > > >
>>> > >
>>> > >
>>> > > --
>>> > > Best regards,
>>> > > Andrey V. Mashenkov
>>> > >
>>> >
>>>
>>>
>>> --
>>> Best regards,
>>> Andrey V. Mashenkov
>>>
>>

-- 
Best regards,
Andrey V. Mashenkov

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Maksim Timonin <ti...@gmail.com>.

Hi, Andrey!

Am I right, that you mean this ticket [1] *IGNITE-12291 Create controllable
paged query requests / responses for TextQuery similar to current SQL
result processing*, when talked about incorrect limit work for TextQueries?


[1] https://issues.apache.org/jira/browse/IGNITE-12291

On Thu, Apr 8, 2021 at 4:32 PM Maksim Timonin <ti...@gmail.com>
wrote:

> Hi, Andrey!
>
> >> ScanQuery, TextQuery and partially SQL query share the same
> infrastructure
> I think I understand what you mean. I debug query processing and now agree
> that it's a nice idea to try to reuse the infrastructure of scan and text
> queries. Also as I can see there already Reducer functionality exists, so I
> hope we can use that. I'm not absolutely confident now that it will work
> fine, but I'm going to start there. Thanks for pointing me this direction!
>
> >> I don't like the idea a user code will be executed inside BTree
> operation
> On the confluence page I've shown that a predicate passes as
> TreeRowClosure. In this case you're right, any exception in a predicate
> will lead to a CorruptedTreeException. But I see another legal way to
> implement the predicate operation. BPlusTree.find accepts the X param that
> passed to IO.getRow(). As I understand this param helps to control how much
> returned row is filled. Then we can use it to return an object that
> contains only basic info - link, pageAddr, offset. Then predicate operation
> will be applied on the higher level on a cursor returned by a tree (like
> H2TreeIndex does). It's safe to run user code there, we can handle
> exceptions there.
>
>
>
> On Wed, Apr 7, 2021 at 4:46 PM Andrey Mashenkov <
> andrey.mashenkov@gmail.com> wrote:
>
>> Maksim,
>>
>> The ScanQuery API provides a filter as
>> > param that for case of index query should be splitted on such
>> conditions.
>> > It looks like a non-trivial task.
>> >
>> ScanQuery, TextQuery and partially SQL query share the same
>> infrastructure.
>> I've thought we could extend, improve and reuse some ScanQuery code that
>> already works fine: map query on topology, IO, batching.
>> Add IndexCondition alongside the Filter, and abstract query executor from
>> source (primary and secondary Indexes).
>> Add a sorted merge algorithm to the query merge stage. It can be very
>> useful also for TextQueries that suffers from the absence of sorted merge
>> and a "limit' condition work incorrectly.
>>
>> If you think it will be too hard than creating from scratch, I'm ok.
>>
>> 3. Ignite creates a proxy object that is filled with objects that are
>> > inlined. If a user tries to access a field that isn't inlined or not
>> > indexed, then deserialization will start and Ignite will log.warn()
>> about
>> > that.
>> >
>> Agree, this can be faster.
>> I don't like the idea a user code will be executed inside BTree operation,
>> any exception can cause FailureHandler triggering and stop the node.
>>
>> There is one more thing that could be improved.
>> ScanQuery now iterates over per-partition PK Hash index trees and has
>> performance issues on a small grid with a large number of partitions.
>> So, there are many partitions on every node and many trees should be
>> scanned.
>> In this case scan over a secondary index gives significant boots even if
>> every row is materialized, because we need to traverse over a single tree
>> per-node.
>> Having the ability to run a ScanQuery over a secondary index (if one
>> exists) instead of PK Hash will be great.
>>
>>
>> On Wed, Apr 7, 2021 at 11:18 AM Maksim Timonin <ti...@gmail.com>
>> wrote:
>>
>> > Hi, Andrey!
>> >
>> > Thanks for the review and your comments!
>> >
>> > >> Is it possible to extend ScanQuery functionality to pass index
>> condition
>> > I investigated this way and see some issues:
>> > 1. Querying of indexes is not a scan actually. It's
>> > a tree traverse (predicate operation is an exclusion, other operations
>> like
>> > gt, lt, min, max have explicit boundaries). An index query consists of
>> > conditions that match an index structure. In general for a multi-key
>> index
>> > there can be multiple conditions. The ScanQuery API provides a filter as
>> > param that for case of index query should be splitted on such
>> conditions.
>> > It looks like a non-trivial task.
>> > 2. Querying of an index requires a sorted result, while The ScanQuery
>> > doesn't matter about that. So there will be a different behavior of the
>> > iterator for scanning a cache and querying indexes. It's not much to
>> > implement I think, but it can make ScanQuery unclear for a user.
>> >
>> > Maybe it's a point to separate traverse (gt, lt, in, etc...) and scan
>> > (predicate) index operations to different API. So there still will be a
>> new
>> > query type for the traversing.
>> >
>> > But we will introduce some inheritors for ScanQuery, like TableScanQuery
>> > and IndexScanQuery, for scan and filter. Then the question is about
>> > ordering, Cache and Table scans aren't ordered, but Index is. Then we
>> can
>> > introduce an optional param "order" for ScanQuery too.
>> >
>> > WDYT?
>> >
>> > >> Functional indices
>> > >> This task looks like a huge one because the lifecycle of such classes
>> > should be described first
>> > I agree with you. That this part should be investigated deeper than I
>> did.
>> > So let's postpone discussion about functional indexes for a while.
>> IEP-71
>> > declares some phases, functional indexes are part of the 2nd phase, but
>> > users will get new functionality already from the 1st phase. Then I'll
>> dig
>> > into things you mentioned. Thanks for pointing them out.
>> >
>> > >> IndexScan by the predicate is questionable
>> > Also in comments to the IEP on the Confluence you mentioned about
>> > deserialization that is required to get an object for predicate
>> function.
>> > Now I see it like that:
>> > 1. The predicate should operate only with indexed fields;
>> > 2. User win from predicate only if index is inlined properly (even a
>> part
>> > of rows aren't inlined due to varlen - it still can be faster then make
>> a
>> > ScanQuery);
>> > 3. Ignite creates a proxy object that is filled with objects that are
>> > inlined. If a user tries to access a field that isn't inlined or not
>> > indexed, then deserialization will start and Ignite will log.warn()
>> about
>> > that.
>> >
>> > So, I think it's a valid use case. Is there smth I'm missing?
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Apr 6, 2021 at 6:21 PM Andrey Mashenkov <
>> > andrey.mashenkov@gmail.com>
>> > wrote:
>> >
>> > > Hi Maksim,
>> > >
>> > > Nice idea, I'd like to see this feature in Ignite.
>> > > The motivation is clear to me, it would be nice to have fast scans and
>> > omit
>> > > SQL overhead on planning, parsing and etc in some simple use-cases.
>> > >
>> > > I've left few minor comments to the IEP, but I have the next questions
>> > > which answer I failed to find in IEP.
>> > > 1. Is it possible to extend ScanQuery functionality to pass index
>> > condition
>> > > as a hint/parameter rather than create a separate query type?
>> > > This allows a user to run a query over the particular table (for
>> > > multi-table per cache case) and use an index for some type of
>> conditions.
>> > >
>> > > 2. Functional indices, as you wrote, should use Functions distributed
>> via
>> > > peerClassLoading mechanics.
>> > > This means there will no class with function on server sides and such
>> > > classes are not persistent. Seems, they can survive grid restart.
>> > > This task looks like a huge one because the lifecycle of such classes
>> > > should be described first.
>> > > Possible pitfalls are:
>> > > * Durability. Function code MUST be persistent, to survive node
>> restart
>> > as
>> > > there can be no guaranteed classes available on the server-side.
>> > > * Consistency. Server (and maybe clients) nodes MUST have the same
>> class
>> > > code at a time.
>> > > * Code ownership. Would class code be shared or per-cache? If first,
>> you
>> > > can't just change class code by loading a new one, because other
>> caches
>> > may
>> > > use this function.
>> > > If second, different caches may have different code/behavior, that
>> may be
>> > > non-obvious to end-user.
>> > >
>> > > 3. IndexScan by the predicate is questionable.
>> > > Maybe it will can faster if there are multiple tables in a cache, but
>> > looks
>> > > similar to ScanQuery with a filter.
>> > >
>> > > Also, I believe we can have a common API (configuring, creating,
>> using)
>> > for
>> > > all types of Indices, but
>> > > some types (e.g. functional) will be ignored in SQL due to limited
>> > support
>> > > on H2 side,
>> > > and other types will be shared and could be used by ScanQuery engine
>> as
>> > > well as by SQL engine.
>> > >
>> > > On Tue, Apr 6, 2021 at 4:14 PM Maksim Timonin <
>> timonin.maxim@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi, Igniters!
>> > > >
>> > > > I'd like to propose a new feature - opportunity to query and create
>> > > indexes
>> > > > from public API.
>> > > >
>> > > > It will help in some cases, where:
>> > > > 1. SQL is not applicable by design of user application;
>> > > > 2. Where IndexScan is preferable than ScanQuery for performance
>> > reasons;
>> > > > 3. Functional indexes are required.
>> > > >
>> > > > Also it'll be great to have a transactional support for such
>> queries,
>> > > like
>> > > > the "select for update" query provides. But I don't dig there much.
>> It
>> > > will
>> > > > be a next step if this API will be implemented.
>> > > >
>> > > > I've prepared an IEP-71 for that [1] with more details. Please share
>> > your
>> > > > thoughts.
>> > > >
>> > > >
>> > > > [1]
>> > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
>> > > >
>> > >
>> > >
>> > > --
>> > > Best regards,
>> > > Andrey V. Mashenkov
>> > >
>> >
>>
>>
>> --
>> Best regards,
>> Andrey V. Mashenkov
>>
>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Maksim Timonin <ti...@gmail.com>.

Hi, Andrey!

>> ScanQuery, TextQuery and partially SQL query share the same
infrastructure
I think I understand what you mean. I debug query processing and now agree
that it's a nice idea to try to reuse the infrastructure of scan and text
queries. Also as I can see there already Reducer functionality exists, so I
hope we can use that. I'm not absolutely confident now that it will work
fine, but I'm going to start there. Thanks for pointing me this direction!

>> I don't like the idea a user code will be executed inside BTree operation
On the confluence page I've shown that a predicate passes as
TreeRowClosure. In this case you're right, any exception in a predicate
will lead to a CorruptedTreeException. But I see another legal way to
implement the predicate operation. BPlusTree.find accepts the X param that
passed to IO.getRow(). As I understand this param helps to control how much
returned row is filled. Then we can use it to return an object that
contains only basic info - link, pageAddr, offset. Then predicate operation
will be applied on the higher level on a cursor returned by a tree (like
H2TreeIndex does). It's safe to run user code there, we can handle
exceptions there.



On Wed, Apr 7, 2021 at 4:46 PM Andrey Mashenkov <an...@gmail.com>
wrote:

> Maksim,
>
> The ScanQuery API provides a filter as
> > param that for case of index query should be splitted on such conditions.
> > It looks like a non-trivial task.
> >
> ScanQuery, TextQuery and partially SQL query share the same infrastructure.
> I've thought we could extend, improve and reuse some ScanQuery code that
> already works fine: map query on topology, IO, batching.
> Add IndexCondition alongside the Filter, and abstract query executor from
> source (primary and secondary Indexes).
> Add a sorted merge algorithm to the query merge stage. It can be very
> useful also for TextQueries that suffers from the absence of sorted merge
> and a "limit' condition work incorrectly.
>
> If you think it will be too hard than creating from scratch, I'm ok.
>
> 3. Ignite creates a proxy object that is filled with objects that are
> > inlined. If a user tries to access a field that isn't inlined or not
> > indexed, then deserialization will start and Ignite will log.warn() about
> > that.
> >
> Agree, this can be faster.
> I don't like the idea a user code will be executed inside BTree operation,
> any exception can cause FailureHandler triggering and stop the node.
>
> There is one more thing that could be improved.
> ScanQuery now iterates over per-partition PK Hash index trees and has
> performance issues on a small grid with a large number of partitions.
> So, there are many partitions on every node and many trees should be
> scanned.
> In this case scan over a secondary index gives significant boots even if
> every row is materialized, because we need to traverse over a single tree
> per-node.
> Having the ability to run a ScanQuery over a secondary index (if one
> exists) instead of PK Hash will be great.
>
>
> On Wed, Apr 7, 2021 at 11:18 AM Maksim Timonin <ti...@gmail.com>
> wrote:
>
> > Hi, Andrey!
> >
> > Thanks for the review and your comments!
> >
> > >> Is it possible to extend ScanQuery functionality to pass index
> condition
> > I investigated this way and see some issues:
> > 1. Querying of indexes is not a scan actually. It's
> > a tree traverse (predicate operation is an exclusion, other operations
> like
> > gt, lt, min, max have explicit boundaries). An index query consists of
> > conditions that match an index structure. In general for a multi-key
> index
> > there can be multiple conditions. The ScanQuery API provides a filter as
> > param that for case of index query should be splitted on such conditions.
> > It looks like a non-trivial task.
> > 2. Querying of an index requires a sorted result, while The ScanQuery
> > doesn't matter about that. So there will be a different behavior of the
> > iterator for scanning a cache and querying indexes. It's not much to
> > implement I think, but it can make ScanQuery unclear for a user.
> >
> > Maybe it's a point to separate traverse (gt, lt, in, etc...) and scan
> > (predicate) index operations to different API. So there still will be a
> new
> > query type for the traversing.
> >
> > But we will introduce some inheritors for ScanQuery, like TableScanQuery
> > and IndexScanQuery, for scan and filter. Then the question is about
> > ordering, Cache and Table scans aren't ordered, but Index is. Then we can
> > introduce an optional param "order" for ScanQuery too.
> >
> > WDYT?
> >
> > >> Functional indices
> > >> This task looks like a huge one because the lifecycle of such classes
> > should be described first
> > I agree with you. That this part should be investigated deeper than I
> did.
> > So let's postpone discussion about functional indexes for a while. IEP-71
> > declares some phases, functional indexes are part of the 2nd phase, but
> > users will get new functionality already from the 1st phase. Then I'll
> dig
> > into things you mentioned. Thanks for pointing them out.
> >
> > >> IndexScan by the predicate is questionable
> > Also in comments to the IEP on the Confluence you mentioned about
> > deserialization that is required to get an object for predicate function.
> > Now I see it like that:
> > 1. The predicate should operate only with indexed fields;
> > 2. User win from predicate only if index is inlined properly (even a part
> > of rows aren't inlined due to varlen - it still can be faster then make a
> > ScanQuery);
> > 3. Ignite creates a proxy object that is filled with objects that are
> > inlined. If a user tries to access a field that isn't inlined or not
> > indexed, then deserialization will start and Ignite will log.warn() about
> > that.
> >
> > So, I think it's a valid use case. Is there smth I'm missing?
> >
> >
> >
> >
> >
> > On Tue, Apr 6, 2021 at 6:21 PM Andrey Mashenkov <
> > andrey.mashenkov@gmail.com>
> > wrote:
> >
> > > Hi Maksim,
> > >
> > > Nice idea, I'd like to see this feature in Ignite.
> > > The motivation is clear to me, it would be nice to have fast scans and
> > omit
> > > SQL overhead on planning, parsing and etc in some simple use-cases.
> > >
> > > I've left few minor comments to the IEP, but I have the next questions
> > > which answer I failed to find in IEP.
> > > 1. Is it possible to extend ScanQuery functionality to pass index
> > condition
> > > as a hint/parameter rather than create a separate query type?
> > > This allows a user to run a query over the particular table (for
> > > multi-table per cache case) and use an index for some type of
> conditions.
> > >
> > > 2. Functional indices, as you wrote, should use Functions distributed
> via
> > > peerClassLoading mechanics.
> > > This means there will no class with function on server sides and such
> > > classes are not persistent. Seems, they can survive grid restart.
> > > This task looks like a huge one because the lifecycle of such classes
> > > should be described first.
> > > Possible pitfalls are:
> > > * Durability. Function code MUST be persistent, to survive node restart
> > as
> > > there can be no guaranteed classes available on the server-side.
> > > * Consistency. Server (and maybe clients) nodes MUST have the same
> class
> > > code at a time.
> > > * Code ownership. Would class code be shared or per-cache? If first,
> you
> > > can't just change class code by loading a new one, because other caches
> > may
> > > use this function.
> > > If second, different caches may have different code/behavior, that may
> be
> > > non-obvious to end-user.
> > >
> > > 3. IndexScan by the predicate is questionable.
> > > Maybe it will can faster if there are multiple tables in a cache, but
> > looks
> > > similar to ScanQuery with a filter.
> > >
> > > Also, I believe we can have a common API (configuring, creating, using)
> > for
> > > all types of Indices, but
> > > some types (e.g. functional) will be ignored in SQL due to limited
> > support
> > > on H2 side,
> > > and other types will be shared and could be used by ScanQuery engine as
> > > well as by SQL engine.
> > >
> > > On Tue, Apr 6, 2021 at 4:14 PM Maksim Timonin <timonin.maxim@gmail.com
> >
> > > wrote:
> > >
> > > > Hi, Igniters!
> > > >
> > > > I'd like to propose a new feature - opportunity to query and create
> > > indexes
> > > > from public API.
> > > >
> > > > It will help in some cases, where:
> > > > 1. SQL is not applicable by design of user application;
> > > > 2. Where IndexScan is preferable than ScanQuery for performance
> > reasons;
> > > > 3. Functional indexes are required.
> > > >
> > > > Also it'll be great to have a transactional support for such queries,
> > > like
> > > > the "select for update" query provides. But I don't dig there much.
> It
> > > will
> > > > be a next step if this API will be implemented.
> > > >
> > > > I've prepared an IEP-71 for that [1] with more details. Please share
> > your
> > > > thoughts.
> > > >
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrey V. Mashenkov
> > >
> >
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Andrey Mashenkov <an...@gmail.com>.

Maksim,

The ScanQuery API provides a filter as
> param that for case of index query should be splitted on such conditions.
> It looks like a non-trivial task.
>
ScanQuery, TextQuery and partially SQL query share the same infrastructure.
I've thought we could extend, improve and reuse some ScanQuery code that
already works fine: map query on topology, IO, batching.
Add IndexCondition alongside the Filter, and abstract query executor from
source (primary and secondary Indexes).
Add a sorted merge algorithm to the query merge stage. It can be very
useful also for TextQueries that suffers from the absence of sorted merge
and a "limit' condition work incorrectly.

If you think it will be too hard than creating from scratch, I'm ok.

3. Ignite creates a proxy object that is filled with objects that are
> inlined. If a user tries to access a field that isn't inlined or not
> indexed, then deserialization will start and Ignite will log.warn() about
> that.
>
Agree, this can be faster.
I don't like the idea a user code will be executed inside BTree operation,
any exception can cause FailureHandler triggering and stop the node.

There is one more thing that could be improved.
ScanQuery now iterates over per-partition PK Hash index trees and has
performance issues on a small grid with a large number of partitions.
So, there are many partitions on every node and many trees should be
scanned.
In this case scan over a secondary index gives significant boots even if
every row is materialized, because we need to traverse over a single tree
per-node.
Having the ability to run a ScanQuery over a secondary index (if one
exists) instead of PK Hash will be great.


On Wed, Apr 7, 2021 at 11:18 AM Maksim Timonin <ti...@gmail.com>
wrote:

> Hi, Andrey!
>
> Thanks for the review and your comments!
>
> >> Is it possible to extend ScanQuery functionality to pass index condition
> I investigated this way and see some issues:
> 1. Querying of indexes is not a scan actually. It's
> a tree traverse (predicate operation is an exclusion, other operations like
> gt, lt, min, max have explicit boundaries). An index query consists of
> conditions that match an index structure. In general for a multi-key index
> there can be multiple conditions. The ScanQuery API provides a filter as
> param that for case of index query should be splitted on such conditions.
> It looks like a non-trivial task.
> 2. Querying of an index requires a sorted result, while The ScanQuery
> doesn't matter about that. So there will be a different behavior of the
> iterator for scanning a cache and querying indexes. It's not much to
> implement I think, but it can make ScanQuery unclear for a user.
>
> Maybe it's a point to separate traverse (gt, lt, in, etc...) and scan
> (predicate) index operations to different API. So there still will be a new
> query type for the traversing.
>
> But we will introduce some inheritors for ScanQuery, like TableScanQuery
> and IndexScanQuery, for scan and filter. Then the question is about
> ordering, Cache and Table scans aren't ordered, but Index is. Then we can
> introduce an optional param "order" for ScanQuery too.
>
> WDYT?
>
> >> Functional indices
> >> This task looks like a huge one because the lifecycle of such classes
> should be described first
> I agree with you. That this part should be investigated deeper than I did.
> So let's postpone discussion about functional indexes for a while. IEP-71
> declares some phases, functional indexes are part of the 2nd phase, but
> users will get new functionality already from the 1st phase. Then I'll dig
> into things you mentioned. Thanks for pointing them out.
>
> >> IndexScan by the predicate is questionable
> Also in comments to the IEP on the Confluence you mentioned about
> deserialization that is required to get an object for predicate function.
> Now I see it like that:
> 1. The predicate should operate only with indexed fields;
> 2. User win from predicate only if index is inlined properly (even a part
> of rows aren't inlined due to varlen - it still can be faster then make a
> ScanQuery);
> 3. Ignite creates a proxy object that is filled with objects that are
> inlined. If a user tries to access a field that isn't inlined or not
> indexed, then deserialization will start and Ignite will log.warn() about
> that.
>
> So, I think it's a valid use case. Is there smth I'm missing?
>
>
>
>
>
> On Tue, Apr 6, 2021 at 6:21 PM Andrey Mashenkov <
> andrey.mashenkov@gmail.com>
> wrote:
>
> > Hi Maksim,
> >
> > Nice idea, I'd like to see this feature in Ignite.
> > The motivation is clear to me, it would be nice to have fast scans and
> omit
> > SQL overhead on planning, parsing and etc in some simple use-cases.
> >
> > I've left few minor comments to the IEP, but I have the next questions
> > which answer I failed to find in IEP.
> > 1. Is it possible to extend ScanQuery functionality to pass index
> condition
> > as a hint/parameter rather than create a separate query type?
> > This allows a user to run a query over the particular table (for
> > multi-table per cache case) and use an index for some type of conditions.
> >
> > 2. Functional indices, as you wrote, should use Functions distributed via
> > peerClassLoading mechanics.
> > This means there will no class with function on server sides and such
> > classes are not persistent. Seems, they can survive grid restart.
> > This task looks like a huge one because the lifecycle of such classes
> > should be described first.
> > Possible pitfalls are:
> > * Durability. Function code MUST be persistent, to survive node restart
> as
> > there can be no guaranteed classes available on the server-side.
> > * Consistency. Server (and maybe clients) nodes MUST have the same class
> > code at a time.
> > * Code ownership. Would class code be shared or per-cache? If first, you
> > can't just change class code by loading a new one, because other caches
> may
> > use this function.
> > If second, different caches may have different code/behavior, that may be
> > non-obvious to end-user.
> >
> > 3. IndexScan by the predicate is questionable.
> > Maybe it will can faster if there are multiple tables in a cache, but
> looks
> > similar to ScanQuery with a filter.
> >
> > Also, I believe we can have a common API (configuring, creating, using)
> for
> > all types of Indices, but
> > some types (e.g. functional) will be ignored in SQL due to limited
> support
> > on H2 side,
> > and other types will be shared and could be used by ScanQuery engine as
> > well as by SQL engine.
> >
> > On Tue, Apr 6, 2021 at 4:14 PM Maksim Timonin <ti...@gmail.com>
> > wrote:
> >
> > > Hi, Igniters!
> > >
> > > I'd like to propose a new feature - opportunity to query and create
> > indexes
> > > from public API.
> > >
> > > It will help in some cases, where:
> > > 1. SQL is not applicable by design of user application;
> > > 2. Where IndexScan is preferable than ScanQuery for performance
> reasons;
> > > 3. Functional indexes are required.
> > >
> > > Also it'll be great to have a transactional support for such queries,
> > like
> > > the "select for update" query provides. But I don't dig there much. It
> > will
> > > be a next step if this API will be implemented.
> > >
> > > I've prepared an IEP-71 for that [1] with more details. Please share
> your
> > > thoughts.
> > >
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
> > >
> >
> >
> > --
> > Best regards,
> > Andrey V. Mashenkov
> >
>


-- 
Best regards,
Andrey V. Mashenkov

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Maksim Timonin <ti...@gmail.com>.

Hi, Andrey!

Thanks for the review and your comments!

>> Is it possible to extend ScanQuery functionality to pass index condition
I investigated this way and see some issues:
1. Querying of indexes is not a scan actually. It's
a tree traverse (predicate operation is an exclusion, other operations like
gt, lt, min, max have explicit boundaries). An index query consists of
conditions that match an index structure. In general for a multi-key index
there can be multiple conditions. The ScanQuery API provides a filter as
param that for case of index query should be splitted on such conditions.
It looks like a non-trivial task.
2. Querying of an index requires a sorted result, while The ScanQuery
doesn't matter about that. So there will be a different behavior of the
iterator for scanning a cache and querying indexes. It's not much to
implement I think, but it can make ScanQuery unclear for a user.

Maybe it's a point to separate traverse (gt, lt, in, etc...) and scan
(predicate) index operations to different API. So there still will be a new
query type for the traversing.

But we will introduce some inheritors for ScanQuery, like TableScanQuery
and IndexScanQuery, for scan and filter. Then the question is about
ordering, Cache and Table scans aren't ordered, but Index is. Then we can
introduce an optional param "order" for ScanQuery too.

WDYT?

>> Functional indices
>> This task looks like a huge one because the lifecycle of such classes
should be described first
I agree with you. That this part should be investigated deeper than I did.
So let's postpone discussion about functional indexes for a while. IEP-71
declares some phases, functional indexes are part of the 2nd phase, but
users will get new functionality already from the 1st phase. Then I'll dig
into things you mentioned. Thanks for pointing them out.

>> IndexScan by the predicate is questionable
Also in comments to the IEP on the Confluence you mentioned about
deserialization that is required to get an object for predicate function.
Now I see it like that:
1. The predicate should operate only with indexed fields;
2. User win from predicate only if index is inlined properly (even a part
of rows aren't inlined due to varlen - it still can be faster then make a
ScanQuery);
3. Ignite creates a proxy object that is filled with objects that are
inlined. If a user tries to access a field that isn't inlined or not
indexed, then deserialization will start and Ignite will log.warn() about
that.

So, I think it's a valid use case. Is there smth I'm missing?





On Tue, Apr 6, 2021 at 6:21 PM Andrey Mashenkov <an...@gmail.com>
wrote:

> Hi Maksim,
>
> Nice idea, I'd like to see this feature in Ignite.
> The motivation is clear to me, it would be nice to have fast scans and omit
> SQL overhead on planning, parsing and etc in some simple use-cases.
>
> I've left few minor comments to the IEP, but I have the next questions
> which answer I failed to find in IEP.
> 1. Is it possible to extend ScanQuery functionality to pass index condition
> as a hint/parameter rather than create a separate query type?
> This allows a user to run a query over the particular table (for
> multi-table per cache case) and use an index for some type of conditions.
>
> 2. Functional indices, as you wrote, should use Functions distributed via
> peerClassLoading mechanics.
> This means there will no class with function on server sides and such
> classes are not persistent. Seems, they can survive grid restart.
> This task looks like a huge one because the lifecycle of such classes
> should be described first.
> Possible pitfalls are:
> * Durability. Function code MUST be persistent, to survive node restart as
> there can be no guaranteed classes available on the server-side.
> * Consistency. Server (and maybe clients) nodes MUST have the same class
> code at a time.
> * Code ownership. Would class code be shared or per-cache? If first, you
> can't just change class code by loading a new one, because other caches may
> use this function.
> If second, different caches may have different code/behavior, that may be
> non-obvious to end-user.
>
> 3. IndexScan by the predicate is questionable.
> Maybe it will can faster if there are multiple tables in a cache, but looks
> similar to ScanQuery with a filter.
>
> Also, I believe we can have a common API (configuring, creating, using) for
> all types of Indices, but
> some types (e.g. functional) will be ignored in SQL due to limited support
> on H2 side,
> and other types will be shared and could be used by ScanQuery engine as
> well as by SQL engine.
>
> On Tue, Apr 6, 2021 at 4:14 PM Maksim Timonin <ti...@gmail.com>
> wrote:
>
> > Hi, Igniters!
> >
> > I'd like to propose a new feature - opportunity to query and create
> indexes
> > from public API.
> >
> > It will help in some cases, where:
> > 1. SQL is not applicable by design of user application;
> > 2. Where IndexScan is preferable than ScanQuery for performance reasons;
> > 3. Functional indexes are required.
> >
> > Also it'll be great to have a transactional support for such queries,
> like
> > the "select for update" query provides. But I don't dig there much. It
> will
> > be a next step if this API will be implemented.
> >
> > I've prepared an IEP-71 for that [1] with more details. Please share your
> > thoughts.
> >
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
> >
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Andrey Mashenkov <an...@gmail.com>.

Hi Maksim,

Nice idea, I'd like to see this feature in Ignite.
The motivation is clear to me, it would be nice to have fast scans and omit
SQL overhead on planning, parsing and etc in some simple use-cases.

I've left few minor comments to the IEP, but I have the next questions
which answer I failed to find in IEP.
1. Is it possible to extend ScanQuery functionality to pass index condition
as a hint/parameter rather than create a separate query type?
This allows a user to run a query over the particular table (for
multi-table per cache case) and use an index for some type of conditions.

2. Functional indices, as you wrote, should use Functions distributed via
peerClassLoading mechanics.
This means there will no class with function on server sides and such
classes are not persistent. Seems, they can survive grid restart.
This task looks like a huge one because the lifecycle of such classes
should be described first.
Possible pitfalls are:
* Durability. Function code MUST be persistent, to survive node restart as
there can be no guaranteed classes available on the server-side.
* Consistency. Server (and maybe clients) nodes MUST have the same class
code at a time.
* Code ownership. Would class code be shared or per-cache? If first, you
can't just change class code by loading a new one, because other caches may
use this function.
If second, different caches may have different code/behavior, that may be
non-obvious to end-user.

3. IndexScan by the predicate is questionable.
Maybe it will can faster if there are multiple tables in a cache, but looks
similar to ScanQuery with a filter.

Also, I believe we can have a common API (configuring, creating, using) for
all types of Indices, but
some types (e.g. functional) will be ignored in SQL due to limited support
on H2 side,
and other types will be shared and could be used by ScanQuery engine as
well as by SQL engine.

On Tue, Apr 6, 2021 at 4:14 PM Maksim Timonin <ti...@gmail.com>
wrote:

> Hi, Igniters!
>
> I'd like to propose a new feature - opportunity to query and create indexes
> from public API.
>
> It will help in some cases, where:
> 1. SQL is not applicable by design of user application;
> 2. Where IndexScan is preferable than ScanQuery for performance reasons;
> 3. Functional indexes are required.
>
> Also it'll be great to have a transactional support for such queries, like
> the "select for update" query provides. But I don't dig there much. It will
> be a next step if this API will be implemented.
>
> I've prepared an IEP-71 for that [1] with more details. Please share your
> thoughts.
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
>

-- 
Best regards,
Andrey V. Mashenkov

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Maksim Timonin <ti...@gmail.com>.

Hi!

The patch merged (IndexQuery with required index name). A lot of thanks to
Andrew Mashenkov, Taras Ledkov and Alex Plekhanov for their time and
reviews.

By the way, I'd like to up the discussion about the optionality of the
indexName param for creating IndexQuery. I believe that it's a useful
feature. My arguments are:
1. Ignite provides users the possibility to create an index without name
setting (with the QuerySqlField annotation, with QueryIndex for
QueryEntity). Then Ignite should provide the possibility to query indexes
the same way it uses for creating them.

2. Naming of the index isn't transparent:
- creating index name is hidden in internal packages: QueryUtils#indexName.
- the QuerySqlField annotation provides an opportunity to create an index
(set `index = true`), but it doesn't set name to index. Users will have to
replace QuerySqlField(index=true) with QuerySqlField(orderedGroups=) to set
the index name. This is a little bit misleading to use group index to set
naming for a single field index.

3. There is an example of using such API in other databases: MongoDB
doesn't require to specify an index name [1], it searches it by fields
only. Note, MongoDB doesn't require strict fields order for querying
indexes, but requires it for sorting [2].

When defining an index we enforce order of fields. QueryIndex uses
LinkedHashMap for that, QuerySqlField.Group uses the order param. But I
agree that it can be very inconvenient to fill IndexQuery with criteria in
the same order. Also cases when users support both indexes (A, B) and (B,
A) are pretty rare, and in this specific case one should set the index name
in any case to avoid misleading results. Also let's take in consideration
that there is already a case with MongoDB that doesn't restrict field
order.

So, I propose:
1. Allow create IndexQuery with list of criteria only, without specifying
name.
2. In this case iterate over all indexes created for specified value type,
and use the first that matches all criteria fields.


If there are no objections I'll prepare a patch for that. Igniters, WDYT?



[1] https://docs.mongodb.com/manual/indexes/
[2]
https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/#sort-on-multiple-fields



On Mon, Aug 30, 2021 at 5:52 PM Maksim Timonin <ti...@gmail.com>
wrote:

> Hi!
>
> It's a PR re-review request.
>
> I've reworked PR [1]. The major changes are:
> 1. indexName is a required param for building IndexQuery.
> 2. the order of criteria fields doesn't make sense - now it checks only
> that criteria is a prefix for index fields.
> 3. IndexQuery can accept a value type (only value class was before).
> Thanks a lot to Taras Ledkov for this comment, I missed this.
> 4. Also refactored code in places where Andrey Mashenkov put his comments.
>
> Also created a separate ticket [2] for building IndexQuery from a list of
> criteria fields only.
>
> [1] https://github.com/apache/ignite/pull/9118
> [2] https://issues.apache.org/jira/browse/IGNITE-15391
>
> On Fri, Aug 27, 2021 at 10:49 AM Maksim Timonin <ti...@gmail.com>
> wrote:
>
>> Hi!
>>
>> Thanks all for your comments!
>>
>> 1. I'm OK on the first step to make the indexName param required and not
>> check fields order for this case. So, I'll remove code with search index by
>> fields from current PR.
>>
>> 2. But I propose to create a ticket for possible improvement - IndexQuery
>> should be created with a list of fields only.
>> I believe that it's a useful feature. My main argument is still the same:
>> - Ignite provides to users the possibility to create an index without
>> name setting. Then Ignite should provide the possibility to query them.
>> - There is an example of using such API in other databases: as Ivan
>> mentioned, MongoDB doesn't require to specify an index name [1], it
>> searches it by fields only. Note, MongoDB doesn't require strict fields
>> order for querying indexes, but requires it for sorting [2].
>>
>> I don't see reasons not to provide a similar API for our users. Agree,
>> that order of fields is under discussion.
>>
>> So, let's postpone step 2, and discuss it later again, as currently it is
>> a blocker for Index API. If somebody sees strong arguments against this
>> proposal and my arguments, please write them.
>>
>> [1] https://docs.mongodb.com/manual/indexes/
>> [2]
>> https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/#sort-on-multiple-fields
>>
>>
>>
>>
>> On Thu, Aug 26, 2021 at 8:00 PM Courtney Robinson <
>> courtney.robinson@hypi.io> wrote:
>>
>>> Prefer 1 from Teras' response. Specifying index name is preferred.
>>> I've seen customers do idx(A,B) and idx(B,A) where semantics change
>>> between
>>> the two.
>>>
>>> Regards,
>>> Courtney Robinson
>>> Founder and CEO, Hypi
>>> Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
>>>
>>> <https://hypi.io>
>>> https://hypi.io
>>>
>>>
>>> On Thu, Aug 26, 2021 at 4:28 PM Taras Ledkov <tl...@gridgain.com>
>>> wrote:
>>>
>>> > Hi,
>>> >
>>> > My proposal:
>>> > 1. Don't search index by criteria, specify the index name always
>>> > (preferred).
>>> >
>>> > OR
>>> >
>>> > 2. Search index by criteria without check the order of criteriones.
>>> > Use the Set of criterions instead of the ordered collection.
>>> > In the strange case when the both index exist (a, b) and (b, a) - use
>>> > the any index
>>> > when index name isn't specified.
>>> >
>>> > On 26.08.2021 16:49, Maksim Timonin wrote:
>>> > > There are some thoughts about strict field order:
>>> > > 1. Index (A, B) is not equivalent to index (B, A). Some queries may
>>> have
>>> > > different performance on such indexes, and users have to specify the
>>> > right
>>> > > index. What if both indexes exist?
>>> > > 2. We should avoid cases when a user uses in query only field B for
>>> index
>>> > > (A, B). We have to force the user to specify range for (A) too, or
>>> > > explicitly set it (null, null). Otherwise it looks like a mistake.
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > On Thu, Aug 26, 2021 at 4:39 PM Ivan Daschinsky <ivandasch@gmail.com
>>> >
>>> > wrote:
>>> > >
>>> > >> 1. I suppose, that the next step is to implement the api for
>>> manually
>>> > >> creating index. I think that user wants to create index that will
>>> speed
>>> > up
>>> > >> his criteria base queries, so he or she will use the same criteria
>>> to
>>> > >> define the index. So no problem at all
>>> > >> 2. We should print warning or throws exception if there is not any
>>> index
>>> > >> that match specific criteria.
>>> > >>
>>> > >> BTW, Mongo DB doesn't make user to write index name in query. It
>>> just
>>> > >> works.
>>> > >>
>>> > >> чт, 26 авг. 2021 г., 15:52 Taras Ledkov <tl...@gridgain.com>:
>>> > >>
>>> > >>> Hi,
>>> > >>>
>>> > >>>> It is an usability nightmare to make user write index name in all
>>> > >> cases.
>>> > >>> I don't see any difference between specifying the index name and
>>> > >>> specifying the index fields in the right order.
>>> > >>> Do you see?
>>> > >>>
>>> > >>> Let's there is the index:
>>> > >>> idx_A_B ON TBL (A, B)
>>> > >>>
>>> > >>> Is it OK that the query like below doesn't math the index
>>> 'idx_A_B'?
>>> > >>> new IndexQuery<>(..)
>>> > >>>       .setCriteria(lt("b", 1), lt("a", 2));
>>> > >>>
>>> > >>> On 26.08.2021 15:23, Ivan Daschinsky wrote:
>>> > >>>> I am against to make user write index name. It is quite simple and
>>> > >>>> straightforward algorithm to match index to field names, so it is
>>> > >> strange
>>> > >>>> to compare it to sql engine optimizer.
>>> > >>>>
>>> > >>>> It is an usability nightmare to make user write index name in all
>>> > >> cases.
>>> > >>>> чт, 26 авг. 2021 г., 14:42 Maksim Timonin <
>>> timonin.maxim@gmail.com>:
>>> > >>>>
>>> > >>>>> Hi, Igniters!
>>> > >>>>>
>>> > >>>>> There is a discussion about how to specify an index to query
>>> with an
>>> > >>>>> IndexQuery [1]. Currently my PR provides 2 ways to specify index:
>>> > >>>>> 1. With a table and index name;
>>> > >>>>> 2. With a table and list of index fields (without index name). In
>>> > this
>>> > >>> case
>>> > >>>>> IndexQueryProcessor tries to find an index that matches table and
>>> > >> index
>>> > >>>>> fields in strict order (order of fields in criteria has to match
>>> the
>>> > >>> order
>>> > >>>>> of fields in index).
>>> > >>>>>
>>> > >>>>> Discussion is whether is the second approach valid?
>>> > >>>>>
>>> > >>>>> Pros:
>>> > >>>>> 1. Currently index name is an optional field for QueryIndex and
>>> > >>>>> QuerySqlField. Then users can create an index with a table and
>>> list
>>> > of
>>> > >>>>> fields. Then, we should provide an opportunity to define an
>>> index for
>>> > >>>>> querying the same way as we do for creating.
>>> > >>>>> 2. It's required to know the index name to query it (in case the
>>> > index
>>> > >>> was
>>> > >>>>> created without an explicit name). Users can find it and then
>>> use it
>>> > >> as
>>> > >>> a
>>> > >>>>> constant in code, but I see some troubles there:
>>> > >>>>> 2.1. Get index name by querying the system view INDEXES. Note,
>>> that
>>> > >>> system
>>> > >>>>> views are marked as an experimental feature [2].
>>> > >>>>> 2.2. There is a workaround to know an index name with EXPLAIN
>>> clause
>>> > >> for
>>> > >>>>> sql query that uses the required index (but it depends on SQL
>>> > >>> optimizer).
>>> > >>>>> 2.3. Users can use the index name builder, but it is in the
>>> > >>>>> internal package
>>> > >>>>>
>>> (org.apache.ignite.internal.processors.query.QueryUtils#indexName).
>>> > >>> Then it
>>> > >>>>> can be changed from version to version without warning, and then
>>> the
>>> > >>> user
>>> > >>>>> can't rely on it in code.
>>> > >>>>> 3. Name of the Primary Key index (_key_PK) is predefined and
>>> > hardcoded
>>> > >>> in
>>> > >>>>> Ignite. Users can't set it while creating it, and the name of PK
>>> > index
>>> > >>> is
>>> > >>>>> hardcoded in the internal package too
>>> (QueryUtils.PRIMARY_KEY_INDEX).
>>> > >>>>>
>>> > >>>>> Cons:
>>> > >>>>> 1. It's declared that IndexQuery avoids some SQL steps (like
>>> > planning,
>>> > >>>>> optimizer) in favor of speed. It looks like that looking for an
>>> index
>>> > >> by
>>> > >>>>> list of fields is the work of an optimizer.
>>> > >>>>> 2. It can be not obvious that the order of fields in a query has
>>> to
>>> > >>> match
>>> > >>>>> the order of fields in the related index. We should have it in
>>> mind
>>> > >> when
>>> > >>>>> building a query - there should be a check for order of fields
>>> before
>>> > >>>>> querying.
>>> > >>>>>
>>> > >>>>>   From my side, I think that arguments for enforcing usage of an
>>> > index
>>> > >>> name
>>> > >>>>> for queries are strong enough. But for me it's strange that it's
>>> > >>> possible
>>> > >>>>> to create an index without a name, but it's required to use name
>>> to
>>> > >>> query
>>> > >>>>> it. Also taking in consideration that there is no guaranteed way
>>> to
>>> > >> get
>>> > >>> an
>>> > >>>>> index name (or I don't know it).
>>> > >>>>>
>>> > >>>>> Igniters, what do you think?
>>> > >>>>> [1]
>>> https://github.com/apache/ignite/pull/9118#discussion_r642557531
>>> > >>>>> [2]
>>> > >>>
>>> https://ignite.apache.org/docs/latest/monitoring-metrics/system-views
>>> > >>>>>
>>> > >>>>> On Fri, Aug 6, 2021 at 4:04 PM Maksim Timonin <
>>> > >> timonin.maxim@gmail.com>
>>> > >>>>> wrote:
>>> > >>>>>
>>> > >>>>>> Hi, all!
>>> > >>>>>>
>>> > >>>>>> It's a gentle reminder. There is a PR for the new Index API
>>> [1]. It
>>> > >> was
>>> > >>>>>> approved by Alex Plekhanov. Does anybody want to review this API
>>> > too?
>>> > >>> If
>>> > >>>>>> there won't be objections we're going to merge it Monday, 16th
>>> of
>>> > >>> August.
>>> > >>>>>> Thanks!
>>> > >>>>>>
>>> > >>>>>> [1] https://github.com/apache/ignite/pull/9118
>>> > >>>>>>
>>> > >>>>>> On Fri, May 21, 2021 at 10:43 PM Maksim Timonin <
>>> > >>> timonin.maxim@gmail.com
>>> > >>>>>> wrote:
>>> > >>>>>>
>>> > >>>>>>> Andrey, hi!
>>> > >>>>>>>
>>> > >>>>>>> Some updates, there.
>>> > >>>>>>>
>>> > >>>>>>> I've submitted a PR for IndexQuery [1]. There is an issue about
>>> > lazy
>>> > >>>>> page
>>> > >>>>>>> loading, that is also related to Text query ticket
>>> IGNITE-12291.
>>> > >>>>>>>
>>> > >>>>>>> CacheQueries already have pending pages functionality, it's
>>> done
>>> > >> with
>>> > >>>>>>> multiple sending GridCacheQueryRequest. There was an issue with
>>> > >>>>> TextQuery
>>> > >>>>>>> and limit, after exceeding a limit we still send requests, so I
>>> > >>>>> submitted a
>>> > >>>>>>> patch to fix this [2].
>>> > >>>>>>>
>>> > >>>>>>> But currently, TextQuery, as SqlFieldsQuery also does, prepares
>>> > >> whole
>>> > >>>>>>> data on query request, holds it, and provides a cursor over
>>> this
>>> > >>>>>>> collection.
>>> > >>>>>>>
>>> > >>>>>>> As I understand you correctly, you propose to run TextQuery
>>> over
>>> > >> index
>>> > >>>>>>> with every poll page request. We can do this with Lucene
>>> > >>>>>>> IndexSearcher.searchAfter. So from one side, it will save
>>> > resources.
>>> > >>> But
>>> > >>>>>>> from the other side, no queries (no TextQuery, no
>>> SqlFieldsQuery)
>>> > >> lock
>>> > >>>>>>> index for querying. So there can be data inconsistency, as
>>> there
>>> > can
>>> > >>> be
>>> > >>>>>>> concurrent operations on an index while a user iterates over
>>> the
>>> > >>>>> cursor. It
>>> > >>>>>>> also could be for queries now, due to no index lock being
>>> there,
>>> > but
>>> > >>> the
>>> > >>>>>>> window of time of such inconsistency is much shorter.
>>> > >>>>>>>
>>> > >>>>>>> The same dilemma I have for IndexQuery. In my patch [1] I
>>> provide
>>> > >> lazy
>>> > >>>>>>> iteration over BPlusTree. There is no lock on an index too
>>> while
>>> > >>>>> querying.
>>> > >>>>>>> And I want to discuss the right way. I have in mind the next
>>> > things:
>>> > >>>>>>> 1. Indexes currently doesn't support transactions, also SQL
>>> queries
>>> > >>>>> don't
>>> > >>>>>>> lock index for queries, so Ignite don't guarantee data
>>> consistency;
>>> > >>>>>>> 2. As I understand preparing whole data for SQL queries is
>>> required
>>> > >>> due
>>> > >>>>>>> to relations between tables. The more complex query and
>>> relations
>>> > we
>>> > >>>>> have,
>>> > >>>>>>> the much consistency issues we have in result in case of
>>> parallel
>>> > >>>>>>> operations;
>>> > >>>>>>> 3. Querying a single index only (by TextQuery or IndexQuery)
>>> > doesn't
>>> > >>>>>>> affect any relations, so we can allow concurrent updates, as it
>>> > >> could
>>> > >>>>>>> affect a query result but it doesn't hurt.
>>> > >>>>>>>
>>> > >>>>>>> And following these thoughts, it's right to implement lazy
>>> > >> iterations
>>> > >>>>>>> over indexes. What do you think?
>>> > >>>>>>>
>>> > >>>>>>> Also, there is a second topic to discuss. BPlusTree indexes
>>> support
>>> > >>>>> query
>>> > >>>>>>> parallelism. But CacheQueries don't. There needs to be a
>>> change to
>>> > >>>>>>> infrastructure to support query parallelism, so on this patch
>>> [1] I
>>> > >>>>> handle
>>> > >>>>>>> multiple segments in a single thread. And this works OK, as in
>>> the
>>> > >>> case
>>> > >>>>> of
>>> > >>>>>>> lazy querying it's very fast to initialize a cursor, so there
>>> is
>>> > not
>>> > >>>>> much
>>> > >>>>>>> overhead on multiple segments. I ran performance tests and
>>> found
>>> > >> that
>>> > >>> in
>>> > >>>>>>> some cases, IndexQuery beats SqlFieldsQuery even with enabled
>>> > >>>>>>> queryParallelism (it helps a SqlFieldsQuery much). So the need
>>> for
>>> > >>>>>>> supporting queryParallelism for IndexQuery is required to be
>>> tested
>>> > >>>>> well.
>>> > >>>>>>> As IndexQuery already can help users to speed up some queries I
>>> > >>> propose
>>> > >>>>> to
>>> > >>>>>>> check queryParallelism a little bit later. WDYT?
>>> > >>>>>>>
>>> > >>>>>>> So, those 2 things affect the Apache Ignite release that
>>> IndexQuery
>>> > >>> will
>>> > >>>>>>> be delivered with. So, please let me know your thoughts.
>>> > >>>>>>>
>>> > >>>>>>> Any thoughts from the community are welcome too.
>>> > >>>>>>>
>>> > >>>>>>>
>>> > >>>>>>> [1] https://github.com/apache/ignite/pull/9118
>>> > >>>>>>> [2] https://github.com/apache/ignite/pull/9086
>>> > >>>>>>>
>>> > >>>>>>> On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin <
>>> > >>> timonin.maxim@gmail.com
>>> > >>>>>>> wrote:
>>> > >>>>>>>
>>> > >>>>>>>> Andrey,
>>> > >>>>>>>>
>>> > >>>>>>>> Thanks! I picked it.
>>> > >>>>>>>>
>>> > >>>>>>>> On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin <
>>> > >>>>> timonin.maxim@gmail.com>
>>> > >>>>>>>> wrote:
>>> > >>>>>>>>
>>> > >>>>>>>>> Stephen,
>>> > >>>>>>>>>
>>> > >>>>>>>>> I don't see a reason to replace or deprecate IndexingSpi.
>>> I'm not
>>> > >>>>>>>>> sure how smbd uses it, but it works now.
>>> > >>>>>>>>>
>>> > >>>>>>>>> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
>>> > >>>>>>>>> stephen.darlington@gridgain.com> wrote:
>>> > >>>>>>>>>
>>> > >>>>>>>>>> Is this a replacement for IndexingSpi? Put bluntly, do we
>>> > >> deprecate
>>> > >>>>>>>>>> (and remove) it?
>>> > >>>>>>>>>>
>>> > >>>>>>>>>> Or do you see them as complimentary?
>>> > >>>>>>>>>>
>>> > >>>>>>>>>>> On 12 Apr 2021, at 11:29, Maksim Timonin <
>>> > >> timonin.maxim@gmail.com
>>> > >>>>>>>>>> wrote:
>>> > >>>>>>>>>>> Hi Stephen!
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>> Please have a look at the QueryProcessing paragraph [1].
>>> I've
>>> > >>>>>>>>>> described
>>> > >>>>>>>>>>> why IndexingSpi doesn't fit us well.
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>> [1]
>>> > >>>>>>>>>>>
>>> > >>
>>> >
>>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
>>> > >>>>>>>>>>> On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
>>> > >>>>>>>>>>> stephen.darlington@gridgain.com> wrote:
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>>> How does this fit with the current IndexingSpi?
>>> Superficially
>>> > >>> they
>>> > >>>>>>>>>> appear
>>> > >>>>>>>>>>>> to do very similar things?
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>> Regards,
>>> > >>>>>>>>>>>> Stephen
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>>> On 6 Apr 2021, at 14:13, Maksim Timonin <
>>> > >>> timonin.maxim@gmail.com
>>> > >>>>>>>>>> wrote:
>>> > >>>>>>>>>>>>> Hi, Igniters!
>>> > >>>>>>>>>>>>>
>>> > >>>>>>>>>>>>> I'd like to propose a new feature - opportunity to query
>>> and
>>> > >>>>> create
>>> > >>>>>>>>>>>> indexes
>>> > >>>>>>>>>>>>> from public API.
>>> > >>>>>>>>>>>>>
>>> > >>>>>>>>>>>>> It will help in some cases, where:
>>> > >>>>>>>>>>>>> 1. SQL is not applicable by design of user application;
>>> > >>>>>>>>>>>>> 2. Where IndexScan is preferable than ScanQuery for
>>> > >> performance
>>> > >>>>>>>>>> reasons;
>>> > >>>>>>>>>>>>> 3. Functional indexes are required.
>>> > >>>>>>>>>>>>>
>>> > >>>>>>>>>>>>> Also it'll be great to have a transactional support for
>>> such
>>> > >>>>>>>>>> queries,
>>> > >>>>>>>>>>>> like
>>> > >>>>>>>>>>>>> the "select for update" query provides. But I don't dig
>>> there
>>> > >>>>>>>>>> much. It
>>> > >>>>>>>>>>>> will
>>> > >>>>>>>>>>>>> be a next step if this API will be implemented.
>>> > >>>>>>>>>>>>>
>>> > >>>>>>>>>>>>> I've prepared an IEP-71 for that [1] with more details.
>>> > Please
>>> > >>>>>>>>>> share your
>>> > >>>>>>>>>>>>> thoughts.
>>> > >>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>
>>> > >>>>>>>>>>>>> [1]
>>> > >>>>>>>>>>>>>
>>> > >>
>>> >
>>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>
>>> > >>> --
>>> > >>> Taras Ledkov
>>> > >>> Mail-To: tledkov@gridgain.com
>>> > >>>
>>> > >>>
>>> > --
>>> > Taras Ledkov
>>> > Mail-To: tledkov@gridgain.com
>>> >
>>> >
>>>
>>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Maksim Timonin <ti...@gmail.com>.

Hi!

It's a PR re-review request.

I've reworked PR [1]. The major changes are:
1. indexName is a required param for building IndexQuery.
2. the order of criteria fields doesn't make sense - now it checks only
that criteria is a prefix for index fields.
3. IndexQuery can accept a value type (only value class was before). Thanks
a lot to Taras Ledkov for this comment, I missed this.
4. Also refactored code in places where Andrey Mashenkov put his comments.

Also created a separate ticket [2] for building IndexQuery from a list of
criteria fields only.

[1] https://github.com/apache/ignite/pull/9118
[2] https://issues.apache.org/jira/browse/IGNITE-15391

On Fri, Aug 27, 2021 at 10:49 AM Maksim Timonin <ti...@gmail.com>
wrote:

> Hi!
>
> Thanks all for your comments!
>
> 1. I'm OK on the first step to make the indexName param required and not
> check fields order for this case. So, I'll remove code with search index by
> fields from current PR.
>
> 2. But I propose to create a ticket for possible improvement - IndexQuery
> should be created with a list of fields only.
> I believe that it's a useful feature. My main argument is still the same:
> - Ignite provides to users the possibility to create an index without name
> setting. Then Ignite should provide the possibility to query them.
> - There is an example of using such API in other databases: as Ivan
> mentioned, MongoDB doesn't require to specify an index name [1], it
> searches it by fields only. Note, MongoDB doesn't require strict fields
> order for querying indexes, but requires it for sorting [2].
>
> I don't see reasons not to provide a similar API for our users. Agree,
> that order of fields is under discussion.
>
> So, let's postpone step 2, and discuss it later again, as currently it is
> a blocker for Index API. If somebody sees strong arguments against this
> proposal and my arguments, please write them.
>
> [1] https://docs.mongodb.com/manual/indexes/
> [2]
> https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/#sort-on-multiple-fields
>
>
>
>
> On Thu, Aug 26, 2021 at 8:00 PM Courtney Robinson <
> courtney.robinson@hypi.io> wrote:
>
>> Prefer 1 from Teras' response. Specifying index name is preferred.
>> I've seen customers do idx(A,B) and idx(B,A) where semantics change
>> between
>> the two.
>>
>> Regards,
>> Courtney Robinson
>> Founder and CEO, Hypi
>> Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
>>
>> <https://hypi.io>
>> https://hypi.io
>>
>>
>> On Thu, Aug 26, 2021 at 4:28 PM Taras Ledkov <tl...@gridgain.com>
>> wrote:
>>
>> > Hi,
>> >
>> > My proposal:
>> > 1. Don't search index by criteria, specify the index name always
>> > (preferred).
>> >
>> > OR
>> >
>> > 2. Search index by criteria without check the order of criteriones.
>> > Use the Set of criterions instead of the ordered collection.
>> > In the strange case when the both index exist (a, b) and (b, a) - use
>> > the any index
>> > when index name isn't specified.
>> >
>> > On 26.08.2021 16:49, Maksim Timonin wrote:
>> > > There are some thoughts about strict field order:
>> > > 1. Index (A, B) is not equivalent to index (B, A). Some queries may
>> have
>> > > different performance on such indexes, and users have to specify the
>> > right
>> > > index. What if both indexes exist?
>> > > 2. We should avoid cases when a user uses in query only field B for
>> index
>> > > (A, B). We have to force the user to specify range for (A) too, or
>> > > explicitly set it (null, null). Otherwise it looks like a mistake.
>> > >
>> > >
>> > >
>> > >
>> > > On Thu, Aug 26, 2021 at 4:39 PM Ivan Daschinsky <iv...@gmail.com>
>> > wrote:
>> > >
>> > >> 1. I suppose, that the next step is to implement the api for manually
>> > >> creating index. I think that user wants to create index that will
>> speed
>> > up
>> > >> his criteria base queries, so he or she will use the same criteria to
>> > >> define the index. So no problem at all
>> > >> 2. We should print warning or throws exception if there is not any
>> index
>> > >> that match specific criteria.
>> > >>
>> > >> BTW, Mongo DB doesn't make user to write index name in query. It just
>> > >> works.
>> > >>
>> > >> чт, 26 авг. 2021 г., 15:52 Taras Ledkov <tl...@gridgain.com>:
>> > >>
>> > >>> Hi,
>> > >>>
>> > >>>> It is an usability nightmare to make user write index name in all
>> > >> cases.
>> > >>> I don't see any difference between specifying the index name and
>> > >>> specifying the index fields in the right order.
>> > >>> Do you see?
>> > >>>
>> > >>> Let's there is the index:
>> > >>> idx_A_B ON TBL (A, B)
>> > >>>
>> > >>> Is it OK that the query like below doesn't math the index 'idx_A_B'?
>> > >>> new IndexQuery<>(..)
>> > >>>       .setCriteria(lt("b", 1), lt("a", 2));
>> > >>>
>> > >>> On 26.08.2021 15:23, Ivan Daschinsky wrote:
>> > >>>> I am against to make user write index name. It is quite simple and
>> > >>>> straightforward algorithm to match index to field names, so it is
>> > >> strange
>> > >>>> to compare it to sql engine optimizer.
>> > >>>>
>> > >>>> It is an usability nightmare to make user write index name in all
>> > >> cases.
>> > >>>> чт, 26 авг. 2021 г., 14:42 Maksim Timonin <timonin.maxim@gmail.com
>> >:
>> > >>>>
>> > >>>>> Hi, Igniters!
>> > >>>>>
>> > >>>>> There is a discussion about how to specify an index to query with
>> an
>> > >>>>> IndexQuery [1]. Currently my PR provides 2 ways to specify index:
>> > >>>>> 1. With a table and index name;
>> > >>>>> 2. With a table and list of index fields (without index name). In
>> > this
>> > >>> case
>> > >>>>> IndexQueryProcessor tries to find an index that matches table and
>> > >> index
>> > >>>>> fields in strict order (order of fields in criteria has to match
>> the
>> > >>> order
>> > >>>>> of fields in index).
>> > >>>>>
>> > >>>>> Discussion is whether is the second approach valid?
>> > >>>>>
>> > >>>>> Pros:
>> > >>>>> 1. Currently index name is an optional field for QueryIndex and
>> > >>>>> QuerySqlField. Then users can create an index with a table and
>> list
>> > of
>> > >>>>> fields. Then, we should provide an opportunity to define an index
>> for
>> > >>>>> querying the same way as we do for creating.
>> > >>>>> 2. It's required to know the index name to query it (in case the
>> > index
>> > >>> was
>> > >>>>> created without an explicit name). Users can find it and then use
>> it
>> > >> as
>> > >>> a
>> > >>>>> constant in code, but I see some troubles there:
>> > >>>>> 2.1. Get index name by querying the system view INDEXES. Note,
>> that
>> > >>> system
>> > >>>>> views are marked as an experimental feature [2].
>> > >>>>> 2.2. There is a workaround to know an index name with EXPLAIN
>> clause
>> > >> for
>> > >>>>> sql query that uses the required index (but it depends on SQL
>> > >>> optimizer).
>> > >>>>> 2.3. Users can use the index name builder, but it is in the
>> > >>>>> internal package
>> > >>>>>
>> (org.apache.ignite.internal.processors.query.QueryUtils#indexName).
>> > >>> Then it
>> > >>>>> can be changed from version to version without warning, and then
>> the
>> > >>> user
>> > >>>>> can't rely on it in code.
>> > >>>>> 3. Name of the Primary Key index (_key_PK) is predefined and
>> > hardcoded
>> > >>> in
>> > >>>>> Ignite. Users can't set it while creating it, and the name of PK
>> > index
>> > >>> is
>> > >>>>> hardcoded in the internal package too
>> (QueryUtils.PRIMARY_KEY_INDEX).
>> > >>>>>
>> > >>>>> Cons:
>> > >>>>> 1. It's declared that IndexQuery avoids some SQL steps (like
>> > planning,
>> > >>>>> optimizer) in favor of speed. It looks like that looking for an
>> index
>> > >> by
>> > >>>>> list of fields is the work of an optimizer.
>> > >>>>> 2. It can be not obvious that the order of fields in a query has
>> to
>> > >>> match
>> > >>>>> the order of fields in the related index. We should have it in
>> mind
>> > >> when
>> > >>>>> building a query - there should be a check for order of fields
>> before
>> > >>>>> querying.
>> > >>>>>
>> > >>>>>   From my side, I think that arguments for enforcing usage of an
>> > index
>> > >>> name
>> > >>>>> for queries are strong enough. But for me it's strange that it's
>> > >>> possible
>> > >>>>> to create an index without a name, but it's required to use name
>> to
>> > >>> query
>> > >>>>> it. Also taking in consideration that there is no guaranteed way
>> to
>> > >> get
>> > >>> an
>> > >>>>> index name (or I don't know it).
>> > >>>>>
>> > >>>>> Igniters, what do you think?
>> > >>>>> [1]
>> https://github.com/apache/ignite/pull/9118#discussion_r642557531
>> > >>>>> [2]
>> > >>>
>> https://ignite.apache.org/docs/latest/monitoring-metrics/system-views
>> > >>>>>
>> > >>>>> On Fri, Aug 6, 2021 at 4:04 PM Maksim Timonin <
>> > >> timonin.maxim@gmail.com>
>> > >>>>> wrote:
>> > >>>>>
>> > >>>>>> Hi, all!
>> > >>>>>>
>> > >>>>>> It's a gentle reminder. There is a PR for the new Index API [1].
>> It
>> > >> was
>> > >>>>>> approved by Alex Plekhanov. Does anybody want to review this API
>> > too?
>> > >>> If
>> > >>>>>> there won't be objections we're going to merge it Monday, 16th of
>> > >>> August.
>> > >>>>>> Thanks!
>> > >>>>>>
>> > >>>>>> [1] https://github.com/apache/ignite/pull/9118
>> > >>>>>>
>> > >>>>>> On Fri, May 21, 2021 at 10:43 PM Maksim Timonin <
>> > >>> timonin.maxim@gmail.com
>> > >>>>>> wrote:
>> > >>>>>>
>> > >>>>>>> Andrey, hi!
>> > >>>>>>>
>> > >>>>>>> Some updates, there.
>> > >>>>>>>
>> > >>>>>>> I've submitted a PR for IndexQuery [1]. There is an issue about
>> > lazy
>> > >>>>> page
>> > >>>>>>> loading, that is also related to Text query ticket IGNITE-12291.
>> > >>>>>>>
>> > >>>>>>> CacheQueries already have pending pages functionality, it's done
>> > >> with
>> > >>>>>>> multiple sending GridCacheQueryRequest. There was an issue with
>> > >>>>> TextQuery
>> > >>>>>>> and limit, after exceeding a limit we still send requests, so I
>> > >>>>> submitted a
>> > >>>>>>> patch to fix this [2].
>> > >>>>>>>
>> > >>>>>>> But currently, TextQuery, as SqlFieldsQuery also does, prepares
>> > >> whole
>> > >>>>>>> data on query request, holds it, and provides a cursor over this
>> > >>>>>>> collection.
>> > >>>>>>>
>> > >>>>>>> As I understand you correctly, you propose to run TextQuery over
>> > >> index
>> > >>>>>>> with every poll page request. We can do this with Lucene
>> > >>>>>>> IndexSearcher.searchAfter. So from one side, it will save
>> > resources.
>> > >>> But
>> > >>>>>>> from the other side, no queries (no TextQuery, no
>> SqlFieldsQuery)
>> > >> lock
>> > >>>>>>> index for querying. So there can be data inconsistency, as there
>> > can
>> > >>> be
>> > >>>>>>> concurrent operations on an index while a user iterates over the
>> > >>>>> cursor. It
>> > >>>>>>> also could be for queries now, due to no index lock being there,
>> > but
>> > >>> the
>> > >>>>>>> window of time of such inconsistency is much shorter.
>> > >>>>>>>
>> > >>>>>>> The same dilemma I have for IndexQuery. In my patch [1] I
>> provide
>> > >> lazy
>> > >>>>>>> iteration over BPlusTree. There is no lock on an index too while
>> > >>>>> querying.
>> > >>>>>>> And I want to discuss the right way. I have in mind the next
>> > things:
>> > >>>>>>> 1. Indexes currently doesn't support transactions, also SQL
>> queries
>> > >>>>> don't
>> > >>>>>>> lock index for queries, so Ignite don't guarantee data
>> consistency;
>> > >>>>>>> 2. As I understand preparing whole data for SQL queries is
>> required
>> > >>> due
>> > >>>>>>> to relations between tables. The more complex query and
>> relations
>> > we
>> > >>>>> have,
>> > >>>>>>> the much consistency issues we have in result in case of
>> parallel
>> > >>>>>>> operations;
>> > >>>>>>> 3. Querying a single index only (by TextQuery or IndexQuery)
>> > doesn't
>> > >>>>>>> affect any relations, so we can allow concurrent updates, as it
>> > >> could
>> > >>>>>>> affect a query result but it doesn't hurt.
>> > >>>>>>>
>> > >>>>>>> And following these thoughts, it's right to implement lazy
>> > >> iterations
>> > >>>>>>> over indexes. What do you think?
>> > >>>>>>>
>> > >>>>>>> Also, there is a second topic to discuss. BPlusTree indexes
>> support
>> > >>>>> query
>> > >>>>>>> parallelism. But CacheQueries don't. There needs to be a change
>> to
>> > >>>>>>> infrastructure to support query parallelism, so on this patch
>> [1] I
>> > >>>>> handle
>> > >>>>>>> multiple segments in a single thread. And this works OK, as in
>> the
>> > >>> case
>> > >>>>> of
>> > >>>>>>> lazy querying it's very fast to initialize a cursor, so there is
>> > not
>> > >>>>> much
>> > >>>>>>> overhead on multiple segments. I ran performance tests and found
>> > >> that
>> > >>> in
>> > >>>>>>> some cases, IndexQuery beats SqlFieldsQuery even with enabled
>> > >>>>>>> queryParallelism (it helps a SqlFieldsQuery much). So the need
>> for
>> > >>>>>>> supporting queryParallelism for IndexQuery is required to be
>> tested
>> > >>>>> well.
>> > >>>>>>> As IndexQuery already can help users to speed up some queries I
>> > >>> propose
>> > >>>>> to
>> > >>>>>>> check queryParallelism a little bit later. WDYT?
>> > >>>>>>>
>> > >>>>>>> So, those 2 things affect the Apache Ignite release that
>> IndexQuery
>> > >>> will
>> > >>>>>>> be delivered with. So, please let me know your thoughts.
>> > >>>>>>>
>> > >>>>>>> Any thoughts from the community are welcome too.
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>> [1] https://github.com/apache/ignite/pull/9118
>> > >>>>>>> [2] https://github.com/apache/ignite/pull/9086
>> > >>>>>>>
>> > >>>>>>> On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin <
>> > >>> timonin.maxim@gmail.com
>> > >>>>>>> wrote:
>> > >>>>>>>
>> > >>>>>>>> Andrey,
>> > >>>>>>>>
>> > >>>>>>>> Thanks! I picked it.
>> > >>>>>>>>
>> > >>>>>>>> On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin <
>> > >>>>> timonin.maxim@gmail.com>
>> > >>>>>>>> wrote:
>> > >>>>>>>>
>> > >>>>>>>>> Stephen,
>> > >>>>>>>>>
>> > >>>>>>>>> I don't see a reason to replace or deprecate IndexingSpi. I'm
>> not
>> > >>>>>>>>> sure how smbd uses it, but it works now.
>> > >>>>>>>>>
>> > >>>>>>>>> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
>> > >>>>>>>>> stephen.darlington@gridgain.com> wrote:
>> > >>>>>>>>>
>> > >>>>>>>>>> Is this a replacement for IndexingSpi? Put bluntly, do we
>> > >> deprecate
>> > >>>>>>>>>> (and remove) it?
>> > >>>>>>>>>>
>> > >>>>>>>>>> Or do you see them as complimentary?
>> > >>>>>>>>>>
>> > >>>>>>>>>>> On 12 Apr 2021, at 11:29, Maksim Timonin <
>> > >> timonin.maxim@gmail.com
>> > >>>>>>>>>> wrote:
>> > >>>>>>>>>>> Hi Stephen!
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Please have a look at the QueryProcessing paragraph [1].
>> I've
>> > >>>>>>>>>> described
>> > >>>>>>>>>>> why IndexingSpi doesn't fit us well.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> [1]
>> > >>>>>>>>>>>
>> > >>
>> >
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
>> > >>>>>>>>>>> On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
>> > >>>>>>>>>>> stephen.darlington@gridgain.com> wrote:
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>> How does this fit with the current IndexingSpi?
>> Superficially
>> > >>> they
>> > >>>>>>>>>> appear
>> > >>>>>>>>>>>> to do very similar things?
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> Regards,
>> > >>>>>>>>>>>> Stephen
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>> On 6 Apr 2021, at 14:13, Maksim Timonin <
>> > >>> timonin.maxim@gmail.com
>> > >>>>>>>>>> wrote:
>> > >>>>>>>>>>>>> Hi, Igniters!
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> I'd like to propose a new feature - opportunity to query
>> and
>> > >>>>> create
>> > >>>>>>>>>>>> indexes
>> > >>>>>>>>>>>>> from public API.
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> It will help in some cases, where:
>> > >>>>>>>>>>>>> 1. SQL is not applicable by design of user application;
>> > >>>>>>>>>>>>> 2. Where IndexScan is preferable than ScanQuery for
>> > >> performance
>> > >>>>>>>>>> reasons;
>> > >>>>>>>>>>>>> 3. Functional indexes are required.
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> Also it'll be great to have a transactional support for
>> such
>> > >>>>>>>>>> queries,
>> > >>>>>>>>>>>> like
>> > >>>>>>>>>>>>> the "select for update" query provides. But I don't dig
>> there
>> > >>>>>>>>>> much. It
>> > >>>>>>>>>>>> will
>> > >>>>>>>>>>>>> be a next step if this API will be implemented.
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> I've prepared an IEP-71 for that [1] with more details.
>> > Please
>> > >>>>>>>>>> share your
>> > >>>>>>>>>>>>> thoughts.
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> [1]
>> > >>>>>>>>>>>>>
>> > >>
>> >
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>
>> > >>> --
>> > >>> Taras Ledkov
>> > >>> Mail-To: tledkov@gridgain.com
>> > >>>
>> > >>>
>> > --
>> > Taras Ledkov
>> > Mail-To: tledkov@gridgain.com
>> >
>> >
>>
>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Maksim Timonin <ti...@gmail.com>.

Hi!

Thanks all for your comments!

1. I'm OK on the first step to make the indexName param required and not
check fields order for this case. So, I'll remove code with search index by
fields from current PR.

2. But I propose to create a ticket for possible improvement - IndexQuery
should be created with a list of fields only.
I believe that it's a useful feature. My main argument is still the same:
- Ignite provides to users the possibility to create an index without name
setting. Then Ignite should provide the possibility to query them.
- There is an example of using such API in other databases: as Ivan
mentioned, MongoDB doesn't require to specify an index name [1], it
searches it by fields only. Note, MongoDB doesn't require strict fields
order for querying indexes, but requires it for sorting [2].

I don't see reasons not to provide a similar API for our users. Agree, that
order of fields is under discussion.

So, let's postpone step 2, and discuss it later again, as currently it is a
blocker for Index API. If somebody sees strong arguments against this
proposal and my arguments, please write them.

[1] https://docs.mongodb.com/manual/indexes/
[2]
https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/#sort-on-multiple-fields




On Thu, Aug 26, 2021 at 8:00 PM Courtney Robinson <co...@hypi.io>
wrote:

> Prefer 1 from Teras' response. Specifying index name is preferred.
> I've seen customers do idx(A,B) and idx(B,A) where semantics change between
> the two.
>
> Regards,
> Courtney Robinson
> Founder and CEO, Hypi
> Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
>
> <https://hypi.io>
> https://hypi.io
>
>
> On Thu, Aug 26, 2021 at 4:28 PM Taras Ledkov <tl...@gridgain.com> wrote:
>
> > Hi,
> >
> > My proposal:
> > 1. Don't search index by criteria, specify the index name always
> > (preferred).
> >
> > OR
> >
> > 2. Search index by criteria without check the order of criteriones.
> > Use the Set of criterions instead of the ordered collection.
> > In the strange case when the both index exist (a, b) and (b, a) - use
> > the any index
> > when index name isn't specified.
> >
> > On 26.08.2021 16:49, Maksim Timonin wrote:
> > > There are some thoughts about strict field order:
> > > 1. Index (A, B) is not equivalent to index (B, A). Some queries may
> have
> > > different performance on such indexes, and users have to specify the
> > right
> > > index. What if both indexes exist?
> > > 2. We should avoid cases when a user uses in query only field B for
> index
> > > (A, B). We have to force the user to specify range for (A) too, or
> > > explicitly set it (null, null). Otherwise it looks like a mistake.
> > >
> > >
> > >
> > >
> > > On Thu, Aug 26, 2021 at 4:39 PM Ivan Daschinsky <iv...@gmail.com>
> > wrote:
> > >
> > >> 1. I suppose, that the next step is to implement the api for manually
> > >> creating index. I think that user wants to create index that will
> speed
> > up
> > >> his criteria base queries, so he or she will use the same criteria to
> > >> define the index. So no problem at all
> > >> 2. We should print warning or throws exception if there is not any
> index
> > >> that match specific criteria.
> > >>
> > >> BTW, Mongo DB doesn't make user to write index name in query. It just
> > >> works.
> > >>
> > >> чт, 26 авг. 2021 г., 15:52 Taras Ledkov <tl...@gridgain.com>:
> > >>
> > >>> Hi,
> > >>>
> > >>>> It is an usability nightmare to make user write index name in all
> > >> cases.
> > >>> I don't see any difference between specifying the index name and
> > >>> specifying the index fields in the right order.
> > >>> Do you see?
> > >>>
> > >>> Let's there is the index:
> > >>> idx_A_B ON TBL (A, B)
> > >>>
> > >>> Is it OK that the query like below doesn't math the index 'idx_A_B'?
> > >>> new IndexQuery<>(..)
> > >>>       .setCriteria(lt("b", 1), lt("a", 2));
> > >>>
> > >>> On 26.08.2021 15:23, Ivan Daschinsky wrote:
> > >>>> I am against to make user write index name. It is quite simple and
> > >>>> straightforward algorithm to match index to field names, so it is
> > >> strange
> > >>>> to compare it to sql engine optimizer.
> > >>>>
> > >>>> It is an usability nightmare to make user write index name in all
> > >> cases.
> > >>>> чт, 26 авг. 2021 г., 14:42 Maksim Timonin <timonin.maxim@gmail.com
> >:
> > >>>>
> > >>>>> Hi, Igniters!
> > >>>>>
> > >>>>> There is a discussion about how to specify an index to query with
> an
> > >>>>> IndexQuery [1]. Currently my PR provides 2 ways to specify index:
> > >>>>> 1. With a table and index name;
> > >>>>> 2. With a table and list of index fields (without index name). In
> > this
> > >>> case
> > >>>>> IndexQueryProcessor tries to find an index that matches table and
> > >> index
> > >>>>> fields in strict order (order of fields in criteria has to match
> the
> > >>> order
> > >>>>> of fields in index).
> > >>>>>
> > >>>>> Discussion is whether is the second approach valid?
> > >>>>>
> > >>>>> Pros:
> > >>>>> 1. Currently index name is an optional field for QueryIndex and
> > >>>>> QuerySqlField. Then users can create an index with a table and list
> > of
> > >>>>> fields. Then, we should provide an opportunity to define an index
> for
> > >>>>> querying the same way as we do for creating.
> > >>>>> 2. It's required to know the index name to query it (in case the
> > index
> > >>> was
> > >>>>> created without an explicit name). Users can find it and then use
> it
> > >> as
> > >>> a
> > >>>>> constant in code, but I see some troubles there:
> > >>>>> 2.1. Get index name by querying the system view INDEXES. Note, that
> > >>> system
> > >>>>> views are marked as an experimental feature [2].
> > >>>>> 2.2. There is a workaround to know an index name with EXPLAIN
> clause
> > >> for
> > >>>>> sql query that uses the required index (but it depends on SQL
> > >>> optimizer).
> > >>>>> 2.3. Users can use the index name builder, but it is in the
> > >>>>> internal package
> > >>>>> (org.apache.ignite.internal.processors.query.QueryUtils#indexName).
> > >>> Then it
> > >>>>> can be changed from version to version without warning, and then
> the
> > >>> user
> > >>>>> can't rely on it in code.
> > >>>>> 3. Name of the Primary Key index (_key_PK) is predefined and
> > hardcoded
> > >>> in
> > >>>>> Ignite. Users can't set it while creating it, and the name of PK
> > index
> > >>> is
> > >>>>> hardcoded in the internal package too
> (QueryUtils.PRIMARY_KEY_INDEX).
> > >>>>>
> > >>>>> Cons:
> > >>>>> 1. It's declared that IndexQuery avoids some SQL steps (like
> > planning,
> > >>>>> optimizer) in favor of speed. It looks like that looking for an
> index
> > >> by
> > >>>>> list of fields is the work of an optimizer.
> > >>>>> 2. It can be not obvious that the order of fields in a query has to
> > >>> match
> > >>>>> the order of fields in the related index. We should have it in mind
> > >> when
> > >>>>> building a query - there should be a check for order of fields
> before
> > >>>>> querying.
> > >>>>>
> > >>>>>   From my side, I think that arguments for enforcing usage of an
> > index
> > >>> name
> > >>>>> for queries are strong enough. But for me it's strange that it's
> > >>> possible
> > >>>>> to create an index without a name, but it's required to use name to
> > >>> query
> > >>>>> it. Also taking in consideration that there is no guaranteed way to
> > >> get
> > >>> an
> > >>>>> index name (or I don't know it).
> > >>>>>
> > >>>>> Igniters, what do you think?
> > >>>>> [1]
> https://github.com/apache/ignite/pull/9118#discussion_r642557531
> > >>>>> [2]
> > >>>
> https://ignite.apache.org/docs/latest/monitoring-metrics/system-views
> > >>>>>
> > >>>>> On Fri, Aug 6, 2021 at 4:04 PM Maksim Timonin <
> > >> timonin.maxim@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Hi, all!
> > >>>>>>
> > >>>>>> It's a gentle reminder. There is a PR for the new Index API [1].
> It
> > >> was
> > >>>>>> approved by Alex Plekhanov. Does anybody want to review this API
> > too?
> > >>> If
> > >>>>>> there won't be objections we're going to merge it Monday, 16th of
> > >>> August.
> > >>>>>> Thanks!
> > >>>>>>
> > >>>>>> [1] https://github.com/apache/ignite/pull/9118
> > >>>>>>
> > >>>>>> On Fri, May 21, 2021 at 10:43 PM Maksim Timonin <
> > >>> timonin.maxim@gmail.com
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Andrey, hi!
> > >>>>>>>
> > >>>>>>> Some updates, there.
> > >>>>>>>
> > >>>>>>> I've submitted a PR for IndexQuery [1]. There is an issue about
> > lazy
> > >>>>> page
> > >>>>>>> loading, that is also related to Text query ticket IGNITE-12291.
> > >>>>>>>
> > >>>>>>> CacheQueries already have pending pages functionality, it's done
> > >> with
> > >>>>>>> multiple sending GridCacheQueryRequest. There was an issue with
> > >>>>> TextQuery
> > >>>>>>> and limit, after exceeding a limit we still send requests, so I
> > >>>>> submitted a
> > >>>>>>> patch to fix this [2].
> > >>>>>>>
> > >>>>>>> But currently, TextQuery, as SqlFieldsQuery also does, prepares
> > >> whole
> > >>>>>>> data on query request, holds it, and provides a cursor over this
> > >>>>>>> collection.
> > >>>>>>>
> > >>>>>>> As I understand you correctly, you propose to run TextQuery over
> > >> index
> > >>>>>>> with every poll page request. We can do this with Lucene
> > >>>>>>> IndexSearcher.searchAfter. So from one side, it will save
> > resources.
> > >>> But
> > >>>>>>> from the other side, no queries (no TextQuery, no SqlFieldsQuery)
> > >> lock
> > >>>>>>> index for querying. So there can be data inconsistency, as there
> > can
> > >>> be
> > >>>>>>> concurrent operations on an index while a user iterates over the
> > >>>>> cursor. It
> > >>>>>>> also could be for queries now, due to no index lock being there,
> > but
> > >>> the
> > >>>>>>> window of time of such inconsistency is much shorter.
> > >>>>>>>
> > >>>>>>> The same dilemma I have for IndexQuery. In my patch [1] I provide
> > >> lazy
> > >>>>>>> iteration over BPlusTree. There is no lock on an index too while
> > >>>>> querying.
> > >>>>>>> And I want to discuss the right way. I have in mind the next
> > things:
> > >>>>>>> 1. Indexes currently doesn't support transactions, also SQL
> queries
> > >>>>> don't
> > >>>>>>> lock index for queries, so Ignite don't guarantee data
> consistency;
> > >>>>>>> 2. As I understand preparing whole data for SQL queries is
> required
> > >>> due
> > >>>>>>> to relations between tables. The more complex query and relations
> > we
> > >>>>> have,
> > >>>>>>> the much consistency issues we have in result in case of parallel
> > >>>>>>> operations;
> > >>>>>>> 3. Querying a single index only (by TextQuery or IndexQuery)
> > doesn't
> > >>>>>>> affect any relations, so we can allow concurrent updates, as it
> > >> could
> > >>>>>>> affect a query result but it doesn't hurt.
> > >>>>>>>
> > >>>>>>> And following these thoughts, it's right to implement lazy
> > >> iterations
> > >>>>>>> over indexes. What do you think?
> > >>>>>>>
> > >>>>>>> Also, there is a second topic to discuss. BPlusTree indexes
> support
> > >>>>> query
> > >>>>>>> parallelism. But CacheQueries don't. There needs to be a change
> to
> > >>>>>>> infrastructure to support query parallelism, so on this patch
> [1] I
> > >>>>> handle
> > >>>>>>> multiple segments in a single thread. And this works OK, as in
> the
> > >>> case
> > >>>>> of
> > >>>>>>> lazy querying it's very fast to initialize a cursor, so there is
> > not
> > >>>>> much
> > >>>>>>> overhead on multiple segments. I ran performance tests and found
> > >> that
> > >>> in
> > >>>>>>> some cases, IndexQuery beats SqlFieldsQuery even with enabled
> > >>>>>>> queryParallelism (it helps a SqlFieldsQuery much). So the need
> for
> > >>>>>>> supporting queryParallelism for IndexQuery is required to be
> tested
> > >>>>> well.
> > >>>>>>> As IndexQuery already can help users to speed up some queries I
> > >>> propose
> > >>>>> to
> > >>>>>>> check queryParallelism a little bit later. WDYT?
> > >>>>>>>
> > >>>>>>> So, those 2 things affect the Apache Ignite release that
> IndexQuery
> > >>> will
> > >>>>>>> be delivered with. So, please let me know your thoughts.
> > >>>>>>>
> > >>>>>>> Any thoughts from the community are welcome too.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> [1] https://github.com/apache/ignite/pull/9118
> > >>>>>>> [2] https://github.com/apache/ignite/pull/9086
> > >>>>>>>
> > >>>>>>> On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin <
> > >>> timonin.maxim@gmail.com
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Andrey,
> > >>>>>>>>
> > >>>>>>>> Thanks! I picked it.
> > >>>>>>>>
> > >>>>>>>> On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin <
> > >>>>> timonin.maxim@gmail.com>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Stephen,
> > >>>>>>>>>
> > >>>>>>>>> I don't see a reason to replace or deprecate IndexingSpi. I'm
> not
> > >>>>>>>>> sure how smbd uses it, but it works now.
> > >>>>>>>>>
> > >>>>>>>>> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
> > >>>>>>>>> stephen.darlington@gridgain.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Is this a replacement for IndexingSpi? Put bluntly, do we
> > >> deprecate
> > >>>>>>>>>> (and remove) it?
> > >>>>>>>>>>
> > >>>>>>>>>> Or do you see them as complimentary?
> > >>>>>>>>>>
> > >>>>>>>>>>> On 12 Apr 2021, at 11:29, Maksim Timonin <
> > >> timonin.maxim@gmail.com
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>> Hi Stephen!
> > >>>>>>>>>>>
> > >>>>>>>>>>> Please have a look at the QueryProcessing paragraph [1]. I've
> > >>>>>>>>>> described
> > >>>>>>>>>>> why IndexingSpi doesn't fit us well.
> > >>>>>>>>>>>
> > >>>>>>>>>>> [1]
> > >>>>>>>>>>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
> > >>>>>>>>>>> On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
> > >>>>>>>>>>> stephen.darlington@gridgain.com> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> How does this fit with the current IndexingSpi?
> Superficially
> > >>> they
> > >>>>>>>>>> appear
> > >>>>>>>>>>>> to do very similar things?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Regards,
> > >>>>>>>>>>>> Stephen
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> On 6 Apr 2021, at 14:13, Maksim Timonin <
> > >>> timonin.maxim@gmail.com
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>>>> Hi, Igniters!
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I'd like to propose a new feature - opportunity to query
> and
> > >>>>> create
> > >>>>>>>>>>>> indexes
> > >>>>>>>>>>>>> from public API.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> It will help in some cases, where:
> > >>>>>>>>>>>>> 1. SQL is not applicable by design of user application;
> > >>>>>>>>>>>>> 2. Where IndexScan is preferable than ScanQuery for
> > >> performance
> > >>>>>>>>>> reasons;
> > >>>>>>>>>>>>> 3. Functional indexes are required.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Also it'll be great to have a transactional support for
> such
> > >>>>>>>>>> queries,
> > >>>>>>>>>>>> like
> > >>>>>>>>>>>>> the "select for update" query provides. But I don't dig
> there
> > >>>>>>>>>> much. It
> > >>>>>>>>>>>> will
> > >>>>>>>>>>>>> be a next step if this API will be implemented.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I've prepared an IEP-71 for that [1] with more details.
> > Please
> > >>>>>>>>>> share your
> > >>>>>>>>>>>>> thoughts.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> [1]
> > >>>>>>>>>>>>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
> > >>>>>>>>>>>>
> > >>>>>>>>>>
> > >>> --
> > >>> Taras Ledkov
> > >>> Mail-To: tledkov@gridgain.com
> > >>>
> > >>>
> > --
> > Taras Ledkov
> > Mail-To: tledkov@gridgain.com
> >
> >
>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Courtney Robinson <co...@hypi.io>.

Prefer 1 from Teras' response. Specifying index name is preferred.
I've seen customers do idx(A,B) and idx(B,A) where semantics change between
the two.

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>

<https://hypi.io>
https://hypi.io


On Thu, Aug 26, 2021 at 4:28 PM Taras Ledkov <tl...@gridgain.com> wrote:

> Hi,
>
> My proposal:
> 1. Don't search index by criteria, specify the index name always
> (preferred).
>
> OR
>
> 2. Search index by criteria without check the order of criteriones.
> Use the Set of criterions instead of the ordered collection.
> In the strange case when the both index exist (a, b) and (b, a) - use
> the any index
> when index name isn't specified.
>
> On 26.08.2021 16:49, Maksim Timonin wrote:
> > There are some thoughts about strict field order:
> > 1. Index (A, B) is not equivalent to index (B, A). Some queries may have
> > different performance on such indexes, and users have to specify the
> right
> > index. What if both indexes exist?
> > 2. We should avoid cases when a user uses in query only field B for index
> > (A, B). We have to force the user to specify range for (A) too, or
> > explicitly set it (null, null). Otherwise it looks like a mistake.
> >
> >
> >
> >
> > On Thu, Aug 26, 2021 at 4:39 PM Ivan Daschinsky <iv...@gmail.com>
> wrote:
> >
> >> 1. I suppose, that the next step is to implement the api for manually
> >> creating index. I think that user wants to create index that will speed
> up
> >> his criteria base queries, so he or she will use the same criteria to
> >> define the index. So no problem at all
> >> 2. We should print warning or throws exception if there is not any index
> >> that match specific criteria.
> >>
> >> BTW, Mongo DB doesn't make user to write index name in query. It just
> >> works.
> >>
> >> чт, 26 авг. 2021 г., 15:52 Taras Ledkov <tl...@gridgain.com>:
> >>
> >>> Hi,
> >>>
> >>>> It is an usability nightmare to make user write index name in all
> >> cases.
> >>> I don't see any difference between specifying the index name and
> >>> specifying the index fields in the right order.
> >>> Do you see?
> >>>
> >>> Let's there is the index:
> >>> idx_A_B ON TBL (A, B)
> >>>
> >>> Is it OK that the query like below doesn't math the index 'idx_A_B'?
> >>> new IndexQuery<>(..)
> >>>       .setCriteria(lt("b", 1), lt("a", 2));
> >>>
> >>> On 26.08.2021 15:23, Ivan Daschinsky wrote:
> >>>> I am against to make user write index name. It is quite simple and
> >>>> straightforward algorithm to match index to field names, so it is
> >> strange
> >>>> to compare it to sql engine optimizer.
> >>>>
> >>>> It is an usability nightmare to make user write index name in all
> >> cases.
> >>>> чт, 26 авг. 2021 г., 14:42 Maksim Timonin <ti...@gmail.com>:
> >>>>
> >>>>> Hi, Igniters!
> >>>>>
> >>>>> There is a discussion about how to specify an index to query with an
> >>>>> IndexQuery [1]. Currently my PR provides 2 ways to specify index:
> >>>>> 1. With a table and index name;
> >>>>> 2. With a table and list of index fields (without index name). In
> this
> >>> case
> >>>>> IndexQueryProcessor tries to find an index that matches table and
> >> index
> >>>>> fields in strict order (order of fields in criteria has to match the
> >>> order
> >>>>> of fields in index).
> >>>>>
> >>>>> Discussion is whether is the second approach valid?
> >>>>>
> >>>>> Pros:
> >>>>> 1. Currently index name is an optional field for QueryIndex and
> >>>>> QuerySqlField. Then users can create an index with a table and list
> of
> >>>>> fields. Then, we should provide an opportunity to define an index for
> >>>>> querying the same way as we do for creating.
> >>>>> 2. It's required to know the index name to query it (in case the
> index
> >>> was
> >>>>> created without an explicit name). Users can find it and then use it
> >> as
> >>> a
> >>>>> constant in code, but I see some troubles there:
> >>>>> 2.1. Get index name by querying the system view INDEXES. Note, that
> >>> system
> >>>>> views are marked as an experimental feature [2].
> >>>>> 2.2. There is a workaround to know an index name with EXPLAIN clause
> >> for
> >>>>> sql query that uses the required index (but it depends on SQL
> >>> optimizer).
> >>>>> 2.3. Users can use the index name builder, but it is in the
> >>>>> internal package
> >>>>> (org.apache.ignite.internal.processors.query.QueryUtils#indexName).
> >>> Then it
> >>>>> can be changed from version to version without warning, and then the
> >>> user
> >>>>> can't rely on it in code.
> >>>>> 3. Name of the Primary Key index (_key_PK) is predefined and
> hardcoded
> >>> in
> >>>>> Ignite. Users can't set it while creating it, and the name of PK
> index
> >>> is
> >>>>> hardcoded in the internal package too (QueryUtils.PRIMARY_KEY_INDEX).
> >>>>>
> >>>>> Cons:
> >>>>> 1. It's declared that IndexQuery avoids some SQL steps (like
> planning,
> >>>>> optimizer) in favor of speed. It looks like that looking for an index
> >> by
> >>>>> list of fields is the work of an optimizer.
> >>>>> 2. It can be not obvious that the order of fields in a query has to
> >>> match
> >>>>> the order of fields in the related index. We should have it in mind
> >> when
> >>>>> building a query - there should be a check for order of fields before
> >>>>> querying.
> >>>>>
> >>>>>   From my side, I think that arguments for enforcing usage of an
> index
> >>> name
> >>>>> for queries are strong enough. But for me it's strange that it's
> >>> possible
> >>>>> to create an index without a name, but it's required to use name to
> >>> query
> >>>>> it. Also taking in consideration that there is no guaranteed way to
> >> get
> >>> an
> >>>>> index name (or I don't know it).
> >>>>>
> >>>>> Igniters, what do you think?
> >>>>> [1] https://github.com/apache/ignite/pull/9118#discussion_r642557531
> >>>>> [2]
> >>> https://ignite.apache.org/docs/latest/monitoring-metrics/system-views
> >>>>>
> >>>>> On Fri, Aug 6, 2021 at 4:04 PM Maksim Timonin <
> >> timonin.maxim@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi, all!
> >>>>>>
> >>>>>> It's a gentle reminder. There is a PR for the new Index API [1]. It
> >> was
> >>>>>> approved by Alex Plekhanov. Does anybody want to review this API
> too?
> >>> If
> >>>>>> there won't be objections we're going to merge it Monday, 16th of
> >>> August.
> >>>>>> Thanks!
> >>>>>>
> >>>>>> [1] https://github.com/apache/ignite/pull/9118
> >>>>>>
> >>>>>> On Fri, May 21, 2021 at 10:43 PM Maksim Timonin <
> >>> timonin.maxim@gmail.com
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Andrey, hi!
> >>>>>>>
> >>>>>>> Some updates, there.
> >>>>>>>
> >>>>>>> I've submitted a PR for IndexQuery [1]. There is an issue about
> lazy
> >>>>> page
> >>>>>>> loading, that is also related to Text query ticket IGNITE-12291.
> >>>>>>>
> >>>>>>> CacheQueries already have pending pages functionality, it's done
> >> with
> >>>>>>> multiple sending GridCacheQueryRequest. There was an issue with
> >>>>> TextQuery
> >>>>>>> and limit, after exceeding a limit we still send requests, so I
> >>>>> submitted a
> >>>>>>> patch to fix this [2].
> >>>>>>>
> >>>>>>> But currently, TextQuery, as SqlFieldsQuery also does, prepares
> >> whole
> >>>>>>> data on query request, holds it, and provides a cursor over this
> >>>>>>> collection.
> >>>>>>>
> >>>>>>> As I understand you correctly, you propose to run TextQuery over
> >> index
> >>>>>>> with every poll page request. We can do this with Lucene
> >>>>>>> IndexSearcher.searchAfter. So from one side, it will save
> resources.
> >>> But
> >>>>>>> from the other side, no queries (no TextQuery, no SqlFieldsQuery)
> >> lock
> >>>>>>> index for querying. So there can be data inconsistency, as there
> can
> >>> be
> >>>>>>> concurrent operations on an index while a user iterates over the
> >>>>> cursor. It
> >>>>>>> also could be for queries now, due to no index lock being there,
> but
> >>> the
> >>>>>>> window of time of such inconsistency is much shorter.
> >>>>>>>
> >>>>>>> The same dilemma I have for IndexQuery. In my patch [1] I provide
> >> lazy
> >>>>>>> iteration over BPlusTree. There is no lock on an index too while
> >>>>> querying.
> >>>>>>> And I want to discuss the right way. I have in mind the next
> things:
> >>>>>>> 1. Indexes currently doesn't support transactions, also SQL queries
> >>>>> don't
> >>>>>>> lock index for queries, so Ignite don't guarantee data consistency;
> >>>>>>> 2. As I understand preparing whole data for SQL queries is required
> >>> due
> >>>>>>> to relations between tables. The more complex query and relations
> we
> >>>>> have,
> >>>>>>> the much consistency issues we have in result in case of parallel
> >>>>>>> operations;
> >>>>>>> 3. Querying a single index only (by TextQuery or IndexQuery)
> doesn't
> >>>>>>> affect any relations, so we can allow concurrent updates, as it
> >> could
> >>>>>>> affect a query result but it doesn't hurt.
> >>>>>>>
> >>>>>>> And following these thoughts, it's right to implement lazy
> >> iterations
> >>>>>>> over indexes. What do you think?
> >>>>>>>
> >>>>>>> Also, there is a second topic to discuss. BPlusTree indexes support
> >>>>> query
> >>>>>>> parallelism. But CacheQueries don't. There needs to be a change to
> >>>>>>> infrastructure to support query parallelism, so on this patch [1] I
> >>>>> handle
> >>>>>>> multiple segments in a single thread. And this works OK, as in the
> >>> case
> >>>>> of
> >>>>>>> lazy querying it's very fast to initialize a cursor, so there is
> not
> >>>>> much
> >>>>>>> overhead on multiple segments. I ran performance tests and found
> >> that
> >>> in
> >>>>>>> some cases, IndexQuery beats SqlFieldsQuery even with enabled
> >>>>>>> queryParallelism (it helps a SqlFieldsQuery much). So the need for
> >>>>>>> supporting queryParallelism for IndexQuery is required to be tested
> >>>>> well.
> >>>>>>> As IndexQuery already can help users to speed up some queries I
> >>> propose
> >>>>> to
> >>>>>>> check queryParallelism a little bit later. WDYT?
> >>>>>>>
> >>>>>>> So, those 2 things affect the Apache Ignite release that IndexQuery
> >>> will
> >>>>>>> be delivered with. So, please let me know your thoughts.
> >>>>>>>
> >>>>>>> Any thoughts from the community are welcome too.
> >>>>>>>
> >>>>>>>
> >>>>>>> [1] https://github.com/apache/ignite/pull/9118
> >>>>>>> [2] https://github.com/apache/ignite/pull/9086
> >>>>>>>
> >>>>>>> On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin <
> >>> timonin.maxim@gmail.com
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Andrey,
> >>>>>>>>
> >>>>>>>> Thanks! I picked it.
> >>>>>>>>
> >>>>>>>> On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin <
> >>>>> timonin.maxim@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Stephen,
> >>>>>>>>>
> >>>>>>>>> I don't see a reason to replace or deprecate IndexingSpi. I'm not
> >>>>>>>>> sure how smbd uses it, but it works now.
> >>>>>>>>>
> >>>>>>>>> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
> >>>>>>>>> stephen.darlington@gridgain.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Is this a replacement for IndexingSpi? Put bluntly, do we
> >> deprecate
> >>>>>>>>>> (and remove) it?
> >>>>>>>>>>
> >>>>>>>>>> Or do you see them as complimentary?
> >>>>>>>>>>
> >>>>>>>>>>> On 12 Apr 2021, at 11:29, Maksim Timonin <
> >> timonin.maxim@gmail.com
> >>>>>>>>>> wrote:
> >>>>>>>>>>> Hi Stephen!
> >>>>>>>>>>>
> >>>>>>>>>>> Please have a look at the QueryProcessing paragraph [1]. I've
> >>>>>>>>>> described
> >>>>>>>>>>> why IndexingSpi doesn't fit us well.
> >>>>>>>>>>>
> >>>>>>>>>>> [1]
> >>>>>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
> >>>>>>>>>>> On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
> >>>>>>>>>>> stephen.darlington@gridgain.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> How does this fit with the current IndexingSpi? Superficially
> >>> they
> >>>>>>>>>> appear
> >>>>>>>>>>>> to do very similar things?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>> Stephen
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On 6 Apr 2021, at 14:13, Maksim Timonin <
> >>> timonin.maxim@gmail.com
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>> Hi, Igniters!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'd like to propose a new feature - opportunity to query and
> >>>>> create
> >>>>>>>>>>>> indexes
> >>>>>>>>>>>>> from public API.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It will help in some cases, where:
> >>>>>>>>>>>>> 1. SQL is not applicable by design of user application;
> >>>>>>>>>>>>> 2. Where IndexScan is preferable than ScanQuery for
> >> performance
> >>>>>>>>>> reasons;
> >>>>>>>>>>>>> 3. Functional indexes are required.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Also it'll be great to have a transactional support for such
> >>>>>>>>>> queries,
> >>>>>>>>>>>> like
> >>>>>>>>>>>>> the "select for update" query provides. But I don't dig there
> >>>>>>>>>> much. It
> >>>>>>>>>>>> will
> >>>>>>>>>>>>> be a next step if this API will be implemented.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I've prepared an IEP-71 for that [1] with more details.
> Please
> >>>>>>>>>> share your
> >>>>>>>>>>>>> thoughts.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>> --
> >>> Taras Ledkov
> >>> Mail-To: tledkov@gridgain.com
> >>>
> >>>
> --
> Taras Ledkov
> Mail-To: tledkov@gridgain.com
>
>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Taras Ledkov <tl...@gridgain.com>.

Hi,

My proposal:
1. Don't search index by criteria, specify the index name always 
(preferred).

OR

2. Search index by criteria without check the order of criteriones.
Use the Set of criterions instead of the ordered collection.
In the strange case when the both index exist (a, b) and (b, a) - use 
the any index
when index name isn't specified.

On 26.08.2021 16:49, Maksim Timonin wrote:
> There are some thoughts about strict field order:
> 1. Index (A, B) is not equivalent to index (B, A). Some queries may have
> different performance on such indexes, and users have to specify the right
> index. What if both indexes exist?
> 2. We should avoid cases when a user uses in query only field B for index
> (A, B). We have to force the user to specify range for (A) too, or
> explicitly set it (null, null). Otherwise it looks like a mistake.
>
>
>
>
> On Thu, Aug 26, 2021 at 4:39 PM Ivan Daschinsky <iv...@gmail.com> wrote:
>
>> 1. I suppose, that the next step is to implement the api for manually
>> creating index. I think that user wants to create index that will speed up
>> his criteria base queries, so he or she will use the same criteria to
>> define the index. So no problem at all
>> 2. We should print warning or throws exception if there is not any index
>> that match specific criteria.
>>
>> BTW, Mongo DB doesn't make user to write index name in query. It just
>> works.
>>
>> чт, 26 авг. 2021 г., 15:52 Taras Ledkov <tl...@gridgain.com>:
>>
>>> Hi,
>>>
>>>> It is an usability nightmare to make user write index name in all
>> cases.
>>> I don't see any difference between specifying the index name and
>>> specifying the index fields in the right order.
>>> Do you see?
>>>
>>> Let's there is the index:
>>> idx_A_B ON TBL (A, B)
>>>
>>> Is it OK that the query like below doesn't math the index 'idx_A_B'?
>>> new IndexQuery<>(..)
>>>       .setCriteria(lt("b", 1), lt("a", 2));
>>>
>>> On 26.08.2021 15:23, Ivan Daschinsky wrote:
>>>> I am against to make user write index name. It is quite simple and
>>>> straightforward algorithm to match index to field names, so it is
>> strange
>>>> to compare it to sql engine optimizer.
>>>>
>>>> It is an usability nightmare to make user write index name in all
>> cases.
>>>> чт, 26 авг. 2021 г., 14:42 Maksim Timonin <ti...@gmail.com>:
>>>>
>>>>> Hi, Igniters!
>>>>>
>>>>> There is a discussion about how to specify an index to query with an
>>>>> IndexQuery [1]. Currently my PR provides 2 ways to specify index:
>>>>> 1. With a table and index name;
>>>>> 2. With a table and list of index fields (without index name). In this
>>> case
>>>>> IndexQueryProcessor tries to find an index that matches table and
>> index
>>>>> fields in strict order (order of fields in criteria has to match the
>>> order
>>>>> of fields in index).
>>>>>
>>>>> Discussion is whether is the second approach valid?
>>>>>
>>>>> Pros:
>>>>> 1. Currently index name is an optional field for QueryIndex and
>>>>> QuerySqlField. Then users can create an index with a table and list of
>>>>> fields. Then, we should provide an opportunity to define an index for
>>>>> querying the same way as we do for creating.
>>>>> 2. It's required to know the index name to query it (in case the index
>>> was
>>>>> created without an explicit name). Users can find it and then use it
>> as
>>> a
>>>>> constant in code, but I see some troubles there:
>>>>> 2.1. Get index name by querying the system view INDEXES. Note, that
>>> system
>>>>> views are marked as an experimental feature [2].
>>>>> 2.2. There is a workaround to know an index name with EXPLAIN clause
>> for
>>>>> sql query that uses the required index (but it depends on SQL
>>> optimizer).
>>>>> 2.3. Users can use the index name builder, but it is in the
>>>>> internal package
>>>>> (org.apache.ignite.internal.processors.query.QueryUtils#indexName).
>>> Then it
>>>>> can be changed from version to version without warning, and then the
>>> user
>>>>> can't rely on it in code.
>>>>> 3. Name of the Primary Key index (_key_PK) is predefined and hardcoded
>>> in
>>>>> Ignite. Users can't set it while creating it, and the name of PK index
>>> is
>>>>> hardcoded in the internal package too (QueryUtils.PRIMARY_KEY_INDEX).
>>>>>
>>>>> Cons:
>>>>> 1. It's declared that IndexQuery avoids some SQL steps (like planning,
>>>>> optimizer) in favor of speed. It looks like that looking for an index
>> by
>>>>> list of fields is the work of an optimizer.
>>>>> 2. It can be not obvious that the order of fields in a query has to
>>> match
>>>>> the order of fields in the related index. We should have it in mind
>> when
>>>>> building a query - there should be a check for order of fields before
>>>>> querying.
>>>>>
>>>>>   From my side, I think that arguments for enforcing usage of an index
>>> name
>>>>> for queries are strong enough. But for me it's strange that it's
>>> possible
>>>>> to create an index without a name, but it's required to use name to
>>> query
>>>>> it. Also taking in consideration that there is no guaranteed way to
>> get
>>> an
>>>>> index name (or I don't know it).
>>>>>
>>>>> Igniters, what do you think?
>>>>> [1] https://github.com/apache/ignite/pull/9118#discussion_r642557531
>>>>> [2]
>>> https://ignite.apache.org/docs/latest/monitoring-metrics/system-views
>>>>>
>>>>> On Fri, Aug 6, 2021 at 4:04 PM Maksim Timonin <
>> timonin.maxim@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi, all!
>>>>>>
>>>>>> It's a gentle reminder. There is a PR for the new Index API [1]. It
>> was
>>>>>> approved by Alex Plekhanov. Does anybody want to review this API too?
>>> If
>>>>>> there won't be objections we're going to merge it Monday, 16th of
>>> August.
>>>>>> Thanks!
>>>>>>
>>>>>> [1] https://github.com/apache/ignite/pull/9118
>>>>>>
>>>>>> On Fri, May 21, 2021 at 10:43 PM Maksim Timonin <
>>> timonin.maxim@gmail.com
>>>>>> wrote:
>>>>>>
>>>>>>> Andrey, hi!
>>>>>>>
>>>>>>> Some updates, there.
>>>>>>>
>>>>>>> I've submitted a PR for IndexQuery [1]. There is an issue about lazy
>>>>> page
>>>>>>> loading, that is also related to Text query ticket IGNITE-12291.
>>>>>>>
>>>>>>> CacheQueries already have pending pages functionality, it's done
>> with
>>>>>>> multiple sending GridCacheQueryRequest. There was an issue with
>>>>> TextQuery
>>>>>>> and limit, after exceeding a limit we still send requests, so I
>>>>> submitted a
>>>>>>> patch to fix this [2].
>>>>>>>
>>>>>>> But currently, TextQuery, as SqlFieldsQuery also does, prepares
>> whole
>>>>>>> data on query request, holds it, and provides a cursor over this
>>>>>>> collection.
>>>>>>>
>>>>>>> As I understand you correctly, you propose to run TextQuery over
>> index
>>>>>>> with every poll page request. We can do this with Lucene
>>>>>>> IndexSearcher.searchAfter. So from one side, it will save resources.
>>> But
>>>>>>> from the other side, no queries (no TextQuery, no SqlFieldsQuery)
>> lock
>>>>>>> index for querying. So there can be data inconsistency, as there can
>>> be
>>>>>>> concurrent operations on an index while a user iterates over the
>>>>> cursor. It
>>>>>>> also could be for queries now, due to no index lock being there, but
>>> the
>>>>>>> window of time of such inconsistency is much shorter.
>>>>>>>
>>>>>>> The same dilemma I have for IndexQuery. In my patch [1] I provide
>> lazy
>>>>>>> iteration over BPlusTree. There is no lock on an index too while
>>>>> querying.
>>>>>>> And I want to discuss the right way. I have in mind the next things:
>>>>>>> 1. Indexes currently doesn't support transactions, also SQL queries
>>>>> don't
>>>>>>> lock index for queries, so Ignite don't guarantee data consistency;
>>>>>>> 2. As I understand preparing whole data for SQL queries is required
>>> due
>>>>>>> to relations between tables. The more complex query and relations we
>>>>> have,
>>>>>>> the much consistency issues we have in result in case of parallel
>>>>>>> operations;
>>>>>>> 3. Querying a single index only (by TextQuery or IndexQuery) doesn't
>>>>>>> affect any relations, so we can allow concurrent updates, as it
>> could
>>>>>>> affect a query result but it doesn't hurt.
>>>>>>>
>>>>>>> And following these thoughts, it's right to implement lazy
>> iterations
>>>>>>> over indexes. What do you think?
>>>>>>>
>>>>>>> Also, there is a second topic to discuss. BPlusTree indexes support
>>>>> query
>>>>>>> parallelism. But CacheQueries don't. There needs to be a change to
>>>>>>> infrastructure to support query parallelism, so on this patch [1] I
>>>>> handle
>>>>>>> multiple segments in a single thread. And this works OK, as in the
>>> case
>>>>> of
>>>>>>> lazy querying it's very fast to initialize a cursor, so there is not
>>>>> much
>>>>>>> overhead on multiple segments. I ran performance tests and found
>> that
>>> in
>>>>>>> some cases, IndexQuery beats SqlFieldsQuery even with enabled
>>>>>>> queryParallelism (it helps a SqlFieldsQuery much). So the need for
>>>>>>> supporting queryParallelism for IndexQuery is required to be tested
>>>>> well.
>>>>>>> As IndexQuery already can help users to speed up some queries I
>>> propose
>>>>> to
>>>>>>> check queryParallelism a little bit later. WDYT?
>>>>>>>
>>>>>>> So, those 2 things affect the Apache Ignite release that IndexQuery
>>> will
>>>>>>> be delivered with. So, please let me know your thoughts.
>>>>>>>
>>>>>>> Any thoughts from the community are welcome too.
>>>>>>>
>>>>>>>
>>>>>>> [1] https://github.com/apache/ignite/pull/9118
>>>>>>> [2] https://github.com/apache/ignite/pull/9086
>>>>>>>
>>>>>>> On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin <
>>> timonin.maxim@gmail.com
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Andrey,
>>>>>>>>
>>>>>>>> Thanks! I picked it.
>>>>>>>>
>>>>>>>> On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin <
>>>>> timonin.maxim@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Stephen,
>>>>>>>>>
>>>>>>>>> I don't see a reason to replace or deprecate IndexingSpi. I'm not
>>>>>>>>> sure how smbd uses it, but it works now.
>>>>>>>>>
>>>>>>>>> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
>>>>>>>>> stephen.darlington@gridgain.com> wrote:
>>>>>>>>>
>>>>>>>>>> Is this a replacement for IndexingSpi? Put bluntly, do we
>> deprecate
>>>>>>>>>> (and remove) it?
>>>>>>>>>>
>>>>>>>>>> Or do you see them as complimentary?
>>>>>>>>>>
>>>>>>>>>>> On 12 Apr 2021, at 11:29, Maksim Timonin <
>> timonin.maxim@gmail.com
>>>>>>>>>> wrote:
>>>>>>>>>>> Hi Stephen!
>>>>>>>>>>>
>>>>>>>>>>> Please have a look at the QueryProcessing paragraph [1]. I've
>>>>>>>>>> described
>>>>>>>>>>> why IndexingSpi doesn't fit us well.
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>>
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
>>>>>>>>>>> On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
>>>>>>>>>>> stephen.darlington@gridgain.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> How does this fit with the current IndexingSpi? Superficially
>>> they
>>>>>>>>>> appear
>>>>>>>>>>>> to do very similar things?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Stephen
>>>>>>>>>>>>
>>>>>>>>>>>>> On 6 Apr 2021, at 14:13, Maksim Timonin <
>>> timonin.maxim@gmail.com
>>>>>>>>>> wrote:
>>>>>>>>>>>>> Hi, Igniters!
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'd like to propose a new feature - opportunity to query and
>>>>> create
>>>>>>>>>>>> indexes
>>>>>>>>>>>>> from public API.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It will help in some cases, where:
>>>>>>>>>>>>> 1. SQL is not applicable by design of user application;
>>>>>>>>>>>>> 2. Where IndexScan is preferable than ScanQuery for
>> performance
>>>>>>>>>> reasons;
>>>>>>>>>>>>> 3. Functional indexes are required.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also it'll be great to have a transactional support for such
>>>>>>>>>> queries,
>>>>>>>>>>>> like
>>>>>>>>>>>>> the "select for update" query provides. But I don't dig there
>>>>>>>>>> much. It
>>>>>>>>>>>> will
>>>>>>>>>>>>> be a next step if this API will be implemented.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've prepared an IEP-71 for that [1] with more details. Please
>>>>>>>>>> share your
>>>>>>>>>>>>> thoughts.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
>>>>>>>>>>>>
>>>>>>>>>>
>>> --
>>> Taras Ledkov
>>> Mail-To: tledkov@gridgain.com
>>>
>>>
-- 
Taras Ledkov
Mail-To: tledkov@gridgain.com

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Maksim Timonin <ti...@gmail.com>.

> But lt("b", 1) AND lt("a", 2) is equivalent to lt("a", 1) AND lt("b",
2) according with criteria API and javadoc of the 'IndexQuery#setCriteria'
method.

I updated javadoc, Andrey suggested writing this in javadocs, so now there
is a note about field order.

> If the user uses logical criteria instead of index bounds why he must
remember about the order of the index's fields

In case of non-strict fields, the choice of index (A, B) or (B, A) depends
on field order in the user query in any case (whether we make it strict or
non-strict), as IndexQueryProcessor doesn't analyze which of those 2
indexes are better. So Ignite silently may choose the wrong index. So, I
propose to restrict fields in order to avoid such confusing situations.

This case also can be avoided if we make the index name as required param.
And I think that in the first step we can make an index name required (and
fields order too), to make those restrictions more soft later? WDYT?

But currently I don't see a *100% legal* way to find an index name in
Ignite for an index that was created without specifying it? Am I missing
smth there?




On Thu, Aug 26, 2021 at 5:55 PM Taras Ledkov <tl...@gridgain.com> wrote:

> Hi,
>
>  > 1. Index (A, B) is not equivalent to index (B, A).
>
> But
> lt("b", 1) AND lt("a", 2) is equivalent to lt("a", 1) AND lt("b", 2)
> according with criteria API and javadoc of the 'IndexQuery#setCriteria'
> method.
>
> If the user uses logical criteria instead of index bounds
> why he must remember about the order of the index's fields?
>
>  > 2. We should avoid cases when a user uses in query only field B for
> index (A, B).
> Sure. This case must be checked. Is checking of this case related to
> search index by criteria conditions?
>
> On 26.08.2021 16:49, Maksim Timonin wrote:
>
> > There are some thoughts about strict field order:
> > 1. Index (A, B) is not equivalent to index (B, A). Some queries may have
> > different performance on such indexes, and users have to specify the
> right
> > index. What if both indexes exist?
> > 2. We should avoid cases when a user uses in query only field B for index
> > (A, B). We have to force the user to specify range for (A) too, or
> > explicitly set it (null, null). Otherwise it looks like a mistake.
> >
> >
> >
> >
> > On Thu, Aug 26, 2021 at 4:39 PM Ivan Daschinsky <iv...@gmail.com>
> wrote:
> >
> >> 1. I suppose, that the next step is to implement the api for manually
> >> creating index. I think that user wants to create index that will speed
> up
> >> his criteria base queries, so he or she will use the same criteria to
> >> define the index. So no problem at all
> >> 2. We should print warning or throws exception if there is not any index
> >> that match specific criteria.
> >>
> >> BTW, Mongo DB doesn't make user to write index name in query. It just
> >> works.
> >>
> >> чт, 26 авг. 2021 г., 15:52 Taras Ledkov <tl...@gridgain.com>:
> >>
> >>> Hi,
> >>>
> >>>> It is an usability nightmare to make user write index name in all
> >> cases.
> >>> I don't see any difference between specifying the index name and
> >>> specifying the index fields in the right order.
> >>> Do you see?
> >>>
> >>> Let's there is the index:
> >>> idx_A_B ON TBL (A, B)
> >>>
> >>> Is it OK that the query like below doesn't math the index 'idx_A_B'?
> >>> new IndexQuery<>(..)
> >>>       .setCriteria(lt("b", 1), lt("a", 2));
> >>>
> >>> On 26.08.2021 15:23, Ivan Daschinsky wrote:
> >>>> I am against to make user write index name. It is quite simple and
> >>>> straightforward algorithm to match index to field names, so it is
> >> strange
> >>>> to compare it to sql engine optimizer.
> >>>>
> >>>> It is an usability nightmare to make user write index name in all
> >> cases.
> >>>> чт, 26 авг. 2021 г., 14:42 Maksim Timonin <ti...@gmail.com>:
> >>>>
> >>>>> Hi, Igniters!
> >>>>>
> >>>>> There is a discussion about how to specify an index to query with an
> >>>>> IndexQuery [1]. Currently my PR provides 2 ways to specify index:
> >>>>> 1. With a table and index name;
> >>>>> 2. With a table and list of index fields (without index name). In
> this
> >>> case
> >>>>> IndexQueryProcessor tries to find an index that matches table and
> >> index
> >>>>> fields in strict order (order of fields in criteria has to match the
> >>> order
> >>>>> of fields in index).
> >>>>>
> >>>>> Discussion is whether is the second approach valid?
> >>>>>
> >>>>> Pros:
> >>>>> 1. Currently index name is an optional field for QueryIndex and
> >>>>> QuerySqlField. Then users can create an index with a table and list
> of
> >>>>> fields. Then, we should provide an opportunity to define an index for
> >>>>> querying the same way as we do for creating.
> >>>>> 2. It's required to know the index name to query it (in case the
> index
> >>> was
> >>>>> created without an explicit name). Users can find it and then use it
> >> as
> >>> a
> >>>>> constant in code, but I see some troubles there:
> >>>>> 2.1. Get index name by querying the system view INDEXES. Note, that
> >>> system
> >>>>> views are marked as an experimental feature [2].
> >>>>> 2.2. There is a workaround to know an index name with EXPLAIN clause
> >> for
> >>>>> sql query that uses the required index (but it depends on SQL
> >>> optimizer).
> >>>>> 2.3. Users can use the index name builder, but it is in the
> >>>>> internal package
> >>>>> (org.apache.ignite.internal.processors.query.QueryUtils#indexName).
> >>> Then it
> >>>>> can be changed from version to version without warning, and then the
> >>> user
> >>>>> can't rely on it in code.
> >>>>> 3. Name of the Primary Key index (_key_PK) is predefined and
> hardcoded
> >>> in
> >>>>> Ignite. Users can't set it while creating it, and the name of PK
> index
> >>> is
> >>>>> hardcoded in the internal package too (QueryUtils.PRIMARY_KEY_INDEX).
> >>>>>
> >>>>> Cons:
> >>>>> 1. It's declared that IndexQuery avoids some SQL steps (like
> planning,
> >>>>> optimizer) in favor of speed. It looks like that looking for an index
> >> by
> >>>>> list of fields is the work of an optimizer.
> >>>>> 2. It can be not obvious that the order of fields in a query has to
> >>> match
> >>>>> the order of fields in the related index. We should have it in mind
> >> when
> >>>>> building a query - there should be a check for order of fields before
> >>>>> querying.
> >>>>>
> >>>>>   From my side, I think that arguments for enforcing usage of an
> index
> >>> name
> >>>>> for queries are strong enough. But for me it's strange that it's
> >>> possible
> >>>>> to create an index without a name, but it's required to use name to
> >>> query
> >>>>> it. Also taking in consideration that there is no guaranteed way to
> >> get
> >>> an
> >>>>> index name (or I don't know it).
> >>>>>
> >>>>> Igniters, what do you think?
> >>>>> [1] https://github.com/apache/ignite/pull/9118#discussion_r642557531
> >>>>> [2]
> >>> https://ignite.apache.org/docs/latest/monitoring-metrics/system-views
> >>>>>
> >>>>> On Fri, Aug 6, 2021 at 4:04 PM Maksim Timonin <
> >> timonin.maxim@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi, all!
> >>>>>>
> >>>>>> It's a gentle reminder. There is a PR for the new Index API [1]. It
> >> was
> >>>>>> approved by Alex Plekhanov. Does anybody want to review this API
> too?
> >>> If
> >>>>>> there won't be objections we're going to merge it Monday, 16th of
> >>> August.
> >>>>>> Thanks!
> >>>>>>
> >>>>>> [1] https://github.com/apache/ignite/pull/9118
> >>>>>>
> >>>>>> On Fri, May 21, 2021 at 10:43 PM Maksim Timonin <
> >>> timonin.maxim@gmail.com
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Andrey, hi!
> >>>>>>>
> >>>>>>> Some updates, there.
> >>>>>>>
> >>>>>>> I've submitted a PR for IndexQuery [1]. There is an issue about
> lazy
> >>>>> page
> >>>>>>> loading, that is also related to Text query ticket IGNITE-12291.
> >>>>>>>
> >>>>>>> CacheQueries already have pending pages functionality, it's done
> >> with
> >>>>>>> multiple sending GridCacheQueryRequest. There was an issue with
> >>>>> TextQuery
> >>>>>>> and limit, after exceeding a limit we still send requests, so I
> >>>>> submitted a
> >>>>>>> patch to fix this [2].
> >>>>>>>
> >>>>>>> But currently, TextQuery, as SqlFieldsQuery also does, prepares
> >> whole
> >>>>>>> data on query request, holds it, and provides a cursor over this
> >>>>>>> collection.
> >>>>>>>
> >>>>>>> As I understand you correctly, you propose to run TextQuery over
> >> index
> >>>>>>> with every poll page request. We can do this with Lucene
> >>>>>>> IndexSearcher.searchAfter. So from one side, it will save
> resources.
> >>> But
> >>>>>>> from the other side, no queries (no TextQuery, no SqlFieldsQuery)
> >> lock
> >>>>>>> index for querying. So there can be data inconsistency, as there
> can
> >>> be
> >>>>>>> concurrent operations on an index while a user iterates over the
> >>>>> cursor. It
> >>>>>>> also could be for queries now, due to no index lock being there,
> but
> >>> the
> >>>>>>> window of time of such inconsistency is much shorter.
> >>>>>>>
> >>>>>>> The same dilemma I have for IndexQuery. In my patch [1] I provide
> >> lazy
> >>>>>>> iteration over BPlusTree. There is no lock on an index too while
> >>>>> querying.
> >>>>>>> And I want to discuss the right way. I have in mind the next
> things:
> >>>>>>> 1. Indexes currently doesn't support transactions, also SQL queries
> >>>>> don't
> >>>>>>> lock index for queries, so Ignite don't guarantee data consistency;
> >>>>>>> 2. As I understand preparing whole data for SQL queries is required
> >>> due
> >>>>>>> to relations between tables. The more complex query and relations
> we
> >>>>> have,
> >>>>>>> the much consistency issues we have in result in case of parallel
> >>>>>>> operations;
> >>>>>>> 3. Querying a single index only (by TextQuery or IndexQuery)
> doesn't
> >>>>>>> affect any relations, so we can allow concurrent updates, as it
> >> could
> >>>>>>> affect a query result but it doesn't hurt.
> >>>>>>>
> >>>>>>> And following these thoughts, it's right to implement lazy
> >> iterations
> >>>>>>> over indexes. What do you think?
> >>>>>>>
> >>>>>>> Also, there is a second topic to discuss. BPlusTree indexes support
> >>>>> query
> >>>>>>> parallelism. But CacheQueries don't. There needs to be a change to
> >>>>>>> infrastructure to support query parallelism, so on this patch [1] I
> >>>>> handle
> >>>>>>> multiple segments in a single thread. And this works OK, as in the
> >>> case
> >>>>> of
> >>>>>>> lazy querying it's very fast to initialize a cursor, so there is
> not
> >>>>> much
> >>>>>>> overhead on multiple segments. I ran performance tests and found
> >> that
> >>> in
> >>>>>>> some cases, IndexQuery beats SqlFieldsQuery even with enabled
> >>>>>>> queryParallelism (it helps a SqlFieldsQuery much). So the need for
> >>>>>>> supporting queryParallelism for IndexQuery is required to be tested
> >>>>> well.
> >>>>>>> As IndexQuery already can help users to speed up some queries I
> >>> propose
> >>>>> to
> >>>>>>> check queryParallelism a little bit later. WDYT?
> >>>>>>>
> >>>>>>> So, those 2 things affect the Apache Ignite release that IndexQuery
> >>> will
> >>>>>>> be delivered with. So, please let me know your thoughts.
> >>>>>>>
> >>>>>>> Any thoughts from the community are welcome too.
> >>>>>>>
> >>>>>>>
> >>>>>>> [1] https://github.com/apache/ignite/pull/9118
> >>>>>>> [2] https://github.com/apache/ignite/pull/9086
> >>>>>>>
> >>>>>>> On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin <
> >>> timonin.maxim@gmail.com
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Andrey,
> >>>>>>>>
> >>>>>>>> Thanks! I picked it.
> >>>>>>>>
> >>>>>>>> On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin <
> >>>>> timonin.maxim@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Stephen,
> >>>>>>>>>
> >>>>>>>>> I don't see a reason to replace or deprecate IndexingSpi. I'm not
> >>>>>>>>> sure how smbd uses it, but it works now.
> >>>>>>>>>
> >>>>>>>>> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
> >>>>>>>>> stephen.darlington@gridgain.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Is this a replacement for IndexingSpi? Put bluntly, do we
> >> deprecate
> >>>>>>>>>> (and remove) it?
> >>>>>>>>>>
> >>>>>>>>>> Or do you see them as complimentary?
> >>>>>>>>>>
> >>>>>>>>>>> On 12 Apr 2021, at 11:29, Maksim Timonin <
> >> timonin.maxim@gmail.com
> >>>>>>>>>> wrote:
> >>>>>>>>>>> Hi Stephen!
> >>>>>>>>>>>
> >>>>>>>>>>> Please have a look at the QueryProcessing paragraph [1]. I've
> >>>>>>>>>> described
> >>>>>>>>>>> why IndexingSpi doesn't fit us well.
> >>>>>>>>>>>
> >>>>>>>>>>> [1]
> >>>>>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
> >>>>>>>>>>> On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
> >>>>>>>>>>> stephen.darlington@gridgain.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> How does this fit with the current IndexingSpi? Superficially
> >>> they
> >>>>>>>>>> appear
> >>>>>>>>>>>> to do very similar things?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>> Stephen
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On 6 Apr 2021, at 14:13, Maksim Timonin <
> >>> timonin.maxim@gmail.com
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>> Hi, Igniters!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'd like to propose a new feature - opportunity to query and
> >>>>> create
> >>>>>>>>>>>> indexes
> >>>>>>>>>>>>> from public API.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It will help in some cases, where:
> >>>>>>>>>>>>> 1. SQL is not applicable by design of user application;
> >>>>>>>>>>>>> 2. Where IndexScan is preferable than ScanQuery for
> >> performance
> >>>>>>>>>> reasons;
> >>>>>>>>>>>>> 3. Functional indexes are required.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Also it'll be great to have a transactional support for such
> >>>>>>>>>> queries,
> >>>>>>>>>>>> like
> >>>>>>>>>>>>> the "select for update" query provides. But I don't dig there
> >>>>>>>>>> much. It
> >>>>>>>>>>>> will
> >>>>>>>>>>>>> be a next step if this API will be implemented.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I've prepared an IEP-71 for that [1] with more details.
> Please
> >>>>>>>>>> share your
> >>>>>>>>>>>>> thoughts.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>> --
> >>> Taras Ledkov
> >>> Mail-To: tledkov@gridgain.com
> >>>
> >>>
> --
> Taras Ledkov
> Mail-To: tledkov@gridgain.com
>
>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Maksim Timonin <ti...@gmail.com>.

There are some thoughts about strict field order:
1. Index (A, B) is not equivalent to index (B, A). Some queries may have
different performance on such indexes, and users have to specify the right
index. What if both indexes exist?
2. We should avoid cases when a user uses in query only field B for index
(A, B). We have to force the user to specify range for (A) too, or
explicitly set it (null, null). Otherwise it looks like a mistake.




On Thu, Aug 26, 2021 at 4:39 PM Ivan Daschinsky <iv...@gmail.com> wrote:

> 1. I suppose, that the next step is to implement the api for manually
> creating index. I think that user wants to create index that will speed up
> his criteria base queries, so he or she will use the same criteria to
> define the index. So no problem at all
> 2. We should print warning or throws exception if there is not any index
> that match specific criteria.
>
> BTW, Mongo DB doesn't make user to write index name in query. It just
> works.
>
> чт, 26 авг. 2021 г., 15:52 Taras Ledkov <tl...@gridgain.com>:
>
> > Hi,
> >
> > > It is an usability nightmare to make user write index name in all
> cases.
> > I don't see any difference between specifying the index name and
> > specifying the index fields in the right order.
> > Do you see?
> >
> > Let's there is the index:
> > idx_A_B ON TBL (A, B)
> >
> > Is it OK that the query like below doesn't math the index 'idx_A_B'?
> > new IndexQuery<>(..)
> >      .setCriteria(lt("b", 1), lt("a", 2));
> >
> > On 26.08.2021 15:23, Ivan Daschinsky wrote:
> > > I am against to make user write index name. It is quite simple and
> > > straightforward algorithm to match index to field names, so it is
> strange
> > > to compare it to sql engine optimizer.
> > >
> > > It is an usability nightmare to make user write index name in all
> cases.
> > >
> > > чт, 26 авг. 2021 г., 14:42 Maksim Timonin <ti...@gmail.com>:
> > >
> > >> Hi, Igniters!
> > >>
> > >> There is a discussion about how to specify an index to query with an
> > >> IndexQuery [1]. Currently my PR provides 2 ways to specify index:
> > >> 1. With a table and index name;
> > >> 2. With a table and list of index fields (without index name). In this
> > case
> > >> IndexQueryProcessor tries to find an index that matches table and
> index
> > >> fields in strict order (order of fields in criteria has to match the
> > order
> > >> of fields in index).
> > >>
> > >> Discussion is whether is the second approach valid?
> > >>
> > >> Pros:
> > >> 1. Currently index name is an optional field for QueryIndex and
> > >> QuerySqlField. Then users can create an index with a table and list of
> > >> fields. Then, we should provide an opportunity to define an index for
> > >> querying the same way as we do for creating.
> > >> 2. It's required to know the index name to query it (in case the index
> > was
> > >> created without an explicit name). Users can find it and then use it
> as
> > a
> > >> constant in code, but I see some troubles there:
> > >> 2.1. Get index name by querying the system view INDEXES. Note, that
> > system
> > >> views are marked as an experimental feature [2].
> > >> 2.2. There is a workaround to know an index name with EXPLAIN clause
> for
> > >> sql query that uses the required index (but it depends on SQL
> > optimizer).
> > >> 2.3. Users can use the index name builder, but it is in the
> > >> internal package
> > >> (org.apache.ignite.internal.processors.query.QueryUtils#indexName).
> > Then it
> > >> can be changed from version to version without warning, and then the
> > user
> > >> can't rely on it in code.
> > >> 3. Name of the Primary Key index (_key_PK) is predefined and hardcoded
> > in
> > >> Ignite. Users can't set it while creating it, and the name of PK index
> > is
> > >> hardcoded in the internal package too (QueryUtils.PRIMARY_KEY_INDEX).
> > >>
> > >> Cons:
> > >> 1. It's declared that IndexQuery avoids some SQL steps (like planning,
> > >> optimizer) in favor of speed. It looks like that looking for an index
> by
> > >> list of fields is the work of an optimizer.
> > >> 2. It can be not obvious that the order of fields in a query has to
> > match
> > >> the order of fields in the related index. We should have it in mind
> when
> > >> building a query - there should be a check for order of fields before
> > >> querying.
> > >>
> > >>  From my side, I think that arguments for enforcing usage of an index
> > name
> > >> for queries are strong enough. But for me it's strange that it's
> > possible
> > >> to create an index without a name, but it's required to use name to
> > query
> > >> it. Also taking in consideration that there is no guaranteed way to
> get
> > an
> > >> index name (or I don't know it).
> > >>
> > >> Igniters, what do you think?
> > >> [1] https://github.com/apache/ignite/pull/9118#discussion_r642557531
> > >> [2]
> > https://ignite.apache.org/docs/latest/monitoring-metrics/system-views
> > >>
> > >>
> > >> On Fri, Aug 6, 2021 at 4:04 PM Maksim Timonin <
> timonin.maxim@gmail.com>
> > >> wrote:
> > >>
> > >>> Hi, all!
> > >>>
> > >>> It's a gentle reminder. There is a PR for the new Index API [1]. It
> was
> > >>> approved by Alex Plekhanov. Does anybody want to review this API too?
> > If
> > >>> there won't be objections we're going to merge it Monday, 16th of
> > August.
> > >>>
> > >>> Thanks!
> > >>>
> > >>> [1] https://github.com/apache/ignite/pull/9118
> > >>>
> > >>> On Fri, May 21, 2021 at 10:43 PM Maksim Timonin <
> > timonin.maxim@gmail.com
> > >>>
> > >>> wrote:
> > >>>
> > >>>> Andrey, hi!
> > >>>>
> > >>>> Some updates, there.
> > >>>>
> > >>>> I've submitted a PR for IndexQuery [1]. There is an issue about lazy
> > >> page
> > >>>> loading, that is also related to Text query ticket IGNITE-12291.
> > >>>>
> > >>>> CacheQueries already have pending pages functionality, it's done
> with
> > >>>> multiple sending GridCacheQueryRequest. There was an issue with
> > >> TextQuery
> > >>>> and limit, after exceeding a limit we still send requests, so I
> > >> submitted a
> > >>>> patch to fix this [2].
> > >>>>
> > >>>> But currently, TextQuery, as SqlFieldsQuery also does, prepares
> whole
> > >>>> data on query request, holds it, and provides a cursor over this
> > >>>> collection.
> > >>>>
> > >>>> As I understand you correctly, you propose to run TextQuery over
> index
> > >>>> with every poll page request. We can do this with Lucene
> > >>>> IndexSearcher.searchAfter. So from one side, it will save resources.
> > But
> > >>>> from the other side, no queries (no TextQuery, no SqlFieldsQuery)
> lock
> > >>>> index for querying. So there can be data inconsistency, as there can
> > be
> > >>>> concurrent operations on an index while a user iterates over the
> > >> cursor. It
> > >>>> also could be for queries now, due to no index lock being there, but
> > the
> > >>>> window of time of such inconsistency is much shorter.
> > >>>>
> > >>>> The same dilemma I have for IndexQuery. In my patch [1] I provide
> lazy
> > >>>> iteration over BPlusTree. There is no lock on an index too while
> > >> querying.
> > >>>> And I want to discuss the right way. I have in mind the next things:
> > >>>> 1. Indexes currently doesn't support transactions, also SQL queries
> > >> don't
> > >>>> lock index for queries, so Ignite don't guarantee data consistency;
> > >>>> 2. As I understand preparing whole data for SQL queries is required
> > due
> > >>>> to relations between tables. The more complex query and relations we
> > >> have,
> > >>>> the much consistency issues we have in result in case of parallel
> > >>>> operations;
> > >>>> 3. Querying a single index only (by TextQuery or IndexQuery) doesn't
> > >>>> affect any relations, so we can allow concurrent updates, as it
> could
> > >>>> affect a query result but it doesn't hurt.
> > >>>>
> > >>>> And following these thoughts, it's right to implement lazy
> iterations
> > >>>> over indexes. What do you think?
> > >>>>
> > >>>> Also, there is a second topic to discuss. BPlusTree indexes support
> > >> query
> > >>>> parallelism. But CacheQueries don't. There needs to be a change to
> > >>>> infrastructure to support query parallelism, so on this patch [1] I
> > >> handle
> > >>>> multiple segments in a single thread. And this works OK, as in the
> > case
> > >> of
> > >>>> lazy querying it's very fast to initialize a cursor, so there is not
> > >> much
> > >>>> overhead on multiple segments. I ran performance tests and found
> that
> > in
> > >>>> some cases, IndexQuery beats SqlFieldsQuery even with enabled
> > >>>> queryParallelism (it helps a SqlFieldsQuery much). So the need for
> > >>>> supporting queryParallelism for IndexQuery is required to be tested
> > >> well.
> > >>>> As IndexQuery already can help users to speed up some queries I
> > propose
> > >> to
> > >>>> check queryParallelism a little bit later. WDYT?
> > >>>>
> > >>>> So, those 2 things affect the Apache Ignite release that IndexQuery
> > will
> > >>>> be delivered with. So, please let me know your thoughts.
> > >>>>
> > >>>> Any thoughts from the community are welcome too.
> > >>>>
> > >>>>
> > >>>> [1] https://github.com/apache/ignite/pull/9118
> > >>>> [2] https://github.com/apache/ignite/pull/9086
> > >>>>
> > >>>> On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin <
> > timonin.maxim@gmail.com
> > >>>> wrote:
> > >>>>
> > >>>>> Andrey,
> > >>>>>
> > >>>>> Thanks! I picked it.
> > >>>>>
> > >>>>> On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin <
> > >> timonin.maxim@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Stephen,
> > >>>>>>
> > >>>>>> I don't see a reason to replace or deprecate IndexingSpi. I'm not
> > >>>>>> sure how smbd uses it, but it works now.
> > >>>>>>
> > >>>>>> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
> > >>>>>> stephen.darlington@gridgain.com> wrote:
> > >>>>>>
> > >>>>>>> Is this a replacement for IndexingSpi? Put bluntly, do we
> deprecate
> > >>>>>>> (and remove) it?
> > >>>>>>>
> > >>>>>>> Or do you see them as complimentary?
> > >>>>>>>
> > >>>>>>>> On 12 Apr 2021, at 11:29, Maksim Timonin <
> timonin.maxim@gmail.com
> > >
> > >>>>>>> wrote:
> > >>>>>>>> Hi Stephen!
> > >>>>>>>>
> > >>>>>>>> Please have a look at the QueryProcessing paragraph [1]. I've
> > >>>>>>> described
> > >>>>>>>> why IndexingSpi doesn't fit us well.
> > >>>>>>>>
> > >>>>>>>> [1]
> > >>>>>>>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
> > >>>>>>>> On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
> > >>>>>>>> stephen.darlington@gridgain.com> wrote:
> > >>>>>>>>
> > >>>>>>>>> How does this fit with the current IndexingSpi? Superficially
> > they
> > >>>>>>> appear
> > >>>>>>>>> to do very similar things?
> > >>>>>>>>>
> > >>>>>>>>> Regards,
> > >>>>>>>>> Stephen
> > >>>>>>>>>
> > >>>>>>>>>> On 6 Apr 2021, at 14:13, Maksim Timonin <
> > timonin.maxim@gmail.com
> > >>>>>>> wrote:
> > >>>>>>>>>> Hi, Igniters!
> > >>>>>>>>>>
> > >>>>>>>>>> I'd like to propose a new feature - opportunity to query and
> > >> create
> > >>>>>>>>> indexes
> > >>>>>>>>>> from public API.
> > >>>>>>>>>>
> > >>>>>>>>>> It will help in some cases, where:
> > >>>>>>>>>> 1. SQL is not applicable by design of user application;
> > >>>>>>>>>> 2. Where IndexScan is preferable than ScanQuery for
> performance
> > >>>>>>> reasons;
> > >>>>>>>>>> 3. Functional indexes are required.
> > >>>>>>>>>>
> > >>>>>>>>>> Also it'll be great to have a transactional support for such
> > >>>>>>> queries,
> > >>>>>>>>> like
> > >>>>>>>>>> the "select for update" query provides. But I don't dig there
> > >>>>>>> much. It
> > >>>>>>>>> will
> > >>>>>>>>>> be a next step if this API will be implemented.
> > >>>>>>>>>>
> > >>>>>>>>>> I've prepared an IEP-71 for that [1] with more details. Please
> > >>>>>>> share your
> > >>>>>>>>>> thoughts.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> [1]
> > >>>>>>>>>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > --
> > Taras Ledkov
> > Mail-To: tledkov@gridgain.com
> >
> >
>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Ivan Daschinsky <iv...@gmail.com>.

1. I suppose, that the next step is to implement the api for manually
creating index. I think that user wants to create index that will speed up
his criteria base queries, so he or she will use the same criteria to
define the index. So no problem at all
2. We should print warning or throws exception if there is not any index
that match specific criteria.

BTW, Mongo DB doesn't make user to write index name in query. It just
works.

чт, 26 авг. 2021 г., 15:52 Taras Ledkov <tl...@gridgain.com>:

> Hi,
>
> > It is an usability nightmare to make user write index name in all cases.
> I don't see any difference between specifying the index name and
> specifying the index fields in the right order.
> Do you see?
>
> Let's there is the index:
> idx_A_B ON TBL (A, B)
>
> Is it OK that the query like below doesn't math the index 'idx_A_B'?
> new IndexQuery<>(..)
>      .setCriteria(lt("b", 1), lt("a", 2));
>
> On 26.08.2021 15:23, Ivan Daschinsky wrote:
> > I am against to make user write index name. It is quite simple and
> > straightforward algorithm to match index to field names, so it is strange
> > to compare it to sql engine optimizer.
> >
> > It is an usability nightmare to make user write index name in all cases.
> >
> > чт, 26 авг. 2021 г., 14:42 Maksim Timonin <ti...@gmail.com>:
> >
> >> Hi, Igniters!
> >>
> >> There is a discussion about how to specify an index to query with an
> >> IndexQuery [1]. Currently my PR provides 2 ways to specify index:
> >> 1. With a table and index name;
> >> 2. With a table and list of index fields (without index name). In this
> case
> >> IndexQueryProcessor tries to find an index that matches table and index
> >> fields in strict order (order of fields in criteria has to match the
> order
> >> of fields in index).
> >>
> >> Discussion is whether is the second approach valid?
> >>
> >> Pros:
> >> 1. Currently index name is an optional field for QueryIndex and
> >> QuerySqlField. Then users can create an index with a table and list of
> >> fields. Then, we should provide an opportunity to define an index for
> >> querying the same way as we do for creating.
> >> 2. It's required to know the index name to query it (in case the index
> was
> >> created without an explicit name). Users can find it and then use it as
> a
> >> constant in code, but I see some troubles there:
> >> 2.1. Get index name by querying the system view INDEXES. Note, that
> system
> >> views are marked as an experimental feature [2].
> >> 2.2. There is a workaround to know an index name with EXPLAIN clause for
> >> sql query that uses the required index (but it depends on SQL
> optimizer).
> >> 2.3. Users can use the index name builder, but it is in the
> >> internal package
> >> (org.apache.ignite.internal.processors.query.QueryUtils#indexName).
> Then it
> >> can be changed from version to version without warning, and then the
> user
> >> can't rely on it in code.
> >> 3. Name of the Primary Key index (_key_PK) is predefined and hardcoded
> in
> >> Ignite. Users can't set it while creating it, and the name of PK index
> is
> >> hardcoded in the internal package too (QueryUtils.PRIMARY_KEY_INDEX).
> >>
> >> Cons:
> >> 1. It's declared that IndexQuery avoids some SQL steps (like planning,
> >> optimizer) in favor of speed. It looks like that looking for an index by
> >> list of fields is the work of an optimizer.
> >> 2. It can be not obvious that the order of fields in a query has to
> match
> >> the order of fields in the related index. We should have it in mind when
> >> building a query - there should be a check for order of fields before
> >> querying.
> >>
> >>  From my side, I think that arguments for enforcing usage of an index
> name
> >> for queries are strong enough. But for me it's strange that it's
> possible
> >> to create an index without a name, but it's required to use name to
> query
> >> it. Also taking in consideration that there is no guaranteed way to get
> an
> >> index name (or I don't know it).
> >>
> >> Igniters, what do you think?
> >> [1] https://github.com/apache/ignite/pull/9118#discussion_r642557531
> >> [2]
> https://ignite.apache.org/docs/latest/monitoring-metrics/system-views
> >>
> >>
> >> On Fri, Aug 6, 2021 at 4:04 PM Maksim Timonin <ti...@gmail.com>
> >> wrote:
> >>
> >>> Hi, all!
> >>>
> >>> It's a gentle reminder. There is a PR for the new Index API [1]. It was
> >>> approved by Alex Plekhanov. Does anybody want to review this API too?
> If
> >>> there won't be objections we're going to merge it Monday, 16th of
> August.
> >>>
> >>> Thanks!
> >>>
> >>> [1] https://github.com/apache/ignite/pull/9118
> >>>
> >>> On Fri, May 21, 2021 at 10:43 PM Maksim Timonin <
> timonin.maxim@gmail.com
> >>>
> >>> wrote:
> >>>
> >>>> Andrey, hi!
> >>>>
> >>>> Some updates, there.
> >>>>
> >>>> I've submitted a PR for IndexQuery [1]. There is an issue about lazy
> >> page
> >>>> loading, that is also related to Text query ticket IGNITE-12291.
> >>>>
> >>>> CacheQueries already have pending pages functionality, it's done with
> >>>> multiple sending GridCacheQueryRequest. There was an issue with
> >> TextQuery
> >>>> and limit, after exceeding a limit we still send requests, so I
> >> submitted a
> >>>> patch to fix this [2].
> >>>>
> >>>> But currently, TextQuery, as SqlFieldsQuery also does, prepares whole
> >>>> data on query request, holds it, and provides a cursor over this
> >>>> collection.
> >>>>
> >>>> As I understand you correctly, you propose to run TextQuery over index
> >>>> with every poll page request. We can do this with Lucene
> >>>> IndexSearcher.searchAfter. So from one side, it will save resources.
> But
> >>>> from the other side, no queries (no TextQuery, no SqlFieldsQuery) lock
> >>>> index for querying. So there can be data inconsistency, as there can
> be
> >>>> concurrent operations on an index while a user iterates over the
> >> cursor. It
> >>>> also could be for queries now, due to no index lock being there, but
> the
> >>>> window of time of such inconsistency is much shorter.
> >>>>
> >>>> The same dilemma I have for IndexQuery. In my patch [1] I provide lazy
> >>>> iteration over BPlusTree. There is no lock on an index too while
> >> querying.
> >>>> And I want to discuss the right way. I have in mind the next things:
> >>>> 1. Indexes currently doesn't support transactions, also SQL queries
> >> don't
> >>>> lock index for queries, so Ignite don't guarantee data consistency;
> >>>> 2. As I understand preparing whole data for SQL queries is required
> due
> >>>> to relations between tables. The more complex query and relations we
> >> have,
> >>>> the much consistency issues we have in result in case of parallel
> >>>> operations;
> >>>> 3. Querying a single index only (by TextQuery or IndexQuery) doesn't
> >>>> affect any relations, so we can allow concurrent updates, as it could
> >>>> affect a query result but it doesn't hurt.
> >>>>
> >>>> And following these thoughts, it's right to implement lazy iterations
> >>>> over indexes. What do you think?
> >>>>
> >>>> Also, there is a second topic to discuss. BPlusTree indexes support
> >> query
> >>>> parallelism. But CacheQueries don't. There needs to be a change to
> >>>> infrastructure to support query parallelism, so on this patch [1] I
> >> handle
> >>>> multiple segments in a single thread. And this works OK, as in the
> case
> >> of
> >>>> lazy querying it's very fast to initialize a cursor, so there is not
> >> much
> >>>> overhead on multiple segments. I ran performance tests and found that
> in
> >>>> some cases, IndexQuery beats SqlFieldsQuery even with enabled
> >>>> queryParallelism (it helps a SqlFieldsQuery much). So the need for
> >>>> supporting queryParallelism for IndexQuery is required to be tested
> >> well.
> >>>> As IndexQuery already can help users to speed up some queries I
> propose
> >> to
> >>>> check queryParallelism a little bit later. WDYT?
> >>>>
> >>>> So, those 2 things affect the Apache Ignite release that IndexQuery
> will
> >>>> be delivered with. So, please let me know your thoughts.
> >>>>
> >>>> Any thoughts from the community are welcome too.
> >>>>
> >>>>
> >>>> [1] https://github.com/apache/ignite/pull/9118
> >>>> [2] https://github.com/apache/ignite/pull/9086
> >>>>
> >>>> On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin <
> timonin.maxim@gmail.com
> >>>> wrote:
> >>>>
> >>>>> Andrey,
> >>>>>
> >>>>> Thanks! I picked it.
> >>>>>
> >>>>> On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin <
> >> timonin.maxim@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Stephen,
> >>>>>>
> >>>>>> I don't see a reason to replace or deprecate IndexingSpi. I'm not
> >>>>>> sure how smbd uses it, but it works now.
> >>>>>>
> >>>>>> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
> >>>>>> stephen.darlington@gridgain.com> wrote:
> >>>>>>
> >>>>>>> Is this a replacement for IndexingSpi? Put bluntly, do we deprecate
> >>>>>>> (and remove) it?
> >>>>>>>
> >>>>>>> Or do you see them as complimentary?
> >>>>>>>
> >>>>>>>> On 12 Apr 2021, at 11:29, Maksim Timonin <timonin.maxim@gmail.com
> >
> >>>>>>> wrote:
> >>>>>>>> Hi Stephen!
> >>>>>>>>
> >>>>>>>> Please have a look at the QueryProcessing paragraph [1]. I've
> >>>>>>> described
> >>>>>>>> why IndexingSpi doesn't fit us well.
> >>>>>>>>
> >>>>>>>> [1]
> >>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
> >>>>>>>> On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
> >>>>>>>> stephen.darlington@gridgain.com> wrote:
> >>>>>>>>
> >>>>>>>>> How does this fit with the current IndexingSpi? Superficially
> they
> >>>>>>> appear
> >>>>>>>>> to do very similar things?
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Stephen
> >>>>>>>>>
> >>>>>>>>>> On 6 Apr 2021, at 14:13, Maksim Timonin <
> timonin.maxim@gmail.com
> >>>>>>> wrote:
> >>>>>>>>>> Hi, Igniters!
> >>>>>>>>>>
> >>>>>>>>>> I'd like to propose a new feature - opportunity to query and
> >> create
> >>>>>>>>> indexes
> >>>>>>>>>> from public API.
> >>>>>>>>>>
> >>>>>>>>>> It will help in some cases, where:
> >>>>>>>>>> 1. SQL is not applicable by design of user application;
> >>>>>>>>>> 2. Where IndexScan is preferable than ScanQuery for performance
> >>>>>>> reasons;
> >>>>>>>>>> 3. Functional indexes are required.
> >>>>>>>>>>
> >>>>>>>>>> Also it'll be great to have a transactional support for such
> >>>>>>> queries,
> >>>>>>>>> like
> >>>>>>>>>> the "select for update" query provides. But I don't dig there
> >>>>>>> much. It
> >>>>>>>>> will
> >>>>>>>>>> be a next step if this API will be implemented.
> >>>>>>>>>>
> >>>>>>>>>> I've prepared an IEP-71 for that [1] with more details. Please
> >>>>>>> share your
> >>>>>>>>>> thoughts.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> --
> Taras Ledkov
> Mail-To: tledkov@gridgain.com
>
>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Taras Ledkov <tl...@gridgain.com>.

Hi,

> It is an usability nightmare to make user write index name in all cases.
I don't see any difference between specifying the index name and specifying the index fields in the right order.
Do you see?

Let's there is the index:
idx_A_B ON TBL (A, B)

Is it OK that the query like below doesn't math the index 'idx_A_B'?
new IndexQuery<>(..)
     .setCriteria(lt("b", 1), lt("a", 2));

On 26.08.2021 15:23, Ivan Daschinsky wrote:
> I am against to make user write index name. It is quite simple and
> straightforward algorithm to match index to field names, so it is strange
> to compare it to sql engine optimizer.
>
> It is an usability nightmare to make user write index name in all cases.
>
> чт, 26 авг. 2021 г., 14:42 Maksim Timonin <ti...@gmail.com>:
>
>> Hi, Igniters!
>>
>> There is a discussion about how to specify an index to query with an
>> IndexQuery [1]. Currently my PR provides 2 ways to specify index:
>> 1. With a table and index name;
>> 2. With a table and list of index fields (without index name). In this case
>> IndexQueryProcessor tries to find an index that matches table and index
>> fields in strict order (order of fields in criteria has to match the order
>> of fields in index).
>>
>> Discussion is whether is the second approach valid?
>>
>> Pros:
>> 1. Currently index name is an optional field for QueryIndex and
>> QuerySqlField. Then users can create an index with a table and list of
>> fields. Then, we should provide an opportunity to define an index for
>> querying the same way as we do for creating.
>> 2. It's required to know the index name to query it (in case the index was
>> created without an explicit name). Users can find it and then use it as a
>> constant in code, but I see some troubles there:
>> 2.1. Get index name by querying the system view INDEXES. Note, that system
>> views are marked as an experimental feature [2].
>> 2.2. There is a workaround to know an index name with EXPLAIN clause for
>> sql query that uses the required index (but it depends on SQL optimizer).
>> 2.3. Users can use the index name builder, but it is in the
>> internal package
>> (org.apache.ignite.internal.processors.query.QueryUtils#indexName). Then it
>> can be changed from version to version without warning, and then the user
>> can't rely on it in code.
>> 3. Name of the Primary Key index (_key_PK) is predefined and hardcoded in
>> Ignite. Users can't set it while creating it, and the name of PK index is
>> hardcoded in the internal package too (QueryUtils.PRIMARY_KEY_INDEX).
>>
>> Cons:
>> 1. It's declared that IndexQuery avoids some SQL steps (like planning,
>> optimizer) in favor of speed. It looks like that looking for an index by
>> list of fields is the work of an optimizer.
>> 2. It can be not obvious that the order of fields in a query has to match
>> the order of fields in the related index. We should have it in mind when
>> building a query - there should be a check for order of fields before
>> querying.
>>
>>  From my side, I think that arguments for enforcing usage of an index name
>> for queries are strong enough. But for me it's strange that it's possible
>> to create an index without a name, but it's required to use name to query
>> it. Also taking in consideration that there is no guaranteed way to get an
>> index name (or I don't know it).
>>
>> Igniters, what do you think?
>> [1] https://github.com/apache/ignite/pull/9118#discussion_r642557531
>> [2] https://ignite.apache.org/docs/latest/monitoring-metrics/system-views
>>
>>
>> On Fri, Aug 6, 2021 at 4:04 PM Maksim Timonin <ti...@gmail.com>
>> wrote:
>>
>>> Hi, all!
>>>
>>> It's a gentle reminder. There is a PR for the new Index API [1]. It was
>>> approved by Alex Plekhanov. Does anybody want to review this API too? If
>>> there won't be objections we're going to merge it Monday, 16th of August.
>>>
>>> Thanks!
>>>
>>> [1] https://github.com/apache/ignite/pull/9118
>>>
>>> On Fri, May 21, 2021 at 10:43 PM Maksim Timonin <timonin.maxim@gmail.com
>>>
>>> wrote:
>>>
>>>> Andrey, hi!
>>>>
>>>> Some updates, there.
>>>>
>>>> I've submitted a PR for IndexQuery [1]. There is an issue about lazy
>> page
>>>> loading, that is also related to Text query ticket IGNITE-12291.
>>>>
>>>> CacheQueries already have pending pages functionality, it's done with
>>>> multiple sending GridCacheQueryRequest. There was an issue with
>> TextQuery
>>>> and limit, after exceeding a limit we still send requests, so I
>> submitted a
>>>> patch to fix this [2].
>>>>
>>>> But currently, TextQuery, as SqlFieldsQuery also does, prepares whole
>>>> data on query request, holds it, and provides a cursor over this
>>>> collection.
>>>>
>>>> As I understand you correctly, you propose to run TextQuery over index
>>>> with every poll page request. We can do this with Lucene
>>>> IndexSearcher.searchAfter. So from one side, it will save resources. But
>>>> from the other side, no queries (no TextQuery, no SqlFieldsQuery) lock
>>>> index for querying. So there can be data inconsistency, as there can be
>>>> concurrent operations on an index while a user iterates over the
>> cursor. It
>>>> also could be for queries now, due to no index lock being there, but the
>>>> window of time of such inconsistency is much shorter.
>>>>
>>>> The same dilemma I have for IndexQuery. In my patch [1] I provide lazy
>>>> iteration over BPlusTree. There is no lock on an index too while
>> querying.
>>>> And I want to discuss the right way. I have in mind the next things:
>>>> 1. Indexes currently doesn't support transactions, also SQL queries
>> don't
>>>> lock index for queries, so Ignite don't guarantee data consistency;
>>>> 2. As I understand preparing whole data for SQL queries is required due
>>>> to relations between tables. The more complex query and relations we
>> have,
>>>> the much consistency issues we have in result in case of parallel
>>>> operations;
>>>> 3. Querying a single index only (by TextQuery or IndexQuery) doesn't
>>>> affect any relations, so we can allow concurrent updates, as it could
>>>> affect a query result but it doesn't hurt.
>>>>
>>>> And following these thoughts, it's right to implement lazy iterations
>>>> over indexes. What do you think?
>>>>
>>>> Also, there is a second topic to discuss. BPlusTree indexes support
>> query
>>>> parallelism. But CacheQueries don't. There needs to be a change to
>>>> infrastructure to support query parallelism, so on this patch [1] I
>> handle
>>>> multiple segments in a single thread. And this works OK, as in the case
>> of
>>>> lazy querying it's very fast to initialize a cursor, so there is not
>> much
>>>> overhead on multiple segments. I ran performance tests and found that in
>>>> some cases, IndexQuery beats SqlFieldsQuery even with enabled
>>>> queryParallelism (it helps a SqlFieldsQuery much). So the need for
>>>> supporting queryParallelism for IndexQuery is required to be tested
>> well.
>>>> As IndexQuery already can help users to speed up some queries I propose
>> to
>>>> check queryParallelism a little bit later. WDYT?
>>>>
>>>> So, those 2 things affect the Apache Ignite release that IndexQuery will
>>>> be delivered with. So, please let me know your thoughts.
>>>>
>>>> Any thoughts from the community are welcome too.
>>>>
>>>>
>>>> [1] https://github.com/apache/ignite/pull/9118
>>>> [2] https://github.com/apache/ignite/pull/9086
>>>>
>>>> On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin <timonin.maxim@gmail.com
>>>> wrote:
>>>>
>>>>> Andrey,
>>>>>
>>>>> Thanks! I picked it.
>>>>>
>>>>> On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin <
>> timonin.maxim@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Stephen,
>>>>>>
>>>>>> I don't see a reason to replace or deprecate IndexingSpi. I'm not
>>>>>> sure how smbd uses it, but it works now.
>>>>>>
>>>>>> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
>>>>>> stephen.darlington@gridgain.com> wrote:
>>>>>>
>>>>>>> Is this a replacement for IndexingSpi? Put bluntly, do we deprecate
>>>>>>> (and remove) it?
>>>>>>>
>>>>>>> Or do you see them as complimentary?
>>>>>>>
>>>>>>>> On 12 Apr 2021, at 11:29, Maksim Timonin <ti...@gmail.com>
>>>>>>> wrote:
>>>>>>>> Hi Stephen!
>>>>>>>>
>>>>>>>> Please have a look at the QueryProcessing paragraph [1]. I've
>>>>>>> described
>>>>>>>> why IndexingSpi doesn't fit us well.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>>
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
>>>>>>>> On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
>>>>>>>> stephen.darlington@gridgain.com> wrote:
>>>>>>>>
>>>>>>>>> How does this fit with the current IndexingSpi? Superficially they
>>>>>>> appear
>>>>>>>>> to do very similar things?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Stephen
>>>>>>>>>
>>>>>>>>>> On 6 Apr 2021, at 14:13, Maksim Timonin <timonin.maxim@gmail.com
>>>>>>> wrote:
>>>>>>>>>> Hi, Igniters!
>>>>>>>>>>
>>>>>>>>>> I'd like to propose a new feature - opportunity to query and
>> create
>>>>>>>>> indexes
>>>>>>>>>> from public API.
>>>>>>>>>>
>>>>>>>>>> It will help in some cases, where:
>>>>>>>>>> 1. SQL is not applicable by design of user application;
>>>>>>>>>> 2. Where IndexScan is preferable than ScanQuery for performance
>>>>>>> reasons;
>>>>>>>>>> 3. Functional indexes are required.
>>>>>>>>>>
>>>>>>>>>> Also it'll be great to have a transactional support for such
>>>>>>> queries,
>>>>>>>>> like
>>>>>>>>>> the "select for update" query provides. But I don't dig there
>>>>>>> much. It
>>>>>>>>> will
>>>>>>>>>> be a next step if this API will be implemented.
>>>>>>>>>>
>>>>>>>>>> I've prepared an IEP-71 for that [1] with more details. Please
>>>>>>> share your
>>>>>>>>>> thoughts.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>>
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
-- 
Taras Ledkov
Mail-To: tledkov@gridgain.com

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Andrey Mashenkov <an...@gmail.com>.

@Ivan Daschinsky <iv...@gmail.com>

By the way, users are forced to specify index condition in order that match
column order in the index.

On Thu, Aug 26, 2021 at 3:24 PM Ivan Daschinsky <iv...@gmail.com> wrote:

> I am against to make user write index name. It is quite simple and
> straightforward algorithm to match index to field names, so it is strange
> to compare it to sql engine optimizer.
>
> It is an usability nightmare to make user write index name in all cases.
>
> чт, 26 авг. 2021 г., 14:42 Maksim Timonin <ti...@gmail.com>:
>
> > Hi, Igniters!
> >
> > There is a discussion about how to specify an index to query with an
> > IndexQuery [1]. Currently my PR provides 2 ways to specify index:
> > 1. With a table and index name;
> > 2. With a table and list of index fields (without index name). In this
> case
> > IndexQueryProcessor tries to find an index that matches table and index
> > fields in strict order (order of fields in criteria has to match the
> order
> > of fields in index).
> >
> > Discussion is whether is the second approach valid?
> >
> > Pros:
> > 1. Currently index name is an optional field for QueryIndex and
> > QuerySqlField. Then users can create an index with a table and list of
> > fields. Then, we should provide an opportunity to define an index for
> > querying the same way as we do for creating.
> > 2. It's required to know the index name to query it (in case the index
> was
> > created without an explicit name). Users can find it and then use it as a
> > constant in code, but I see some troubles there:
> > 2.1. Get index name by querying the system view INDEXES. Note, that
> system
> > views are marked as an experimental feature [2].
> > 2.2. There is a workaround to know an index name with EXPLAIN clause for
> > sql query that uses the required index (but it depends on SQL optimizer).
> > 2.3. Users can use the index name builder, but it is in the
> > internal package
> > (org.apache.ignite.internal.processors.query.QueryUtils#indexName). Then
> it
> > can be changed from version to version without warning, and then the user
> > can't rely on it in code.
> > 3. Name of the Primary Key index (_key_PK) is predefined and hardcoded in
> > Ignite. Users can't set it while creating it, and the name of PK index is
> > hardcoded in the internal package too (QueryUtils.PRIMARY_KEY_INDEX).
> >
> > Cons:
> > 1. It's declared that IndexQuery avoids some SQL steps (like planning,
> > optimizer) in favor of speed. It looks like that looking for an index by
> > list of fields is the work of an optimizer.
> > 2. It can be not obvious that the order of fields in a query has to match
> > the order of fields in the related index. We should have it in mind when
> > building a query - there should be a check for order of fields before
> > querying.
> >
> > From my side, I think that arguments for enforcing usage of an index name
> > for queries are strong enough. But for me it's strange that it's possible
> > to create an index without a name, but it's required to use name to query
> > it. Also taking in consideration that there is no guaranteed way to get
> an
> > index name (or I don't know it).
> >
> > Igniters, what do you think?
> > [1] https://github.com/apache/ignite/pull/9118#discussion_r642557531
> > [2]
> https://ignite.apache.org/docs/latest/monitoring-metrics/system-views
> >
> >
> > On Fri, Aug 6, 2021 at 4:04 PM Maksim Timonin <ti...@gmail.com>
> > wrote:
> >
> > > Hi, all!
> > >
> > > It's a gentle reminder. There is a PR for the new Index API [1]. It was
> > > approved by Alex Plekhanov. Does anybody want to review this API too?
> If
> > > there won't be objections we're going to merge it Monday, 16th of
> August.
> > >
> > > Thanks!
> > >
> > > [1] https://github.com/apache/ignite/pull/9118
> > >
> > > On Fri, May 21, 2021 at 10:43 PM Maksim Timonin <
> timonin.maxim@gmail.com
> > >
> > > wrote:
> > >
> > >> Andrey, hi!
> > >>
> > >> Some updates, there.
> > >>
> > >> I've submitted a PR for IndexQuery [1]. There is an issue about lazy
> > page
> > >> loading, that is also related to Text query ticket IGNITE-12291.
> > >>
> > >> CacheQueries already have pending pages functionality, it's done with
> > >> multiple sending GridCacheQueryRequest. There was an issue with
> > TextQuery
> > >> and limit, after exceeding a limit we still send requests, so I
> > submitted a
> > >> patch to fix this [2].
> > >>
> > >> But currently, TextQuery, as SqlFieldsQuery also does, prepares whole
> > >> data on query request, holds it, and provides a cursor over this
> > >> collection.
> > >>
> > >> As I understand you correctly, you propose to run TextQuery over index
> > >> with every poll page request. We can do this with Lucene
> > >> IndexSearcher.searchAfter. So from one side, it will save resources.
> But
> > >> from the other side, no queries (no TextQuery, no SqlFieldsQuery) lock
> > >> index for querying. So there can be data inconsistency, as there can
> be
> > >> concurrent operations on an index while a user iterates over the
> > cursor. It
> > >> also could be for queries now, due to no index lock being there, but
> the
> > >> window of time of such inconsistency is much shorter.
> > >>
> > >> The same dilemma I have for IndexQuery. In my patch [1] I provide lazy
> > >> iteration over BPlusTree. There is no lock on an index too while
> > querying.
> > >> And I want to discuss the right way. I have in mind the next things:
> > >> 1. Indexes currently doesn't support transactions, also SQL queries
> > don't
> > >> lock index for queries, so Ignite don't guarantee data consistency;
> > >> 2. As I understand preparing whole data for SQL queries is required
> due
> > >> to relations between tables. The more complex query and relations we
> > have,
> > >> the much consistency issues we have in result in case of parallel
> > >> operations;
> > >> 3. Querying a single index only (by TextQuery or IndexQuery) doesn't
> > >> affect any relations, so we can allow concurrent updates, as it could
> > >> affect a query result but it doesn't hurt.
> > >>
> > >> And following these thoughts, it's right to implement lazy iterations
> > >> over indexes. What do you think?
> > >>
> > >> Also, there is a second topic to discuss. BPlusTree indexes support
> > query
> > >> parallelism. But CacheQueries don't. There needs to be a change to
> > >> infrastructure to support query parallelism, so on this patch [1] I
> > handle
> > >> multiple segments in a single thread. And this works OK, as in the
> case
> > of
> > >> lazy querying it's very fast to initialize a cursor, so there is not
> > much
> > >> overhead on multiple segments. I ran performance tests and found that
> in
> > >> some cases, IndexQuery beats SqlFieldsQuery even with enabled
> > >> queryParallelism (it helps a SqlFieldsQuery much). So the need for
> > >> supporting queryParallelism for IndexQuery is required to be tested
> > well.
> > >> As IndexQuery already can help users to speed up some queries I
> propose
> > to
> > >> check queryParallelism a little bit later. WDYT?
> > >>
> > >> So, those 2 things affect the Apache Ignite release that IndexQuery
> will
> > >> be delivered with. So, please let me know your thoughts.
> > >>
> > >> Any thoughts from the community are welcome too.
> > >>
> > >>
> > >> [1] https://github.com/apache/ignite/pull/9118
> > >> [2] https://github.com/apache/ignite/pull/9086
> > >>
> > >> On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin <
> timonin.maxim@gmail.com
> > >
> > >> wrote:
> > >>
> > >>> Andrey,
> > >>>
> > >>> Thanks! I picked it.
> > >>>
> > >>> On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin <
> > timonin.maxim@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> Stephen,
> > >>>>
> > >>>> I don't see a reason to replace or deprecate IndexingSpi. I'm not
> > >>>> sure how smbd uses it, but it works now.
> > >>>>
> > >>>> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
> > >>>> stephen.darlington@gridgain.com> wrote:
> > >>>>
> > >>>>> Is this a replacement for IndexingSpi? Put bluntly, do we deprecate
> > >>>>> (and remove) it?
> > >>>>>
> > >>>>> Or do you see them as complimentary?
> > >>>>>
> > >>>>> > On 12 Apr 2021, at 11:29, Maksim Timonin <
> timonin.maxim@gmail.com>
> > >>>>> wrote:
> > >>>>> >
> > >>>>> > Hi Stephen!
> > >>>>> >
> > >>>>> > Please have a look at the QueryProcessing paragraph [1]. I've
> > >>>>> described
> > >>>>> > why IndexingSpi doesn't fit us well.
> > >>>>> >
> > >>>>> > [1]
> > >>>>> >
> > >>>>>
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
> > >>>>> >
> > >>>>> > On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
> > >>>>> > stephen.darlington@gridgain.com> wrote:
> > >>>>> >
> > >>>>> >> How does this fit with the current IndexingSpi? Superficially
> they
> > >>>>> appear
> > >>>>> >> to do very similar things?
> > >>>>> >>
> > >>>>> >> Regards,
> > >>>>> >> Stephen
> > >>>>> >>
> > >>>>> >>> On 6 Apr 2021, at 14:13, Maksim Timonin <
> timonin.maxim@gmail.com
> > >
> > >>>>> wrote:
> > >>>>> >>>
> > >>>>> >>> Hi, Igniters!
> > >>>>> >>>
> > >>>>> >>> I'd like to propose a new feature - opportunity to query and
> > create
> > >>>>> >> indexes
> > >>>>> >>> from public API.
> > >>>>> >>>
> > >>>>> >>> It will help in some cases, where:
> > >>>>> >>> 1. SQL is not applicable by design of user application;
> > >>>>> >>> 2. Where IndexScan is preferable than ScanQuery for performance
> > >>>>> reasons;
> > >>>>> >>> 3. Functional indexes are required.
> > >>>>> >>>
> > >>>>> >>> Also it'll be great to have a transactional support for such
> > >>>>> queries,
> > >>>>> >> like
> > >>>>> >>> the "select for update" query provides. But I don't dig there
> > >>>>> much. It
> > >>>>> >> will
> > >>>>> >>> be a next step if this API will be implemented.
> > >>>>> >>>
> > >>>>> >>> I've prepared an IEP-71 for that [1] with more details. Please
> > >>>>> share your
> > >>>>> >>> thoughts.
> > >>>>> >>>
> > >>>>> >>>
> > >>>>> >>> [1]
> > >>>>> >>>
> > >>>>> >>
> > >>>>>
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
> > >>>>> >>
> > >>>>> >>
> > >>>>> >>
> > >>>>>
> > >>>>>
> > >>>>>
> >
>


-- 
Best regards,
Andrey V. Mashenkov

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Ivan Daschinsky <iv...@gmail.com>.

I am against to make user write index name. It is quite simple and
straightforward algorithm to match index to field names, so it is strange
to compare it to sql engine optimizer.

It is an usability nightmare to make user write index name in all cases.

чт, 26 авг. 2021 г., 14:42 Maksim Timonin <ti...@gmail.com>:

> Hi, Igniters!
>
> There is a discussion about how to specify an index to query with an
> IndexQuery [1]. Currently my PR provides 2 ways to specify index:
> 1. With a table and index name;
> 2. With a table and list of index fields (without index name). In this case
> IndexQueryProcessor tries to find an index that matches table and index
> fields in strict order (order of fields in criteria has to match the order
> of fields in index).
>
> Discussion is whether is the second approach valid?
>
> Pros:
> 1. Currently index name is an optional field for QueryIndex and
> QuerySqlField. Then users can create an index with a table and list of
> fields. Then, we should provide an opportunity to define an index for
> querying the same way as we do for creating.
> 2. It's required to know the index name to query it (in case the index was
> created without an explicit name). Users can find it and then use it as a
> constant in code, but I see some troubles there:
> 2.1. Get index name by querying the system view INDEXES. Note, that system
> views are marked as an experimental feature [2].
> 2.2. There is a workaround to know an index name with EXPLAIN clause for
> sql query that uses the required index (but it depends on SQL optimizer).
> 2.3. Users can use the index name builder, but it is in the
> internal package
> (org.apache.ignite.internal.processors.query.QueryUtils#indexName). Then it
> can be changed from version to version without warning, and then the user
> can't rely on it in code.
> 3. Name of the Primary Key index (_key_PK) is predefined and hardcoded in
> Ignite. Users can't set it while creating it, and the name of PK index is
> hardcoded in the internal package too (QueryUtils.PRIMARY_KEY_INDEX).
>
> Cons:
> 1. It's declared that IndexQuery avoids some SQL steps (like planning,
> optimizer) in favor of speed. It looks like that looking for an index by
> list of fields is the work of an optimizer.
> 2. It can be not obvious that the order of fields in a query has to match
> the order of fields in the related index. We should have it in mind when
> building a query - there should be a check for order of fields before
> querying.
>
> From my side, I think that arguments for enforcing usage of an index name
> for queries are strong enough. But for me it's strange that it's possible
> to create an index without a name, but it's required to use name to query
> it. Also taking in consideration that there is no guaranteed way to get an
> index name (or I don't know it).
>
> Igniters, what do you think?
> [1] https://github.com/apache/ignite/pull/9118#discussion_r642557531
> [2] https://ignite.apache.org/docs/latest/monitoring-metrics/system-views
>
>
> On Fri, Aug 6, 2021 at 4:04 PM Maksim Timonin <ti...@gmail.com>
> wrote:
>
> > Hi, all!
> >
> > It's a gentle reminder. There is a PR for the new Index API [1]. It was
> > approved by Alex Plekhanov. Does anybody want to review this API too? If
> > there won't be objections we're going to merge it Monday, 16th of August.
> >
> > Thanks!
> >
> > [1] https://github.com/apache/ignite/pull/9118
> >
> > On Fri, May 21, 2021 at 10:43 PM Maksim Timonin <timonin.maxim@gmail.com
> >
> > wrote:
> >
> >> Andrey, hi!
> >>
> >> Some updates, there.
> >>
> >> I've submitted a PR for IndexQuery [1]. There is an issue about lazy
> page
> >> loading, that is also related to Text query ticket IGNITE-12291.
> >>
> >> CacheQueries already have pending pages functionality, it's done with
> >> multiple sending GridCacheQueryRequest. There was an issue with
> TextQuery
> >> and limit, after exceeding a limit we still send requests, so I
> submitted a
> >> patch to fix this [2].
> >>
> >> But currently, TextQuery, as SqlFieldsQuery also does, prepares whole
> >> data on query request, holds it, and provides a cursor over this
> >> collection.
> >>
> >> As I understand you correctly, you propose to run TextQuery over index
> >> with every poll page request. We can do this with Lucene
> >> IndexSearcher.searchAfter. So from one side, it will save resources. But
> >> from the other side, no queries (no TextQuery, no SqlFieldsQuery) lock
> >> index for querying. So there can be data inconsistency, as there can be
> >> concurrent operations on an index while a user iterates over the
> cursor. It
> >> also could be for queries now, due to no index lock being there, but the
> >> window of time of such inconsistency is much shorter.
> >>
> >> The same dilemma I have for IndexQuery. In my patch [1] I provide lazy
> >> iteration over BPlusTree. There is no lock on an index too while
> querying.
> >> And I want to discuss the right way. I have in mind the next things:
> >> 1. Indexes currently doesn't support transactions, also SQL queries
> don't
> >> lock index for queries, so Ignite don't guarantee data consistency;
> >> 2. As I understand preparing whole data for SQL queries is required due
> >> to relations between tables. The more complex query and relations we
> have,
> >> the much consistency issues we have in result in case of parallel
> >> operations;
> >> 3. Querying a single index only (by TextQuery or IndexQuery) doesn't
> >> affect any relations, so we can allow concurrent updates, as it could
> >> affect a query result but it doesn't hurt.
> >>
> >> And following these thoughts, it's right to implement lazy iterations
> >> over indexes. What do you think?
> >>
> >> Also, there is a second topic to discuss. BPlusTree indexes support
> query
> >> parallelism. But CacheQueries don't. There needs to be a change to
> >> infrastructure to support query parallelism, so on this patch [1] I
> handle
> >> multiple segments in a single thread. And this works OK, as in the case
> of
> >> lazy querying it's very fast to initialize a cursor, so there is not
> much
> >> overhead on multiple segments. I ran performance tests and found that in
> >> some cases, IndexQuery beats SqlFieldsQuery even with enabled
> >> queryParallelism (it helps a SqlFieldsQuery much). So the need for
> >> supporting queryParallelism for IndexQuery is required to be tested
> well.
> >> As IndexQuery already can help users to speed up some queries I propose
> to
> >> check queryParallelism a little bit later. WDYT?
> >>
> >> So, those 2 things affect the Apache Ignite release that IndexQuery will
> >> be delivered with. So, please let me know your thoughts.
> >>
> >> Any thoughts from the community are welcome too.
> >>
> >>
> >> [1] https://github.com/apache/ignite/pull/9118
> >> [2] https://github.com/apache/ignite/pull/9086
> >>
> >> On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin <timonin.maxim@gmail.com
> >
> >> wrote:
> >>
> >>> Andrey,
> >>>
> >>> Thanks! I picked it.
> >>>
> >>> On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin <
> timonin.maxim@gmail.com>
> >>> wrote:
> >>>
> >>>> Stephen,
> >>>>
> >>>> I don't see a reason to replace or deprecate IndexingSpi. I'm not
> >>>> sure how smbd uses it, but it works now.
> >>>>
> >>>> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
> >>>> stephen.darlington@gridgain.com> wrote:
> >>>>
> >>>>> Is this a replacement for IndexingSpi? Put bluntly, do we deprecate
> >>>>> (and remove) it?
> >>>>>
> >>>>> Or do you see them as complimentary?
> >>>>>
> >>>>> > On 12 Apr 2021, at 11:29, Maksim Timonin <ti...@gmail.com>
> >>>>> wrote:
> >>>>> >
> >>>>> > Hi Stephen!
> >>>>> >
> >>>>> > Please have a look at the QueryProcessing paragraph [1]. I've
> >>>>> described
> >>>>> > why IndexingSpi doesn't fit us well.
> >>>>> >
> >>>>> > [1]
> >>>>> >
> >>>>>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
> >>>>> >
> >>>>> > On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
> >>>>> > stephen.darlington@gridgain.com> wrote:
> >>>>> >
> >>>>> >> How does this fit with the current IndexingSpi? Superficially they
> >>>>> appear
> >>>>> >> to do very similar things?
> >>>>> >>
> >>>>> >> Regards,
> >>>>> >> Stephen
> >>>>> >>
> >>>>> >>> On 6 Apr 2021, at 14:13, Maksim Timonin <timonin.maxim@gmail.com
> >
> >>>>> wrote:
> >>>>> >>>
> >>>>> >>> Hi, Igniters!
> >>>>> >>>
> >>>>> >>> I'd like to propose a new feature - opportunity to query and
> create
> >>>>> >> indexes
> >>>>> >>> from public API.
> >>>>> >>>
> >>>>> >>> It will help in some cases, where:
> >>>>> >>> 1. SQL is not applicable by design of user application;
> >>>>> >>> 2. Where IndexScan is preferable than ScanQuery for performance
> >>>>> reasons;
> >>>>> >>> 3. Functional indexes are required.
> >>>>> >>>
> >>>>> >>> Also it'll be great to have a transactional support for such
> >>>>> queries,
> >>>>> >> like
> >>>>> >>> the "select for update" query provides. But I don't dig there
> >>>>> much. It
> >>>>> >> will
> >>>>> >>> be a next step if this API will be implemented.
> >>>>> >>>
> >>>>> >>> I've prepared an IEP-71 for that [1] with more details. Please
> >>>>> share your
> >>>>> >>> thoughts.
> >>>>> >>>
> >>>>> >>>
> >>>>> >>> [1]
> >>>>> >>>
> >>>>> >>
> >>>>>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
> >>>>> >>
> >>>>> >>
> >>>>> >>
> >>>>>
> >>>>>
> >>>>>
>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Maksim Timonin <ti...@gmail.com>.

Hi, Igniters!

There is a discussion about how to specify an index to query with an
IndexQuery [1]. Currently my PR provides 2 ways to specify index:
1. With a table and index name;
2. With a table and list of index fields (without index name). In this case
IndexQueryProcessor tries to find an index that matches table and index
fields in strict order (order of fields in criteria has to match the order
of fields in index).

Discussion is whether is the second approach valid?

Pros:
1. Currently index name is an optional field for QueryIndex and
QuerySqlField. Then users can create an index with a table and list of
fields. Then, we should provide an opportunity to define an index for
querying the same way as we do for creating.
2. It's required to know the index name to query it (in case the index was
created without an explicit name). Users can find it and then use it as a
constant in code, but I see some troubles there:
2.1. Get index name by querying the system view INDEXES. Note, that system
views are marked as an experimental feature [2].
2.2. There is a workaround to know an index name with EXPLAIN clause for
sql query that uses the required index (but it depends on SQL optimizer).
2.3. Users can use the index name builder, but it is in the
internal package
(org.apache.ignite.internal.processors.query.QueryUtils#indexName). Then it
can be changed from version to version without warning, and then the user
can't rely on it in code.
3. Name of the Primary Key index (_key_PK) is predefined and hardcoded in
Ignite. Users can't set it while creating it, and the name of PK index is
hardcoded in the internal package too (QueryUtils.PRIMARY_KEY_INDEX).

Cons:
1. It's declared that IndexQuery avoids some SQL steps (like planning,
optimizer) in favor of speed. It looks like that looking for an index by
list of fields is the work of an optimizer.
2. It can be not obvious that the order of fields in a query has to match
the order of fields in the related index. We should have it in mind when
building a query - there should be a check for order of fields before
querying.

From my side, I think that arguments for enforcing usage of an index name
for queries are strong enough. But for me it's strange that it's possible
to create an index without a name, but it's required to use name to query
it. Also taking in consideration that there is no guaranteed way to get an
index name (or I don't know it).

Igniters, what do you think?
[1] https://github.com/apache/ignite/pull/9118#discussion_r642557531
[2] https://ignite.apache.org/docs/latest/monitoring-metrics/system-views

On Fri, Aug 6, 2021 at 4:04 PM Maksim Timonin <ti...@gmail.com>
wrote:

> Hi, all!
>
> It's a gentle reminder. There is a PR for the new Index API [1]. It was
> approved by Alex Plekhanov. Does anybody want to review this API too? If
> there won't be objections we're going to merge it Monday, 16th of August.
>
> Thanks!
>
> [1] https://github.com/apache/ignite/pull/9118
>
> On Fri, May 21, 2021 at 10:43 PM Maksim Timonin <ti...@gmail.com>
> wrote:
>
>> Andrey, hi!
>>
>> Some updates, there.
>>
>> I've submitted a PR for IndexQuery [1]. There is an issue about lazy page
>> loading, that is also related to Text query ticket IGNITE-12291.
>>
>> CacheQueries already have pending pages functionality, it's done with
>> multiple sending GridCacheQueryRequest. There was an issue with TextQuery
>> and limit, after exceeding a limit we still send requests, so I submitted a
>> patch to fix this [2].
>>
>> But currently, TextQuery, as SqlFieldsQuery also does, prepares whole
>> data on query request, holds it, and provides a cursor over this
>> collection.
>>
>> As I understand you correctly, you propose to run TextQuery over index
>> with every poll page request. We can do this with Lucene
>> IndexSearcher.searchAfter. So from one side, it will save resources. But
>> from the other side, no queries (no TextQuery, no SqlFieldsQuery) lock
>> index for querying. So there can be data inconsistency, as there can be
>> concurrent operations on an index while a user iterates over the cursor. It
>> also could be for queries now, due to no index lock being there, but the
>> window of time of such inconsistency is much shorter.
>>
>> The same dilemma I have for IndexQuery. In my patch [1] I provide lazy
>> iteration over BPlusTree. There is no lock on an index too while querying.
>> And I want to discuss the right way. I have in mind the next things:
>> 1. Indexes currently doesn't support transactions, also SQL queries don't
>> lock index for queries, so Ignite don't guarantee data consistency;
>> 2. As I understand preparing whole data for SQL queries is required due
>> to relations between tables. The more complex query and relations we have,
>> the much consistency issues we have in result in case of parallel
>> operations;
>> 3. Querying a single index only (by TextQuery or IndexQuery) doesn't
>> affect any relations, so we can allow concurrent updates, as it could
>> affect a query result but it doesn't hurt.
>>
>> And following these thoughts, it's right to implement lazy iterations
>> over indexes. What do you think?
>>
>> Also, there is a second topic to discuss. BPlusTree indexes support query
>> parallelism. But CacheQueries don't. There needs to be a change to
>> infrastructure to support query parallelism, so on this patch [1] I handle
>> multiple segments in a single thread. And this works OK, as in the case of
>> lazy querying it's very fast to initialize a cursor, so there is not much
>> overhead on multiple segments. I ran performance tests and found that in
>> some cases, IndexQuery beats SqlFieldsQuery even with enabled
>> queryParallelism (it helps a SqlFieldsQuery much). So the need for
>> supporting queryParallelism for IndexQuery is required to be tested well.
>> As IndexQuery already can help users to speed up some queries I propose to
>> check queryParallelism a little bit later. WDYT?
>>
>> So, those 2 things affect the Apache Ignite release that IndexQuery will
>> be delivered with. So, please let me know your thoughts.
>>
>> Any thoughts from the community are welcome too.
>>
>>
>> [1] https://github.com/apache/ignite/pull/9118
>> [2] https://github.com/apache/ignite/pull/9086
>>
>> On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin <ti...@gmail.com>
>> wrote:
>>
>>> Andrey,
>>>
>>> Thanks! I picked it.
>>>
>>> On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin <ti...@gmail.com>
>>> wrote:
>>>
>>>> Stephen,
>>>>
>>>> I don't see a reason to replace or deprecate IndexingSpi. I'm not
>>>> sure how smbd uses it, but it works now.
>>>>
>>>> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
>>>> stephen.darlington@gridgain.com> wrote:
>>>>
>>>>> Is this a replacement for IndexingSpi? Put bluntly, do we deprecate
>>>>> (and remove) it?
>>>>>
>>>>> Or do you see them as complimentary?
>>>>>
>>>>> > On 12 Apr 2021, at 11:29, Maksim Timonin <ti...@gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > Hi Stephen!
>>>>> >
>>>>> > Please have a look at the QueryProcessing paragraph [1]. I've
>>>>> described
>>>>> > why IndexingSpi doesn't fit us well.
>>>>> >
>>>>> > [1]
>>>>> >
>>>>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
>>>>> >
>>>>> > On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
>>>>> > stephen.darlington@gridgain.com> wrote:
>>>>> >
>>>>> >> How does this fit with the current IndexingSpi? Superficially they
>>>>> appear
>>>>> >> to do very similar things?
>>>>> >>
>>>>> >> Regards,
>>>>> >> Stephen
>>>>> >>
>>>>> >>> On 6 Apr 2021, at 14:13, Maksim Timonin <ti...@gmail.com>
>>>>> wrote:
>>>>> >>>
>>>>> >>> Hi, Igniters!
>>>>> >>>
>>>>> >>> I'd like to propose a new feature - opportunity to query and create
>>>>> >> indexes
>>>>> >>> from public API.
>>>>> >>>
>>>>> >>> It will help in some cases, where:
>>>>> >>> 1. SQL is not applicable by design of user application;
>>>>> >>> 2. Where IndexScan is preferable than ScanQuery for performance
>>>>> reasons;
>>>>> >>> 3. Functional indexes are required.
>>>>> >>>
>>>>> >>> Also it'll be great to have a transactional support for such
>>>>> queries,
>>>>> >> like
>>>>> >>> the "select for update" query provides. But I don't dig there
>>>>> much. It
>>>>> >> will
>>>>> >>> be a next step if this API will be implemented.
>>>>> >>>
>>>>> >>> I've prepared an IEP-71 for that [1] with more details. Please
>>>>> share your
>>>>> >>> thoughts.
>>>>> >>>
>>>>> >>>
>>>>> >>> [1]
>>>>> >>>
>>>>> >>
>>>>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
>>>>> >>
>>>>> >>
>>>>> >>
>>>>>
>>>>>
>>>>>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Maksim Timonin <ti...@gmail.com>.

Hi, all!

It's a gentle reminder. There is a PR for the new Index API [1]. It was
approved by Alex Plekhanov. Does anybody want to review this API too? If
there won't be objections we're going to merge it Monday, 16th of August.

Thanks!

[1] https://github.com/apache/ignite/pull/9118

On Fri, May 21, 2021 at 10:43 PM Maksim Timonin <ti...@gmail.com>
wrote:

> Andrey, hi!
>
> Some updates, there.
>
> I've submitted a PR for IndexQuery [1]. There is an issue about lazy page
> loading, that is also related to Text query ticket IGNITE-12291.
>
> CacheQueries already have pending pages functionality, it's done with
> multiple sending GridCacheQueryRequest. There was an issue with TextQuery
> and limit, after exceeding a limit we still send requests, so I submitted a
> patch to fix this [2].
>
> But currently, TextQuery, as SqlFieldsQuery also does, prepares whole data
> on query request, holds it, and provides a cursor over this collection.
>
> As I understand you correctly, you propose to run TextQuery over index
> with every poll page request. We can do this with Lucene
> IndexSearcher.searchAfter. So from one side, it will save resources. But
> from the other side, no queries (no TextQuery, no SqlFieldsQuery) lock
> index for querying. So there can be data inconsistency, as there can be
> concurrent operations on an index while a user iterates over the cursor. It
> also could be for queries now, due to no index lock being there, but the
> window of time of such inconsistency is much shorter.
>
> The same dilemma I have for IndexQuery. In my patch [1] I provide lazy
> iteration over BPlusTree. There is no lock on an index too while querying.
> And I want to discuss the right way. I have in mind the next things:
> 1. Indexes currently doesn't support transactions, also SQL queries don't
> lock index for queries, so Ignite don't guarantee data consistency;
> 2. As I understand preparing whole data for SQL queries is required due to
> relations between tables. The more complex query and relations we have, the
> much consistency issues we have in result in case of parallel operations;
> 3. Querying a single index only (by TextQuery or IndexQuery) doesn't
> affect any relations, so we can allow concurrent updates, as it could
> affect a query result but it doesn't hurt.
>
> And following these thoughts, it's right to implement lazy iterations over
> indexes. What do you think?
>
> Also, there is a second topic to discuss. BPlusTree indexes support query
> parallelism. But CacheQueries don't. There needs to be a change to
> infrastructure to support query parallelism, so on this patch [1] I handle
> multiple segments in a single thread. And this works OK, as in the case of
> lazy querying it's very fast to initialize a cursor, so there is not much
> overhead on multiple segments. I ran performance tests and found that in
> some cases, IndexQuery beats SqlFieldsQuery even with enabled
> queryParallelism (it helps a SqlFieldsQuery much). So the need for
> supporting queryParallelism for IndexQuery is required to be tested well.
> As IndexQuery already can help users to speed up some queries I propose to
> check queryParallelism a little bit later. WDYT?
>
> So, those 2 things affect the Apache Ignite release that IndexQuery will
> be delivered with. So, please let me know your thoughts.
>
> Any thoughts from the community are welcome too.
>
>
> [1] https://github.com/apache/ignite/pull/9118
> [2] https://github.com/apache/ignite/pull/9086
>
> On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin <ti...@gmail.com>
> wrote:
>
>> Andrey,
>>
>> Thanks! I picked it.
>>
>> On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin <ti...@gmail.com>
>> wrote:
>>
>>> Stephen,
>>>
>>> I don't see a reason to replace or deprecate IndexingSpi. I'm not
>>> sure how smbd uses it, but it works now.
>>>
>>> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
>>> stephen.darlington@gridgain.com> wrote:
>>>
>>>> Is this a replacement for IndexingSpi? Put bluntly, do we deprecate
>>>> (and remove) it?
>>>>
>>>> Or do you see them as complimentary?
>>>>
>>>> > On 12 Apr 2021, at 11:29, Maksim Timonin <ti...@gmail.com>
>>>> wrote:
>>>> >
>>>> > Hi Stephen!
>>>> >
>>>> > Please have a look at the QueryProcessing paragraph [1]. I've
>>>> described
>>>> > why IndexingSpi doesn't fit us well.
>>>> >
>>>> > [1]
>>>> >
>>>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
>>>> >
>>>> > On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
>>>> > stephen.darlington@gridgain.com> wrote:
>>>> >
>>>> >> How does this fit with the current IndexingSpi? Superficially they
>>>> appear
>>>> >> to do very similar things?
>>>> >>
>>>> >> Regards,
>>>> >> Stephen
>>>> >>
>>>> >>> On 6 Apr 2021, at 14:13, Maksim Timonin <ti...@gmail.com>
>>>> wrote:
>>>> >>>
>>>> >>> Hi, Igniters!
>>>> >>>
>>>> >>> I'd like to propose a new feature - opportunity to query and create
>>>> >> indexes
>>>> >>> from public API.
>>>> >>>
>>>> >>> It will help in some cases, where:
>>>> >>> 1. SQL is not applicable by design of user application;
>>>> >>> 2. Where IndexScan is preferable than ScanQuery for performance
>>>> reasons;
>>>> >>> 3. Functional indexes are required.
>>>> >>>
>>>> >>> Also it'll be great to have a transactional support for such
>>>> queries,
>>>> >> like
>>>> >>> the "select for update" query provides. But I don't dig there much.
>>>> It
>>>> >> will
>>>> >>> be a next step if this API will be implemented.
>>>> >>>
>>>> >>> I've prepared an IEP-71 for that [1] with more details. Please
>>>> share your
>>>> >>> thoughts.
>>>> >>>
>>>> >>>
>>>> >>> [1]
>>>> >>>
>>>> >>
>>>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
>>>> >>
>>>> >>
>>>> >>
>>>>
>>>>
>>>>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Maksim Timonin <ti...@gmail.com>.

Andrey, hi!

Some updates, there.

I've submitted a PR for IndexQuery [1]. There is an issue about lazy page
loading, that is also related to Text query ticket IGNITE-12291.

CacheQueries already have pending pages functionality, it's done with
multiple sending GridCacheQueryRequest. There was an issue with TextQuery
and limit, after exceeding a limit we still send requests, so I submitted a
patch to fix this [2].

But currently, TextQuery, as SqlFieldsQuery also does, prepares whole data
on query request, holds it, and provides a cursor over this collection.

As I understand you correctly, you propose to run TextQuery over index with
every poll page request. We can do this with Lucene
IndexSearcher.searchAfter. So from one side, it will save resources. But
from the other side, no queries (no TextQuery, no SqlFieldsQuery) lock
index for querying. So there can be data inconsistency, as there can be
concurrent operations on an index while a user iterates over the cursor. It
also could be for queries now, due to no index lock being there, but the
window of time of such inconsistency is much shorter.

The same dilemma I have for IndexQuery. In my patch [1] I provide lazy
iteration over BPlusTree. There is no lock on an index too while querying.
And I want to discuss the right way. I have in mind the next things:
1. Indexes currently doesn't support transactions, also SQL queries don't
lock index for queries, so Ignite don't guarantee data consistency;
2. As I understand preparing whole data for SQL queries is required due to
relations between tables. The more complex query and relations we have, the
much consistency issues we have in result in case of parallel operations;
3. Querying a single index only (by TextQuery or IndexQuery) doesn't affect
any relations, so we can allow concurrent updates, as it could affect a
query result but it doesn't hurt.

And following these thoughts, it's right to implement lazy iterations over
indexes. What do you think?

Also, there is a second topic to discuss. BPlusTree indexes support query
parallelism. But CacheQueries don't. There needs to be a change to
infrastructure to support query parallelism, so on this patch [1] I handle
multiple segments in a single thread. And this works OK, as in the case of
lazy querying it's very fast to initialize a cursor, so there is not much
overhead on multiple segments. I ran performance tests and found that in
some cases, IndexQuery beats SqlFieldsQuery even with enabled
queryParallelism (it helps a SqlFieldsQuery much). So the need for
supporting queryParallelism for IndexQuery is required to be tested well.
As IndexQuery already can help users to speed up some queries I propose to
check queryParallelism a little bit later. WDYT?

So, those 2 things affect the Apache Ignite release that IndexQuery will be
delivered with. So, please let me know your thoughts.

Any thoughts from the community are welcome too.

[1] https://github.com/apache/ignite/pull/9118
[2] https://github.com/apache/ignite/pull/9086

On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin <ti...@gmail.com>
wrote:

> Andrey,
>
> Thanks! I picked it.
>
> On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin <ti...@gmail.com>
> wrote:
>
>> Stephen,
>>
>> I don't see a reason to replace or deprecate IndexingSpi. I'm not
>> sure how smbd uses it, but it works now.
>>
>> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
>> stephen.darlington@gridgain.com> wrote:
>>
>>> Is this a replacement for IndexingSpi? Put bluntly, do we deprecate (and
>>> remove) it?
>>>
>>> Or do you see them as complimentary?
>>>
>>> > On 12 Apr 2021, at 11:29, Maksim Timonin <ti...@gmail.com>
>>> wrote:
>>> >
>>> > Hi Stephen!
>>> >
>>> > Please have a look at the QueryProcessing paragraph [1]. I've described
>>> > why IndexingSpi doesn't fit us well.
>>> >
>>> > [1]
>>> >
>>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
>>> >
>>> > On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
>>> > stephen.darlington@gridgain.com> wrote:
>>> >
>>> >> How does this fit with the current IndexingSpi? Superficially they
>>> appear
>>> >> to do very similar things?
>>> >>
>>> >> Regards,
>>> >> Stephen
>>> >>
>>> >>> On 6 Apr 2021, at 14:13, Maksim Timonin <ti...@gmail.com>
>>> wrote:
>>> >>>
>>> >>> Hi, Igniters!
>>> >>>
>>> >>> I'd like to propose a new feature - opportunity to query and create
>>> >> indexes
>>> >>> from public API.
>>> >>>
>>> >>> It will help in some cases, where:
>>> >>> 1. SQL is not applicable by design of user application;
>>> >>> 2. Where IndexScan is preferable than ScanQuery for performance
>>> reasons;
>>> >>> 3. Functional indexes are required.
>>> >>>
>>> >>> Also it'll be great to have a transactional support for such queries,
>>> >> like
>>> >>> the "select for update" query provides. But I don't dig there much.
>>> It
>>> >> will
>>> >>> be a next step if this API will be implemented.
>>> >>>
>>> >>> I've prepared an IEP-71 for that [1] with more details. Please share
>>> your
>>> >>> thoughts.
>>> >>>
>>> >>>
>>> >>> [1]
>>> >>>
>>> >>
>>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
>>> >>
>>> >>
>>> >>
>>>
>>>
>>>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Maksim Timonin <ti...@gmail.com>.

Andrey,

Thanks! I picked it.

On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin <ti...@gmail.com>
wrote:

> Stephen,
>
> I don't see a reason to replace or deprecate IndexingSpi. I'm not sure how
> smbd uses it, but it works now.
>
> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
> stephen.darlington@gridgain.com> wrote:
>
>> Is this a replacement for IndexingSpi? Put bluntly, do we deprecate (and
>> remove) it?
>>
>> Or do you see them as complimentary?
>>
>> > On 12 Apr 2021, at 11:29, Maksim Timonin <ti...@gmail.com>
>> wrote:
>> >
>> > Hi Stephen!
>> >
>> > Please have a look at the QueryProcessing paragraph [1]. I've described
>> > why IndexingSpi doesn't fit us well.
>> >
>> > [1]
>> >
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
>> >
>> > On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
>> > stephen.darlington@gridgain.com> wrote:
>> >
>> >> How does this fit with the current IndexingSpi? Superficially they
>> appear
>> >> to do very similar things?
>> >>
>> >> Regards,
>> >> Stephen
>> >>
>> >>> On 6 Apr 2021, at 14:13, Maksim Timonin <ti...@gmail.com>
>> wrote:
>> >>>
>> >>> Hi, Igniters!
>> >>>
>> >>> I'd like to propose a new feature - opportunity to query and create
>> >> indexes
>> >>> from public API.
>> >>>
>> >>> It will help in some cases, where:
>> >>> 1. SQL is not applicable by design of user application;
>> >>> 2. Where IndexScan is preferable than ScanQuery for performance
>> reasons;
>> >>> 3. Functional indexes are required.
>> >>>
>> >>> Also it'll be great to have a transactional support for such queries,
>> >> like
>> >>> the "select for update" query provides. But I don't dig there much. It
>> >> will
>> >>> be a next step if this API will be implemented.
>> >>>
>> >>> I've prepared an IEP-71 for that [1] with more details. Please share
>> your
>> >>> thoughts.
>> >>>
>> >>>
>> >>> [1]
>> >>>
>> >>
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
>> >>
>> >>
>> >>
>>
>>
>>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Maksim Timonin <ti...@gmail.com>.

Stephen,

I don't see a reason to replace or deprecate IndexingSpi. I'm not sure how
smbd uses it, but it works now.

On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
stephen.darlington@gridgain.com> wrote:

> Is this a replacement for IndexingSpi? Put bluntly, do we deprecate (and
> remove) it?
>
> Or do you see them as complimentary?
>
> > On 12 Apr 2021, at 11:29, Maksim Timonin <ti...@gmail.com>
> wrote:
> >
> > Hi Stephen!
> >
> > Please have a look at the QueryProcessing paragraph [1]. I've described
> > why IndexingSpi doesn't fit us well.
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
> >
> > On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
> > stephen.darlington@gridgain.com> wrote:
> >
> >> How does this fit with the current IndexingSpi? Superficially they
> appear
> >> to do very similar things?
> >>
> >> Regards,
> >> Stephen
> >>
> >>> On 6 Apr 2021, at 14:13, Maksim Timonin <ti...@gmail.com>
> wrote:
> >>>
> >>> Hi, Igniters!
> >>>
> >>> I'd like to propose a new feature - opportunity to query and create
> >> indexes
> >>> from public API.
> >>>
> >>> It will help in some cases, where:
> >>> 1. SQL is not applicable by design of user application;
> >>> 2. Where IndexScan is preferable than ScanQuery for performance
> reasons;
> >>> 3. Functional indexes are required.
> >>>
> >>> Also it'll be great to have a transactional support for such queries,
> >> like
> >>> the "select for update" query provides. But I don't dig there much. It
> >> will
> >>> be a next step if this API will be implemented.
> >>>
> >>> I've prepared an IEP-71 for that [1] with more details. Please share
> your
> >>> thoughts.
> >>>
> >>>
> >>> [1]
> >>>
> >>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
> >>
> >>
> >>
>
>
>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Stephen Darlington <st...@gridgain.com>.

Is this a replacement for IndexingSpi? Put bluntly, do we deprecate (and remove) it?

Or do you see them as complimentary?

> On 12 Apr 2021, at 11:29, Maksim Timonin <ti...@gmail.com> wrote:
> 
> Hi Stephen!
> 
> Please have a look at the QueryProcessing paragraph [1]. I've described
> why IndexingSpi doesn't fit us well.
> 
> [1]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
> 
> On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
> stephen.darlington@gridgain.com> wrote:
> 
>> How does this fit with the current IndexingSpi? Superficially they appear
>> to do very similar things?
>> 
>> Regards,
>> Stephen
>> 
>>> On 6 Apr 2021, at 14:13, Maksim Timonin <ti...@gmail.com> wrote:
>>> 
>>> Hi, Igniters!
>>> 
>>> I'd like to propose a new feature - opportunity to query and create
>> indexes
>>> from public API.
>>> 
>>> It will help in some cases, where:
>>> 1. SQL is not applicable by design of user application;
>>> 2. Where IndexScan is preferable than ScanQuery for performance reasons;
>>> 3. Functional indexes are required.
>>> 
>>> Also it'll be great to have a transactional support for such queries,
>> like
>>> the "select for update" query provides. But I don't dig there much. It
>> will
>>> be a next step if this API will be implemented.
>>> 
>>> I've prepared an IEP-71 for that [1] with more details. Please share your
>>> thoughts.
>>> 
>>> 
>>> [1]
>>> 
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
>> 
>> 
>>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Maksim Timonin <ti...@gmail.com>.

Hi Stephen!

Please have a look at the QueryProcessing paragraph [1]. I've described
why IndexingSpi doesn't fit us well.

[1]
https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing

On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
stephen.darlington@gridgain.com> wrote:

> How does this fit with the current IndexingSpi? Superficially they appear
> to do very similar things?
>
> Regards,
> Stephen
>
> > On 6 Apr 2021, at 14:13, Maksim Timonin <ti...@gmail.com> wrote:
> >
> > Hi, Igniters!
> >
> > I'd like to propose a new feature - opportunity to query and create
> indexes
> > from public API.
> >
> > It will help in some cases, where:
> > 1. SQL is not applicable by design of user application;
> > 2. Where IndexScan is preferable than ScanQuery for performance reasons;
> > 3. Functional indexes are required.
> >
> > Also it'll be great to have a transactional support for such queries,
> like
> > the "select for update" query provides. But I don't dig there much. It
> will
> > be a next step if this API will be implemented.
> >
> > I've prepared an IEP-71 for that [1] with more details. Please share your
> > thoughts.
> >
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
>
>
>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Posted by Stephen Darlington <st...@gridgain.com>.

How does this fit with the current IndexingSpi? Superficially they appear to do very similar things?

Regards,
Stephen

> On 6 Apr 2021, at 14:13, Maksim Timonin <ti...@gmail.com> wrote:
> 
> Hi, Igniters!
> 
> I'd like to propose a new feature - opportunity to query and create indexes
> from public API.
> 
> It will help in some cases, where:
> 1. SQL is not applicable by design of user application;
> 2. Where IndexScan is preferable than ScanQuery for performance reasons;
> 3. Functional indexes are required.
> 
> Also it'll be great to have a transactional support for such queries, like
> the "select for update" query provides. But I don't dig there much. It will
> be a next step if this API will be implemented.
> 
> I've prepared an IEP-71 for that [1] with more details. Please share your
> thoughts.
> 
> 
> [1]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search