Posted to dev@lucene.apache.org by Alexander Lukyanchikov <al...@gmail.com> on 2023/07/18 21:59:25 UTC

Enabling concurrent search only for certain queries

Hi everyone,
We tested the concurrent rewrite for knn vector queries in
Lucene 9.7 and the results look great: we see up to a 9x improvement on large
datasets.

Our current implementation for intra-query concurrency relies on a single
IndexSearcher per index which is always configured with an executor. The
intention is to execute only heavy / long running queries in concurrent
mode, so we use either Collector or CollectorManager API to control this
behavior. But the concurrent rewrite in KnnVectorQuery is effectively
always enabled if the IndexSearcher is configured with an executor, so we
need to find another way to turn it on and off when needed.

Knowing that IndexSearcher#search(Query, Collector) is going to be removed
<https://issues.apache.org/jira/browse/LUCENE-10002> eventually, and a similar
change <https://github.com/apache/lucene/pull/632> was implemented for
DrillSideways, my understanding is that the long-term plan is to rely only
on the presence of the executor in IndexSearcher to select the
sequential/concurrent code path. Is this correct, or would people be open
to introducing an additional flag (e.g. in IndexSearcher#search) to be able
to override the default behavior?

--
Regards,
Alex

Re: Enabling concurrent search only for certain queries

Posted by Adrien Grand <jp...@gmail.com>.
Hi Alexander,

It sounds likely that it will always be possible to pass an Executor
to IndexSearcher's constructor. So this sounds like a safe bet.

On Wed, Jul 19, 2023 at 7:22 AM Alexander Lukyanchikov
<al...@gmail.com> wrote:
>
> Hi Adrien,
>
> Yes, that can be done. I just wanted to make sure my understanding is correct and that this is what the future API is going to look like before we do this refactoring. Thank you.
>
> --
> Regards,
> Alex


-- 
Adrien


Re: Enabling concurrent search only for certain queries

Posted by Alexander Lukyanchikov <al...@gmail.com>.
Hi Adrien,

Yes, that can be done. I just wanted to make sure my understanding is
correct and that this is what the future API is going to look like before
we do this refactoring. Thank you.

--
Regards,
Alex


On Tue, Jul 18, 2023 at 3:26 PM Adrien Grand <jp...@gmail.com> wrote:

> Hi Alexander,
>
> You mentioned that your current implementation relies on a single
> IndexSearcher. Could you have two instead? One that configures an executor
> for long running queries and another one that doesn't?
>
> For reference, IndexSearchers are cheap to create, so it would be OK to
> create one per query if that helps.

Re: Enabling concurrent search only for certain queries

Posted by Adrien Grand <jp...@gmail.com>.
Hi Alexander,

You mentioned that your current implementation relies on a single
IndexSearcher. Could you have two instead? One that configures an executor
for long running queries and another one that doesn't?

For reference, IndexSearchers are cheap to create, so it would be OK to
create one per query if that helps.
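
The two-searcher idea above can be sketched roughly as follows. This is a
minimal illustration under stated assumptions, not Lucene code: the names
SearcherRouter, select and heavyQuery are hypothetical, and the type
parameter S stands in for IndexSearcher so that the sketch compiles with
the JDK alone. In a real application the factory would presumably return
new IndexSearcher(reader) when no executor is given and
new IndexSearcher(reader, executor) otherwise.

```java
import java.util.concurrent.Executor;
import java.util.function.Function;

/**
 * Hypothetical helper: holds one sequential and one concurrent searcher
 * and picks between them per query. S stands in for Lucene's IndexSearcher.
 */
final class SearcherRouter<S> {
  private final S sequential;
  private final S concurrent;

  /**
   * @param factory  builds a searcher; it receives null for the sequential
   *                 searcher and the executor for the concurrent one
   * @param executor executor used only by the concurrent searcher
   */
  SearcherRouter(Function<Executor, S> factory, Executor executor) {
    this.sequential = factory.apply(null);      // no executor: sequential path
    this.concurrent = factory.apply(executor);  // executor set: concurrent path
  }

  /** Returns the concurrent searcher for heavy queries, else the sequential one. */
  S select(boolean heavyQuery) {
    return heavyQuery ? concurrent : sequential;
  }
}
```

How a query is classified as heavy is left to the caller. Since searchers
are cheap to create, building one per query instead of caching two would
be an equally valid variant of this sketch.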


On Tue, Jul 18, 2023 at 23:59, Alexander Lukyanchikov <
alexanderlukyanchikov@gmail.com> wrote:

> Hi everyone,
> We tested the concurrent rewrite for knn vector queries in
> Lucene 9.7 and the results look great: we see up to a 9x improvement on large
> datasets.
>
> Our current implementation for intra-query concurrency relies on a single
> IndexSearcher per index which is always configured with an executor. The
> intention is to execute only heavy / long running queries in concurrent
> mode, so we use either Collector or CollectorManager API to control this
> behavior. But the concurrent rewrite in KnnVectorQuery is effectively
> always enabled if the IndexSearcher is configured with an executor, so we
> need to find another way to turn it on and off when needed.
>
> Knowing that IndexSearcher#search(Query, Collector) is going to be removed
> <https://issues.apache.org/jira/browse/LUCENE-10002> eventually, and a similar
> change <https://github.com/apache/lucene/pull/632> was implemented for
> DrillSideways, my understanding is that the long-term plan is to rely only
> on the presence of the executor in IndexSearcher to select the
> sequential/concurrent code path. Is this correct, or would people be open
> to introducing an additional flag (e.g. in IndexSearcher#search) to be able
> to override the default behavior?
>
> --
> Regards,
> Alex
>
>