
Posted to issues@solr.apache.org by "Christine Poerschke (Jira)" <ji...@apache.org> on 2023/02/23 18:31:00 UTC

[jira] [Updated] (SOLR-16651) Optimize execution of KNN sub-query to apply it only on documents remaining after the main query

     [ https://issues.apache.org/jira/browse/SOLR-16651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christine Poerschke updated SOLR-16651:
---------------------------------------
    Security:     (was: Public)

> Optimize execution of KNN sub-query to apply it only on documents remaining after the main query
> ------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-16651
>                 URL: https://issues.apache.org/jira/browse/SOLR-16651
>             Project: Solr
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 9.1.1
>            Reporter: Gabriel Magno
>            Priority: Major
>              Labels: knn, optimization, query, vector
>
> Solr 9.1 introduced pre-filtering for KNN queries, which is great and works fine when the KNN query is the main query.
> I was wondering whether it would be possible to do something similar for the case where KNN is a sub-query instead of the main query (q). Let me show an example use case with the films example.
> I want to query for films with “the” in the name, keep only films with genre “Drama”, and then calculate the similarity of these films' vectors to my target vector. The idea is to run a simple lexical query and use the KNN sub-query to calculate similarities (not necessarily to sort by similarity). Here is an example query:
>  * URL: [http://localhost:8983/solr/#/films/query?q=name:the&fq=genre:Drama&my_similarity=%7B!knn%20f%3Dfilm_vector%20topK%3D10000%7D%5B0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0%5D&fl=*,$my_similarity]
>  * Params:
>  ** {*}q{*}=name:the
>  ** {*}fq{*}=genre:Drama
>  ** {*}my_similarity{*}=\{!knn f=film_vector topK=10000}[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]
>  ** {*}fl{*}=*,$my_similarity
> This query works fine. The problem is that the `my_similarity` sub-query runs over all 1,100 film documents instead of only the 51 that match the query. For a small collection like this it makes no difference, but I have a collection with 12 million documents where queries like this run very slowly, even though the result set is small.
> I tried using the cache and cost parameters to "force" the KNN sub-query to run after the main query (`\{!knn cache=false cost=101 f=film_vector topK=10000}[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]`), but it does not work (I guess PostFilter is not implemented for KNN).
> This issue might be related to the fix for the StackOverflow bug of frange with KNN (https://issues.apache.org/jira/browse/SOLR-16567).
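
For anyone trying to reproduce the report, the request parameters listed in the description can be assembled roughly as follows. This is only a sketch: the helper name `build_knn_subquery_params` is hypothetical, and the `/solr/films/select` path is an assumption (the URL in the description points at the admin UI query page, not at a request handler). The parameter values themselves are copied from the description.

```python
from urllib.parse import urlencode


def build_knn_subquery_params(vector, field="film_vector", top_k=10000):
    """Build the request parameters from the issue description.

    Hypothetical helper for illustration; it is not part of Solr or any client.
    """
    # Solr expects the target vector as a bracketed, comma-separated literal.
    vector_literal = "[" + ",".join(str(x) for x in vector) + "]"
    return {
        "q": "name:the",                      # main lexical query
        "fq": "genre:Drama",                  # filter query
        # KNN sub-query, referenced from fl as $my_similarity
        "my_similarity": f"{{!knn f={field} topK={top_k}}}{vector_literal}",
        "fl": "*,$my_similarity",
    }


target_vector = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
params = build_knn_subquery_params(target_vector)

# Assumed endpoint: the standard /select handler of a local "films" collection.
url = "http://localhost:8983/solr/films/select?" + urlencode(params)
print(url)
```

`urlencode` takes care of percent-encoding the braces, brackets and spaces in the local-params syntax, so the raw `{!knn ...}` string can stay readable in the code.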


