Posted to users@solr.apache.org by Derek C <de...@hssl.ie> on 2022/08/19 10:08:14 UTC

SOLR 9.0 - Mixing KNN and traditional queries - possible?

Hi all,

I have a collection with about 2.5 million documents. I've been
experimenting with the KNN dense vector search (SOLR's approximate
nearest neighbour search over embeddings) that's available in SOLR 9.0. It
works really well: it's very fast, as long as the requested number of
nearest neighbours (topK) isn't too big.

The only thing is that the KNN search doesn't seem to combine with any
other query parameters. If I add another query clause (like
"is_enabled:true" or whatever), the KNN search behaves as if it were just a
filter query: *both* the KNN search AND the traditional search are run, their
results are intersected, and the intersection is returned. I think this is
what the docs say about knn as a filter query:
https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html
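To illustrate why that intersection can lose results, here is a toy brute-force model in Python. It is not Solr code: the documents, the is_enabled field and the exact Euclidean ranking are assumptions standing in for the real HNSW index. It contrasts "take the k nearest, then filter" (the intersection described above) with "filter first, then take the k nearest".

```python
import math


def top_k(query, docs, k):
    """Exact k nearest neighbours by Euclidean distance (brute force)."""
    ranked = sorted(docs, key=lambda d: math.dist(query, d["vec"]))
    return ranked[:k]


def knn_then_filter(query, docs, k, pred):
    """Post-filtering: take the k nearest first, then intersect with the filter."""
    return [d for d in top_k(query, docs, k) if pred(d)]


def filter_then_knn(query, docs, k, pred):
    """Pre-filtering: keep only matching docs, then take the k nearest of those."""
    return top_k(query, [d for d in docs if pred(d)], k)


def is_enabled(doc):
    return doc["is_enabled"]


# Hypothetical toy data: the closest docs happen to be disabled.
docs = [
    {"id": "a", "vec": (0.0, 0.0), "is_enabled": False},
    {"id": "b", "vec": (0.1, 0.0), "is_enabled": False},
    {"id": "c", "vec": (0.2, 0.0), "is_enabled": True},
    {"id": "d", "vec": (5.0, 5.0), "is_enabled": True},
    {"id": "e", "vec": (6.0, 6.0), "is_enabled": True},
]
q = (0.0, 0.0)

print([d["id"] for d in knn_then_filter(q, docs, 3, is_enabled)])  # ['c']
print([d["id"] for d in filter_then_knn(q, docs, 3, is_enabled)])  # ['c', 'd', 'e']
```

With k=3, intersecting keeps only one of the three requested neighbours, while filtering first still returns three enabled documents.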

This is a bit of a problem because I want to be able to combine "nearest
neighbour" ranking with other refining fields ("is_enabled:true" or
"color:red" would be examples).
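For reference, a request that mixes a knn main query with fq filters might be built like this in Python. The field name `vector`, the vector values and the `/select` path are placeholders. Based on the behaviour described above, in 9.0 the fq filters here appear to be intersected with the topK neighbours rather than restricting which documents the neighbours are chosen from.

```python
from urllib.parse import urlencode


def knn_params(field, vector, top_k, filters=()):
    """Build request params: a knn main query plus optional fq filter queries.

    `field` is assumed to be a DenseVectorField in the schema (hypothetical name).
    """
    vec = "[" + ", ".join(str(x) for x in vector) + "]"
    params = [("q", f"{{!knn f={field} topK={top_k}}}{vec}"), ("fl", "id,score")]
    # Each filter becomes its own fq parameter.
    params += [("fq", fq) for fq in filters]
    return params


params = knn_params("vector", [1.0, 2.0, 3.0, 4.0], 10,
                    filters=["is_enabled:true", "color:red"])
print("/select?" + urlencode(params))
```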

Is there any way to mix the two kinds of query? Or, if not right now, do you
think it'll be coming in a later version of SOLR?

BTW, I have also experimented with taking every float value from the
embedding vector and putting each one into its own field (and I have 512
floats in the embedding!). Then I can use the dist() function for sorting,
so now it's an exact "nearest neighbour" search rather than an "approximate
nearest neighbour" one. This works 100%, but if I query all 2.5M documents
it's too slow. If I first apply a query that gets me down to < 50K documents
it works fine, so this is a usable solution in certain situations.
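A sketch of how such a sort expression could be built, assuming the 512 components are stored in numeric fields named emb_0 through emb_511 (hypothetical names). Solr's dist(power, ...) function takes the two vectors back to back, so 512 dimensions means 1,024 vector arguments plus the power. Every matching document's distance is computed in full, so the cost grows roughly linearly with the number of documents the query matches, which is consistent with 2.5M being too slow and < 50K being fine.

```python
def dist_sort(query_vec, field_prefix="emb_", power=2):
    """Build a Solr sort expression ranking docs by distance to query_vec.

    Assumes embedding component i is indexed in its own numeric field named
    f"{field_prefix}{i}" (a hypothetical naming scheme). power=2 gives
    Euclidean distance.
    """
    fields = [f"{field_prefix}{i}" for i in range(len(query_vec))]
    values = [repr(float(x)) for x in query_vec]
    # dist(power, x1..xn, y1..yn): document vector first, then query vector.
    return f"dist({power},{','.join(fields + values)}) asc"


print(dist_sort([0.5, -1.0, 2.0]))
# dist(2,emb_0,emb_1,emb_2,0.5,-1.0,2.0) asc
```

The result would go in the `sort` parameter; with 512 dimensions the expression is long, so sending it in a POST body is likely more practical than a GET URL.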

Thanks for any help or info on this!

all the best,

Derek


-
Telephone (IRL): 086 856 3823
Telephone (US): (650) 443 8285
Skype: dconnrt
Email: derek@hssl.ie


*Disclaimer:* This email and any files transmitted with it are confidential
and intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please delete it
(if you are not the intended recipient you are notified that disclosing,
copying, distributing or taking any action in reliance on the contents of
this information is strictly prohibited).
*Warning*: Although HSSL have taken reasonable precautions to ensure no
viruses are present in this email, HSSL cannot accept responsibility for
any loss or damage arising from the use of this email or attachments.
For the Environment, please only print this email if necessary.

Re: SOLR 9.0 - Mixing KNN and traditional queries - possible?

Posted by Derek C <de...@hssl.ie>.
Thanks very much for that information, Joel.

Actually we're getting by quite well for now with KNN re-ranking, but I
don't really understand how topK works when re-ranking a normal query. I'm
picking pretty high topK numbers; if I don't, it's as if the re-ranking just
stops and the results drop back to natural database order far earlier than
the specified topK. In any case, the SOLR KNN search is great. I was playing
around with Spotify Annoy before this, but having approximate nearest
neighbour search inside SOLR (especially when you're already SOLR-based) is
far better.
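One plausible explanation for the early drop-off (an assumption based on how Solr's rerank parser is documented, not something confirmed in this thread): with rq={!rerank reRankQuery=$rqq reRankDocs=N ...}, only the top N hits of the main query are rescored, and only those that are also among the knn query's topK neighbours get the knn score added. If the main query is constant-score (e.g. is_enabled:true), those top N are essentially in index order, so the number of boosted documents is the overlap of the two sets, which can be far smaller than topK. This Python sketch builds such a request and models the effect; the field name `vector`, the doc ids and the scores are made up, and `toy_rerank` is a simplified model, not Solr's implementation.

```python
def rerank_params(main_q, field, vector, top_k, rerank_docs, weight=1.0):
    """Main query plus a knn re-ranking query passed via rq / rqq."""
    vec = "[" + ", ".join(str(x) for x in vector) + "]"
    return [
        ("q", main_q),
        ("rq", f"{{!rerank reRankQuery=$rqq reRankDocs={rerank_docs} "
               f"reRankWeight={weight}}}"),
        ("rqq", f"{{!knn f={field} topK={top_k}}}{vec}"),
    ]


def toy_rerank(main_ranking, knn_scores, rerank_docs, weight=1.0):
    """Simplified model of re-ranking over a constant-score main query.

    Only the first `rerank_docs` hits are rescored; those that are knn hits
    get weight * knn score added. sorted() is stable, so non-boosted docs
    keep their original (index) order, and the tail is left untouched.
    """
    head = main_ranking[:rerank_docs]
    tail = main_ranking[rerank_docs:]
    rescored = sorted(head, key=lambda doc: -weight * knn_scores.get(doc, 0.0))
    return rescored + tail


ids = [f"d{i}" for i in range(10)]        # main-query hits in index order
knn = {"d8": 0.9, "d2": 0.8, "d5": 0.7}   # topK=3 neighbours and their scores
print(toy_rerank(ids, knn, rerank_docs=4)[:4])   # ['d2', 'd0', 'd1', 'd3']
print(toy_rerank(ids, knn, rerank_docs=10)[:4])  # ['d8', 'd2', 'd5', 'd0']
```

In the model, raising reRankDocs (rather than topK alone) is what lets all the neighbours be boosted, since it widens the window of main-query hits that get rescored.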

Derek



On Mon, Aug 22, 2022 at 5:40 PM Joel Bernstein <jo...@gmail.com> wrote:

> This appears to be what you are looking for:
>
> https://issues.apache.org/jira/browse/SOLR-16246
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>


-- 
Derek Conniffe
Harvey Software Systems Ltd T/A HSSL
Telephone (IRL): 086 856 3823
Telephone (US): (650) 443 8285
Skype: dconnrt
Email: derek@hssl.ie



Re: SOLR 9.0 - Mixing KNN and traditional queries - possible?

Posted by Joel Bernstein <jo...@gmail.com>.
This appears to be what you are looking for:

https://issues.apache.org/jira/browse/SOLR-16246


Joel Bernstein
http://joelsolr.blogspot.com/