Posted to dev@lucene.apache.org by "mosh (JIRA)" <ji...@apache.org> on 2018/10/21 07:32:00 UTC

[jira] [Comment Edited] (SOLR-12890) Vector Search in Solr (Umbrella Issue)

    [ https://issues.apache.org/jira/browse/SOLR-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658118#comment-16658118 ] 

mosh edited comment on SOLR-12890 at 10/21/18 7:31 AM:
-------------------------------------------------------

We have been experimenting with this new use case, querying vectors inside Solr,
using this [POC|https://github.com/moshebla/solr-vector-scoring].
This has worked for us because an algorithm running outside of Solr generates vectors for the different inputs in our data pipeline,
and the enriched documents are then sent to Solr for indexing.
The LSH hash is then calculated at index time, and the vector data is encoded in a binary format, in either sparse or dense form (this is configurable).
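The sparse/dense choice comes down to a size trade-off. The sketch below uses a hypothetical byte layout (the POC's actual format is not described here): a one-byte tag and the dimension, followed either by every component as a little-endian float32 (dense) or by (index, value) pairs for the non-zero components only (sparse). `encode_vector`, `decode_vector`, `DENSE` and `SPARSE` are illustrative names, not part of Solr or the POC.

```python
import struct

# Hypothetical format tags; the POC's real binary layout may differ.
DENSE, SPARSE = 0, 1
HEADER = struct.calcsize("<BI")  # 1-byte tag + uint32 dimension = 5 bytes


def encode_vector(vec, fmt):
    """Encode a list of floats as bytes in a hypothetical dense or sparse layout."""
    if fmt == DENSE:
        # tag, dimension, then every component as a float32
        return struct.pack(f"<BI{len(vec)}f", DENSE, len(vec), *vec)
    if fmt == SPARSE:
        # tag, dimension, non-zero count, then (index, value) pairs for non-zeros only
        nz = [(i, v) for i, v in enumerate(vec) if v != 0.0]
        parts = [struct.pack("<BII", SPARSE, len(vec), len(nz))]
        parts += [struct.pack("<If", i, v) for i, v in nz]
        return b"".join(parts)
    raise ValueError(f"unknown format {fmt}")


def decode_vector(data):
    """Decode bytes produced by encode_vector back into a dense list of floats."""
    tag, dim = struct.unpack_from("<BI", data, 0)
    offset = HEADER
    if tag == DENSE:
        return list(struct.unpack_from(f"<{dim}f", data, offset))
    if tag == SPARSE:
        (count,) = struct.unpack_from("<I", data, offset)
        offset += 4
        vec = [0.0] * dim
        for _ in range(count):
            i, v = struct.unpack_from("<If", data, offset)
            vec[i] = v
            offset += 8  # uint32 index + float32 value
        return vec
    raise ValueError(f"unknown format tag {tag}")
```

With this layout a dense vector costs 5 + 4d bytes and a sparse one 9 + 8k bytes for k non-zero components, so the sparse form only pays off when fewer than about half of the components are non-zero.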

The query parser is passed a vector; the LSH hash of that vector is calculated, and documents containing a similar vector are queried. The user can then choose to run a full cosine similarity on the topNDocs (or any other similarity, provided we add different scorers) to get more precise scores for the results.
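The two-stage query can be sketched as follows. This is a minimal in-memory stand-in, not the POC's query parser. It assumes each document carries a precomputed LSH signature (a tuple of hash values) alongside its raw vector. It also treats the number of signature positions that agree with the query as the cheap first-stage score, then re-scores only the top N candidates with exact cosine similarity. `lsh_then_rerank` and the shape of the `docs` dictionary are hypothetical.

```python
import math


def cosine(a, b):
    """Exact cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def lsh_then_rerank(query_vec, query_sig, docs, top_n):
    """docs: hypothetical dict of doc_id -> (signature tuple, vector), standing in for the index."""
    # Stage 1: cheap LSH match; score = number of signature positions equal to the query's
    lsh_scores = {
        doc_id: sum(qh == dh for qh, dh in zip(query_sig, sig))
        for doc_id, (sig, _) in docs.items()
    }
    candidates = sorted(lsh_scores, key=lambda doc_id: lsh_scores[doc_id], reverse=True)[:top_n]
    # Stage 2: exact cosine similarity, computed only for the top-N candidates
    reranked = [(doc_id, cosine(query_vec, docs[doc_id][1])) for doc_id in candidates]
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked
```

A document whose signature agrees with the query on more positions can still end up below another candidate once exact cosine similarity is applied; that reordering is the precision gain the re-scoring step is meant to provide, at the cost of computing the full similarity for N documents.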

Hopefully this use case can be extended, optimized, and ultimately included in Solr.


was (Author: moshebla):
We have been experimenting with this new use case, querying vectors inside Solr,
using this [POC|https://github.com/moshebla/solr-vector-scoring].
This has worked for us because an algorithm running outside of Solr generates vectors for the different inputs in our data pipeline,
and the enriched documents are then sent to Solr for indexing.
The LSH hash is then calculated at index time, and the vector data is encoded in a binary format, in either sparse or dense form (this is configurable).
Perhaps this use case could be extended and optimized, and ultimately included in Solr.

> Vector Search in Solr (Umbrella Issue)
> --------------------------------------
>
>                 Key: SOLR-12890
>                 URL: https://issues.apache.org/jira/browse/SOLR-12890
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: mosh
>            Priority: Major
>
> We have recently come across a need to index documents containing vectors using Solr, and have even worked on a small POC. We used a URP (UpdateRequestProcessor) to calculate the LSH (we chose the SuperBit algorithm, but the code is designed so that the chosen algorithm can easily be changed), and stored the vector in either sparse or dense form in a binary field.
> Perhaps adding an LSH URP, in conjunction with a query parser that uses the same properties to calculate the LSH (or maybe ktree, or some other algorithm altogether), should be considered as a Solr feature?
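For reference, Super-Bit LSH (Ji et al., NIPS 2012) refines random-hyperplane LSH by orthogonalizing the random projection vectors in batches of N, which the paper shows reduces the variance of the angle estimate. The sketch below is a plain-Python illustration of that idea, not the POC's code. It builds L batches of N Gram-Schmidt-orthonormalized Gaussian vectors (N must not exceed the vector dimension) and emits one sign bit per projection. It then estimates the angle between two vectors as pi times their normalized Hamming distance. The function names and the fixed `seed` parameter are assumptions; the seed stands in for whatever mechanism keeps the index-time URP and the query parser using identical projections.

```python
import math
import random


def superbit_projections(dim, n, l, seed=42):
    """Build L batches of N orthonormal Gaussian projection vectors (requires n <= dim)."""
    if n > dim:
        raise ValueError("super-bit depth n must not exceed the vector dimension")
    rng = random.Random(seed)  # same seed at index and query time -> same projections
    projections = []
    for _ in range(l):
        batch = []
        for _ in range(n):
            v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            # Gram-Schmidt: subtract the components along vectors already in this batch
            for u in batch:
                dot = sum(a * b for a, b in zip(v, u))
                v = [a - dot * b for a, b in zip(v, u)]
            norm = math.sqrt(sum(a * a for a in v))
            batch.append([a / norm for a in v])
        projections.extend(batch)
    return projections


def superbit_signature(vec, projections):
    """One bit per projection: 1 if vec lies on the non-negative side of the hyperplane."""
    return [1 if sum(a * b for a, b in zip(vec, p)) >= 0.0 else 0 for p in projections]


def estimated_angle(sig_a, sig_b):
    """Angle estimate in [0, pi] from the normalized Hamming distance of two signatures."""
    hamming = sum(x != y for x, y in zip(sig_a, sig_b))
    return math.pi * hamming / len(sig_a)
```

Because each bit depends only on the direction of a vector, scaling a vector leaves its signature unchanged, which is consistent with the cosine similarity used for re-scoring.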



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
