Posted to users@solr.apache.org by Gabriel Magno <ga...@gmail.com> on 2023/02/02 21:58:05 UTC

[Dense Vectors][Streaming] Store dense vector of a query in a streaming variable and use it for vector math

Hi.

I'm exploring the streaming expressions feature together with the Dense
Vectors feature introduced in Solr 9.

I am wondering whether it is possible to write a streaming expression that
queries for a specific ID, stores that document's vector field in a
variable, and then uses the variable as an array in the vector math
operations of the streaming processor.

For instance, this expression kind of works:
let(
    echo=True,
    vector_1=search(films, q="*:*", fq="id:\"/en/finding_nemo\"", fl="film_vector"),
    vector_2=search(films, q="*:*", fq="id:\"/en/bee_movie\"", fl="film_vector")
)

The result is this:
{
  "result-set": {
    "docs": [
      {
        "vector_1": [
          {"film_vector": ["-0.11665395","0.04247921","-0.13233364","0.52578413","-0.1739291",
                           "-0.01880563","-0.06670809","-0.11242808","0.09724514","-0.11909142"]}
        ],
        "vector_2": [
          {"film_vector": ["-0.14272659","0.13051921","-0.19087574","0.44983688","-0.21098459",
                           "0.0033124345","-0.008155139","-0.09109363","0.12401622","-0.12211737"]}
        ]
      },
      {"EOF": true, "RESPONSE_TIME": 7}
    ]
  }
}

The first problem is that the `vector_1` and `vector_2` variables are not
actually the vectors themselves. Each one is a list of documents, where each
document holds a field name and its value. The second problem is that the
values of the dense vector seem to be strings (not doubles), so they will
probably not work as expected as a numerical array.

I was wondering whether there is some streaming function or operator that I
could add to the streaming expression above that would allow me to return
something like this:
{
  "vector_1": [-0.11665395,0.04247921,-0.13233364,0.52578413,-0.1739291,
               -0.01880563,-0.06670809,-0.11242808,0.09724514,-0.11909142],
  "vector_2": [-0.14272659,0.13051921,-0.19087574,0.44983688,-0.21098459,
               0.0033124345,-0.008155139,-0.09109363,0.12401622,-0.12211737]
}

So, if that was possible, maybe I could do something like this (which is my
actual end goal):
let(
    vector_1=search(films, q="*:*", fq="id:\"/en/finding_nemo\"", fl="film_vector"),
    vector_2=search(films, q="*:*", fq="id:\"/en/bee_movie\"", fl="film_vector"),
    vector_final=ebeAdd(vector_1, vector_2)
)

The above streaming expression does not work. It returns the following
exception:

class org.apache.solr.client.solrj.io.Tuple cannot be cast to class
java.lang.Number (org.apache.solr.client.solrj.io.Tuple is in unnamed
module of loader org.eclipse.jetty.webapp.WebAppClassLoader @1e886a5b;
java.lang.Number is in module java.base of loader 'bootstrap')
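
To make the goal concrete, here is a minimal client-side sketch in Python
(not a streaming expression) of the result I am after. It takes the JSON
shape shown earlier in this message, pulls the single `film_vector` out of
each variable, converts the string values to floats, and adds the two
vectors element by element. The helper names `extract_vector` and `ebe_add`
are hypothetical names of my own, not Solr functions, and the sample
response is copied from the output above.

```python
import json

# Sample response copied from the output shown earlier in this message.
RESPONSE = """
{
  "result-set": {
    "docs": [
      {
        "vector_1": [
          {"film_vector": ["-0.11665395","0.04247921","-0.13233364","0.52578413","-0.1739291",
                           "-0.01880563","-0.06670809","-0.11242808","0.09724514","-0.11909142"]}
        ],
        "vector_2": [
          {"film_vector": ["-0.14272659","0.13051921","-0.19087574","0.44983688","-0.21098459",
                           "0.0033124345","-0.008155139","-0.09109363","0.12401622","-0.12211737"]}
        ]
      },
      {"EOF": true, "RESPONSE_TIME": 7}
    ]
  }
}
"""


def extract_vector(doc, name, field="film_vector"):
    """Hypothetical helper: return the vector stored under `name` as floats.

    Each variable holds a list of documents (tuples); exactly one is expected
    here because the filter query matches a single ID.
    """
    tuples = doc[name]
    if len(tuples) != 1:
        raise ValueError(f"expected exactly one document in {name!r}, got {len(tuples)}")
    # The vector values arrive as strings, so convert each one to a float.
    return [float(value) for value in tuples[0][field]]


def ebe_add(a, b):
    """Hypothetical helper: element-by-element sum of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return [x + y for x, y in zip(a, b)]


doc = json.loads(RESPONSE)["result-set"]["docs"][0]
vector_1 = extract_vector(doc, "vector_1")
vector_2 = extract_vector(doc, "vector_2")
vector_final = ebe_add(vector_1, vector_2)
print(vector_final)
```

What I am looking for is a way to express the equivalent of the
`extract_vector` step inside the streaming expression itself, so that
`ebeAdd` receives numeric arrays instead of tuples.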


Best,

--
Gabriel Magno