Posted to issues@lucene.apache.org by "Trey Grainger (Jira)" <ji...@apache.org> on 2019/10/11 05:52:00 UTC

[jira] [Updated] (SOLR-13836) Streaming Expression Query Parser

     [ https://issues.apache.org/jira/browse/SOLR-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trey Grainger updated SOLR-13836:
---------------------------------
    Description: 
A streaming expression can already call the search handler (via "search(...)"), but a regular search in the search handler cannot currently invoke a streaming expression. In some cases, it would be useful to use a streaming expression to generate a result set and then join that result set with a normal set of search results.

This isn't expected to be particularly efficient for high-cardinality streaming expression results, but it would be a powerful feature that could enable a number of use cases that aren't possible today within a normal search.
h2. Example:

*Docs:*

{code:java}
curl -X POST -H "Content-Type: application/json" "http://localhost:8983/solr/food/update?commit=true" --data-binary '
[
{"id": "1", "name_s":"donut","vector_fs":[5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0]},
{"id": "2", "name_s":"apple juice","vector_fs":[1.0,5.0,0.0,0.0,0.0,4.0,4.0,3.0]},
{"id": "3", "name_s":"cappuccino","vector_fs":[0.0,5.0,3.0,0.0,4.0,1.0,2.0,3.0]},
{"id": "4", "name_s":"cheese pizza","vector_fs":[5.0,0.0,4.0,4.0,0.0,1.0,5.0,2.0]},
{"id": "5", "name_s":"green tea","vector_fs":[0.0,5.0,0.0,0.0,2.0,1.0,1.0,5.0]},
{"id": "6", "name_s":"latte","vector_fs":[0.0,5.0,4.0,0.0,4.0,1.0,3.0,3.0]},
{"id": "7", "name_s":"soda","vector_fs":[0.0,5.0,0.0,0.0,3.0,5.0,5.0,0.0]},
{"id": "8", "name_s":"cheese bread sticks","vector_fs":[5.0,0.0,4.0,5.0,0.0,1.0,4.0,2.0]},
{"id": "9", "name_s":"water","vector_fs":[0.0,5.0,0.0,0.0,0.0,0.0,0.0,5.0]},
{"id": "10", "name_s":"cinnamon bread sticks","vector_fs":[5.0,0.0,1.0,5.0,0.0,3.0,4.0,2.0]}
]'
{code}

 

*Query:*
{code:java}
http://localhost:8983/solr/food/select?q=*:*&fq={!streaming_expression}top(select(search(food,%20q=%22*:*%22,%20fl=%22id,vector_fs%22,%20sort=%22id%20asc%22),%20cosineSimilarity(vector_fs,%20array(5.1,0.0,1.0,5.0,0.0,4.0,5.0,1.0))%20as%20cos,%20id),%20n=5,%20sort=%22cos%20desc%22)&fl=id,name_s
{code}
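For readers who would rather not hand-encode the expression, here is a minimal Python sketch of how the request URL above could be assembled. The helper name build_select_url is hypothetical; the host, collection, and expression are taken from the example, and the only library used is the standard urllib.parse module. Note that it percent-encodes more characters than the hand-written URL does (for example the braces and asterisks), which decodes to the same parameter values.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# The streaming expression from the example query, written without URL encoding.
EXPRESSION = (
    'top(select(search(food, q="*:*", fl="id,vector_fs", sort="id asc"), '
    'cosineSimilarity(vector_fs, array(5.1,0.0,1.0,5.0,0.0,4.0,5.0,1.0)) as cos, id), '
    'n=5, sort="cos desc")'
)


def build_select_url(base_url: str, expression: str) -> str:
    """Hypothetical helper: wrap the expression in the proposed parser and encode it."""
    params = [
        ("q", "*:*"),
        ("fq", "{!streaming_expression}" + expression),
        ("fl", "id,name_s"),
    ]
    # quote_via=quote encodes spaces as %20 (as in the example) instead of "+".
    return base_url + "?" + urlencode(params, quote_via=quote)


url = build_select_url("http://localhost:8983/solr/food/select", EXPRESSION)

# Round-trip check: decoding the query string should give back the raw parameters.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["fq"][0])
```

Building the parameter list first and encoding it in one step avoids mixing encoded and unencoded characters in the same URL.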

 

*Response:*
{code:java}
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":7,
    "params":{
      "q":"*:*",
      "fl":"id,name_s",
      "fq":"{!streaming_expression}top(select(search(food, q=\"*:*\", fl=\"id,vector_fs\", sort=\"id asc\"), cosineSimilarity(vector_fs, array(5.1,0.0,1.0,5.0,0.0,4.0,5.0,1.0)) as cos, id), n=5, sort=\"cos desc\")"}},
  "response":{"numFound":5,"start":0,"docs":[
      {
        "name_s":"donut",
        "id":"1"},
      {
        "name_s":"apple juice",
        "id":"2"},
      {
        "name_s":"cheese pizza",
        "id":"4"},
      {
        "name_s":"cheese bread sticks",
        "id":"8"},
      {
        "name_s":"cinnamon bread sticks",
        "id":"10"}]
  }}
{code}
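To make it easier to see why these five documents come back, here is a small Python sketch that reproduces the logic of the example outside of Solr: rank the ten example documents by cosine similarity to the query vector, keep the top five ids, and use that id set as a filter over the full document list. This is only an illustration of the idea under simplified assumptions (everything in memory, plain Python lists); it is not how the query parser is implemented. The names DOCS, QUERY_VECTOR, and top_ids are made up for the sketch.

```python
import math

# The ten example documents: id -> (name_s, vector_fs).
DOCS = {
    "1": ("donut", [5.0, 0.0, 1.0, 5.0, 0.0, 4.0, 5.0, 1.0]),
    "2": ("apple juice", [1.0, 5.0, 0.0, 0.0, 0.0, 4.0, 4.0, 3.0]),
    "3": ("cappuccino", [0.0, 5.0, 3.0, 0.0, 4.0, 1.0, 2.0, 3.0]),
    "4": ("cheese pizza", [5.0, 0.0, 4.0, 4.0, 0.0, 1.0, 5.0, 2.0]),
    "5": ("green tea", [0.0, 5.0, 0.0, 0.0, 2.0, 1.0, 1.0, 5.0]),
    "6": ("latte", [0.0, 5.0, 4.0, 0.0, 4.0, 1.0, 3.0, 3.0]),
    "7": ("soda", [0.0, 5.0, 0.0, 0.0, 3.0, 5.0, 5.0, 0.0]),
    "8": ("cheese bread sticks", [5.0, 0.0, 4.0, 5.0, 0.0, 1.0, 4.0, 2.0]),
    "9": ("water", [0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0]),
    "10": ("cinnamon bread sticks", [5.0, 0.0, 1.0, 5.0, 0.0, 3.0, 4.0, 2.0]),
}

# The array(...) argument from the example query.
QUERY_VECTOR = [5.1, 0.0, 1.0, 5.0, 0.0, 4.0, 5.0, 1.0]


def cosine_similarity(a, b):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_ids(docs, query, n):
    """Ids of the n documents most similar to the query (like top(..., sort="cos desc"))."""
    ranked = sorted(
        docs,
        key=lambda doc_id: cosine_similarity(docs[doc_id][1], query),
        reverse=True,
    )
    return ranked[:n]


# The streaming expression contributes only a set of ids ...
matching = set(top_ids(DOCS, QUERY_VECTOR, 5))

# ... which then filters the main result set, keeping the main query's own order.
filtered = [(doc_id, name) for doc_id, (name, _) in DOCS.items() if doc_id in matching]
print(filtered)
```

The filtered list keeps the order of the main document list rather than the similarity order, which matches the response above: the expression acts as a filter and does not affect ranking.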


The current implementation also supports the following additional parameters:
 *f*: (optional) The field in the streaming expression's output that contains the document ids to filter on. Defaults to the uniqueKey field of your documents.
 *method*: (optional) One of termsFilter (default), booleanQuery, automaton, or docValuesTermsFilter.

The method parameter may go away, especially if we find a more efficient way to join the stream to the main query doc set.
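
As a rough illustration of how these parameters might be supplied, here is a Python sketch that builds the fq value. It assumes that f and method are passed as Solr local params inside the braces (for example {!streaming_expression f=id method=automaton}); this issue does not spell out the syntax, so treat that as an assumption. The helper name build_fq is hypothetical, and the allowed method values are the four listed above.

```python
# Method names as listed in this issue; termsFilter is the stated default.
ALLOWED_METHODS = ("termsFilter", "booleanQuery", "automaton", "docValuesTermsFilter")


def build_fq(expression, f=None, method=None):
    """Hypothetical helper: prefix a streaming expression with the proposed parser.

    Assumes f and method are given as local params, which this issue does not confirm.
    """
    if method is not None and method not in ALLOWED_METHODS:
        raise ValueError(f"method must be one of {ALLOWED_METHODS}, got {method!r}")
    local_params = ["!streaming_expression"]
    if f is not None:
        local_params.append(f"f={f}")
    if method is not None:
        local_params.append(f"method={method}")
    return "{" + " ".join(local_params) + "}" + expression


# Defaults only: identical in shape to the fq in the example query.
fq_default = build_fq('search(food, q="*:*", fl="id", sort="id asc")')

# Explicit id field and join method.
fq_custom = build_fq(
    'search(food, q="*:*", fl="id", sort="id asc")',
    f="id",
    method="automaton",
)
print(fq_default)
print(fq_custom)
```

Omitting both arguments leaves the prefix as the bare {!streaming_expression}, so callers that rely on the defaults are unaffected if the method parameter is later removed.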
 

  was:
It is currently possible to hit the search handler in a streaming expression ("search(...)"), but it is not currently possible to invoke a streaming expression from within a regular search within the search handler. In some cases, it would be useful to leverage the power of streaming expressions to generate a result set and then join that result set with a normal set of search results.

This isn't expected to be particularly efficient for high cardinality streaming expression results, but it would be a pretty powerful feature that could enable a bunch of use cases that aren't possible today within a normal search.


> Streaming Expression Query Parser
> ---------------------------------
>
>                 Key: SOLR-13836
>                 URL: https://issues.apache.org/jira/browse/SOLR-13836
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Trey Grainger
>            Priority: Minor


