Posted to solr-user@lucene.apache.org by Mathias Lux <ml...@itec.uni-klu.ac.at> on 2013/09/16 13:24:33 UTC

Re-Ranking results based on DocValues with custom function.

Hi!

I have a fairly large index with a lot of text and some binary data in
the documents (numeric vectors of arbitrary size with associated
dissimilarity functions). What I want to do is run an ordinary text
search and then (optionally) re-rank the results using a custom
function, like

http://localhost:8983/solr/select?q=*:*&sort=myCustomFunction(var1) asc

I've seen that there are hooks in solrconfig.xml, but I did not find
an example or some documentation. I'd be most grateful if anyone could
either point me to one or give me a hint for another way to go :)

Btw., searching on the DocValues alone is handled by a custom
RequestHandler, which works great. But with text as the main search
feature and my DocValues used only for re-ranking, I'd rather just add
a function for sorting and keep using the current, stable and
well-performing request handler.

cheers,
Mathias

ps. a demo of the current system is available at:
http://demo-itec.uni-klu.ac.at/liredemo/

-- 
Dr. Mathias Lux
Assistant Professor, Klagenfurt University, Austria
http://tinyurl.com/mlux-itec

Re: Re-Ranking results based on DocValues with custom function.

Posted by Mathias Lux <ml...@itec.uni-klu.ac.at>.
Got it! Sharing it here for everyone ... and maybe for inclusion in the
Java API docs of ValueSource :)

For sorting one needs to implement the method

public double doubleVal(int) of the FunctionValues object that the
ValueSource returns from getValues()

then it works like a charm.
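
To illustrate why that one method matters for sorting, here is a minimal
plain-Java sketch. It is not Lucene code: DocValuesLike is a hypothetical
stand-in for the per-document value object, and the distances are made-up
numbers. Sorting ascending simply means ordering document ids by whatever
doubleVal(doc) returns.

```java
import java.util.Arrays;
import java.util.Comparator;

final class DoubleValSortSketch {

    /** Hypothetical stand-in for the per-document value accessor (not the Lucene class). */
    interface DocValuesLike {
        double doubleVal(int doc);
    }

    /** Returns the document ids 0..maxDoc-1 ordered by ascending doubleVal(doc). */
    static Integer[] sortAscending(int maxDoc, DocValuesLike values) {
        Integer[] docs = new Integer[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            docs[doc] = doc;
        }
        // The sort only ever asks for the double value of each document.
        Comparator<Integer> byValue = Comparator.comparingDouble(values::doubleVal);
        Arrays.sort(docs, byValue);
        return docs;
    }

    public static void main(String[] args) {
        // Made-up distances of three documents to a sample image.
        double[] distances = {0.7, 0.1, 0.4};
        Integer[] order = sortAscending(distances.length, doc -> distances[doc]);
        System.out.println(Arrays.toString(order)); // prints [1, 2, 0]
    }
}
```

If doubleVal is left at a default that returns the same value for every
document, a sort like this has nothing to order by, which would match the
behaviour described above.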

cheers,
  Mathias

On Tue, Sep 17, 2013 at 6:28 PM, Chris Hostetter
<ho...@fucit.org> wrote:
> can you try to quote the string literal and see if that works?

Re: Re-Ranking results based on DocValues with custom function.

Posted by Chris Hostetter <ho...@fucit.org>.
: It basically allows for searching for text (which is associated to an
: image) in an index and then getting the distance to a sample image
: (base64 encoded byte[] array) based on one of five different low level
: content based features stored as DocValues.

very cool.

: So there's one little tiny question I still have ;) When I'm trying to
: do a "sort" I'm getting
: 
: "msg": "sort param could not be parsed as a query, and is not a field
: that exists in the index:
: lirefunc(cl_hi,FQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA=)",
: 
: for the call http://localhost:9000/solr/lire/select?q=*%3A*&sort=lirefunc(cl_hi%2CFQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA%3D)+asc&fl=id%2Ctitle%2Clirefunc(cl_hi%2CFQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA%3D)&wt=json&indent=true

Hmmm...

i think the crux of the issue is your string literal.  function parsing
tries to make life easy for you by not requiring string literals to be
quoted unless they conflict with other function names or field names
etc....  on top of that the sort parsing code is kind of heuristic based
(because it has to account for both functions or field names or wildcards,
followed by other sort clauses, etc...) so in that context the special
characters like '=' in your base64 string literal might be confusing the
heuristics.

can you try to quote the string literal and see if that works?

For example, when i try using strdist with your base64 string in a sort 
param using the example configs i get the same error...

http://localhost:8983/solr/select?q=*:*&sort=strdist%28name,FQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA=,jw%29+asc

but if i quote the string literal it works fine...

http://localhost:8983/solr/select?q=*:*&sort=strdist%28name,%27FQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA=%27,jw%29+asc
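
For anyone building such URLs from client code, a minimal sketch of the
same idea: wrap the base64 argument in single quotes, then URL-encode the
whole sort clause. The class and method names are made up for
illustration; only java.net.URLEncoder is a real API here, and the
two-argument function shape follows the lirefunc call quoted above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SortParamBuilder {

    /** Wraps the argument in single quotes so it is parsed as a string literal. */
    static String quoteLiteral(String base64) {
        // The base64 alphabet contains no quote characters, so no escaping is done here.
        return "'" + base64 + "'";
    }

    /** Builds an encoded sort parameter such as sort=func(field,'literal') asc. */
    static String sortParam(String function, String field, String base64, String direction) {
        String clause = function + "(" + field + "," + quoteLiteral(base64) + ") " + direction;
        return "sort=" + URLEncoder.encode(clause, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String feature = "FQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA=";
        System.out.println(sortParam("lirefunc", "cl_hi", feature, "asc"));
    }
}
```

URLEncoder turns the quotes into %27, the '=' into %3D and the space
before the direction into '+', so nothing in the literal reaches the
server unencoded.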



-Hoss

Re: Re-Ranking results based on DocValues with custom function.

Posted by Mathias Lux <ml...@itec.uni-klu.ac.at>.
Hi!

Thanks for the directions! I got it up and running with a custom
ValueSourceParser: http://pastebin.com/cz1rJn4A and a custom
ValueSource: http://pastebin.com/j8mhA8e0

It basically allows for searching for text (which is associated with an
image) in an index and then getting the distance to a sample image
(base64 encoded byte[] array) based on one of five different low-level
content-based features stored as DocValues.
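
The core of such a function can be sketched in plain Java, leaving out
all Solr and Lucene plumbing. This is not the code from the pastebin
links: the class name is made up, and the L1 distance is an assumed
metric chosen only for illustration; the real features each come with
their own dissimilarity function.

```java
import java.util.Base64;

final class FeatureDistanceSketch {

    /** Decodes the base64 request argument back into the raw feature bytes. */
    static byte[] decodeFeature(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    /** L1 (city block) distance; an assumed metric, used here only as an example. */
    static double l1Distance(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("feature vectors differ in length");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            // Mask so that each byte is treated as an unsigned value in 0..255.
            sum += Math.abs((a[i] & 0xFF) - (b[i] & 0xFF));
        }
        return sum;
    }

    public static void main(String[] args) {
        // Sample feature taken from the request shown in this thread.
        byte[] sample = decodeFeature("FQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA=");
        // Hypothetical stored feature: a copy of the sample with one changed byte.
        byte[] stored = sample.clone();
        stored[0] += 3;
        System.out.println(l1Distance(sample, stored));
    }
}
```

In the real function the stored bytes would come from the DocValues
field of each document, and the resulting distance is the value the
function reports for that document.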

A sample result is here: http://pastebin.com/V7kL3DJh

So there's one little tiny question I still have ;) When I'm trying to
do a "sort" I'm getting

"msg": "sort param could not be parsed as a query, and is not a field
that exists in the index:
lirefunc(cl_hi,FQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA=)",

for the call http://localhost:9000/solr/lire/select?q=*%3A*&sort=lirefunc(cl_hi%2CFQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA%3D)+asc&fl=id%2Ctitle%2Clirefunc(cl_hi%2CFQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA%3D)&wt=json&indent=true

cheers,
  Mathias

On Tue, Sep 17, 2013 at 1:01 AM, Chris Hostetter
<ho...@fucit.org> wrote:
> ...if you really want to write your own, just implement ValueSourceParser
> and register it in solrconfig.xml...

Re: Re-Ranking results based on DocValues with custom function.

Posted by Chris Hostetter <ho...@fucit.org>.
: dissimilarity functions). What I want to do is to search using common
: text search and then (optionally) re-rank using some custom function
: like
: 
: http://localhost:8983/solr/select?q=*:*&sort=myCustomFunction(var1) asc

can you describe what you want your custom function to look like? it may
already be possible using the existing functions provided out of the box -
just need to combine them to build up the math expression...

https://wiki.apache.org/solr/FunctionQuery

...if you really want to write your own, just implement ValueSourceParser 
and register it in solrconfig.xml...

https://wiki.apache.org/solr/SolrPlugins#ValueSourceParser

: I've seen that there are hooks in solrconfig.xml, but I did not find
: an example or some documentation. I'd be most grateful if anyone could
: either point me to one or give me a hint for another way to go :)

when writing a custom plugin like this, the best thing to do is look at
the existing examples of that plugin.  almost all of the built in
ValueSourceParsers are really trivial, and can be found in tiny anonymous
classes right inside the ValueSourceParser.java...

For example, the function to divide the results of two other functions...

    addParser("div", new ValueSourceParser() {
      @Override
      public ValueSource parse(FunctionQParser fp) throws SyntaxError {
        ValueSource a = fp.parseValueSource();
        ValueSource b = fp.parseValueSource();
        return new DivFloatFunction(a, b);
      }
    });

..or, if you were trying to bundle that up in your own plugin jar and 
register it in solrconfig.xml, you might write it something like...

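package com.you;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.valuesource.DivFloatFunction;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;
public class DivideValueSourceParser extends ValueSourceParser {
  public DivideValueSourceParser() { }
  @Override
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    // first argument is the numerator, second the denominator
    ValueSource a = fp.parseValueSource();
    ValueSource b = fp.parseValueSource();
    return new DivFloatFunction(a, b);
  }
}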

and then register it as...

<valueSourceParser name="div" class="com.you.DivideValueSourceParser" />


depending on your needs, you may also want to write a custom ValueSource 
implementation (ie: instead of DivFloatFunction above) in which case, 
again, the best examples to look at are all of the existing ValueSource 
functions...

https://lucene.apache.org/core/4_4_0/queries/org/apache/lucene/queries/function/ValueSource.html


-Hoss