Posted to solr-user@lucene.apache.org by Michael Sokolov <ms...@safaribooksonline.com> on 2013/11/18 17:55:07 UTC

getting matching term count for a query

Some of our customers want to display a "number of matches" score next 
to each search result.  I think what they want is to list the number of 
matches that will be displayed when the entire document is highlighted.  
But this can be slow to do for every search result (some documents can 
be very large), so what we'd like to do is to count the number of terms 
that match the query for each document, and display that.

It looks like Solr's function query has some support for this - I see 
the termfreq function, for example.  My questions are:

1. Is it possible to execute the query as usual, retrieving document 
stored field values, and also to run a function, and return the result 
as the value of a computed "pseudo-field"?

2. Is there an existing function known to the function query parser that 
counts the total number of occurrences of all terms in the query (for 
the current hit document)?

-Mike
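
To make the second question concrete, here is a rough plain-Java sketch of the number being asked for: the total occurrences, within one document, of any term from the query. It uses no Solr or Lucene classes; the class name, method name, and sample tokens are made up for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only; no Solr or Lucene classes are involved.
class MatchCountSketch {

    // Counts every occurrence of any query term among the document's tokens.
    static int matchCount(List<String> docTokens, Set<String> queryTerms) {
        int count = 0;
        for (String token : docTokens) {
            if (queryTerms.contains(token)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical tokenized document and query.
        List<String> doc = Arrays.asList("solr", "function", "query", "in", "solr");
        Set<String> query = new HashSet<String>(Arrays.asList("solr", "query"));
        System.out.println(matchCount(doc, query)); // prints 3 ("solr" twice, "query" once)
    }
}
```

In Solr terms this would be the sum of termfreq over the query's terms, computed from the index rather than by re-reading the document.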

Re: NPE in function query, was: Re: getting matching term count for a query

Posted by Michael Sokolov <ms...@safaribooksonline.com>.
OK, never mind -- I was the one adding the null: the termcounts array 
was sized for every query term, so any term skipped by the field filter 
left a null slot.  Working example below.  Last question -- does anybody 
know if it's possible to rewrite MultiTermQueries in this context?  I 
don't see how to get hold of an IndexReader to do that, but if it were 
possible, it would enable this function to handle wildcards, etc.

-Mike


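import java.util.ArrayList;
import java.util.HashSet;

import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.valuesource.DoubleConstValueSource;
import org.apache.lucene.queries.function.valuesource.SumFloatFunction;
import org.apache.lucene.queries.function.valuesource.TermFreqValueSource;
import org.apache.lucene.search.Query;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

/**
 * Defines the Solr function hitcount([field, ...]) which returns the total
 * of termfreq(term) for all terms in the query. The arguments specify
 * fields whose terms are to be counted. If no arguments are passed, terms
 * from every field are counted.
 */
public class HitCount extends ValueSourceParser {

    @Override
    public ValueSource parse(FunctionQParser fp) throws SyntaxError {
        // Each argument names a field to count. If we wanted to pass a
        // query instead, we could call fp.parseNestedQuery().
        HashSet<String> fields = new HashSet<String>();
        while (fp.hasMoreArguments()) {
            fields.add(fp.parseArg());
        }
        // Re-parse the main query (the q parameter) to get at its terms.
        Query q = fp.subQuery(fp.getParams().get("q"), "lucene").getQuery();
        HashSet<Term> terms = new HashSet<Term>();
        try {
            q.extractTerms(terms);
        } catch (UnsupportedOperationException e) {
            // Some queries (e.g. unrewritten wildcards) cannot list their
            // terms; fall back to a constant count of 1.
            return new DoubleConstValueSource(1);
        }
        // Collect into a list so that skipped terms leave no null entries.
        ArrayList<ValueSource> termcounts = new ArrayList<ValueSource>();
        for (Term t : terms) {
            if (fields.isEmpty() || fields.contains(t.field())) {
                termcounts.add(new TermFreqValueSource(t.field(), t.text(),
                        t.field(), t.bytes()));
            }
        }
        return new SumFloatFunction(
                termcounts.toArray(new ValueSource[termcounts.size()]));
    }
}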
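
As far as the two listings in this thread show, the difference that matters is how the per-term sources are collected: the version from the NPE report fills a fixed-size array, while the one above uses a list. The plain-Java sketch below illustrates that difference without Lucene. A String[] {field, text} pair stands in for a Term, and the class name, method names, and the title_t field are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java stand-in: String[] {field, text} plays the role of a Lucene Term.
class NullSlotSketch {

    // Array sized for every term, filled only for terms that pass the filter.
    static String[] filterIntoArray(List<String[]> terms, Set<String> fields) {
        String[] out = new String[terms.size()];
        int i = 0;
        for (String[] t : terms) {
            if (fields.isEmpty() || fields.contains(t[0])) {
                out[i++] = t[1];
            }
        }
        return out; // slots for skipped terms remain null
    }

    // List grows only when a term passes the filter, so no nulls appear.
    static String[] filterIntoList(List<String[]> terms, Set<String> fields) {
        List<String> out = new ArrayList<String>();
        for (String[] t : terms) {
            if (fields.isEmpty() || fields.contains(t[0])) {
                out.add(t[1]);
            }
        }
        return out.toArray(new String[out.size()]);
    }

    public static void main(String[] args) {
        List<String[]> terms = new ArrayList<String[]>();
        terms.add(new String[] {"fulltext_t", "solr"});
        terms.add(new String[] {"title_t", "solr"}); // title_t is a made-up field
        Set<String> fields = new HashSet<String>(Arrays.asList("fulltext_t"));
        System.out.println(Arrays.toString(filterIntoArray(terms, fields))); // [solr, null]
        System.out.println(Arrays.toString(filterIntoList(terms, fields)));  // [solr]
    }
}
```

With no field arguments every term passes the filter and the array is completely filled, which would be consistent with hitcount() alone working while hitcount('fulltext_t') failed.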





NPE in function query, was: Re: getting matching term count for a query

Posted by Michael Sokolov <ms...@safaribooksonline.com>.
So for posterity, what I ended up doing is below.  But I have a 
problem I don't understand: when I use fl=*,hitcount(), I get the 
results I expect, but when I use fl=*,hitcount(),hitcount('fulltext_t'), 
I get an NPE in Solr.  This is with Solr 4.2.0.  Is there a known bug?  I 
googled a bit but couldn't find any reference to it.

Caused by: java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.solr.response.BinaryResponseWriter.getParsedResponse(BinaryResponseWriter.java:252)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.getParsedResponse(EmbeddedSolrServer.java:241)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:213)
    ... 37 more
Caused by: java.lang.NullPointerException
    at org.apache.lucene.queries.function.valuesource.MultiFloatFunction.createWeight(MultiFloatFunction.java:95)
    at org.apache.solr.response.transform.ValueSourceAugmenter.setContext(ValueSourceAugmenter.java:71)
    at org.apache.solr.response.transform.DocTransformers.setContext(DocTransformers.java:70)
    at org.apache.solr.response.BinaryResponseWriter$Resolver.writeResultsBody(BinaryResponseWriter.java:139)
    at org.apache.solr.response.BinaryResponseWriter$Resolver.writeResults(BinaryResponseWriter.java:173)
    at org.apache.solr.response.BinaryResponseWriter$Resolver.resolve(BinaryResponseWriter.java:86)
    at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:154)
    at org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:144)
    at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:234)
    at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:149)
    at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:92)
    at org.apache.solr.response.BinaryResponseWriter.getParsedResponse(BinaryResponseWriter.java:246)
    ... 39 more



/**
 * Defines the Solr function hitcount([field, ...]) which returns the total
 * of termfreq(term) for all terms in the query.  The arguments specify
 * fields whose terms are to be counted.  If no arguments are passed, terms
 * from every field are counted.
 */
public class HitCount extends ValueSourceParser {

    @Override
    public ValueSource parse(FunctionQParser fp) throws SyntaxError {
        HashSet<String> fields = new HashSet<String>();
        while (fp.hasMoreArguments()) {
            fields.add(fp.parseArg());
        }
        Query q = fp.subQuery(fp.getParams().get("q"), "lucene").getQuery();
        HashSet<Term> terms = new HashSet<Term>();
        q.extractTerms(terms);
        ValueSource[] termcounts = new ValueSource[terms.size()];
        int i = 0;
        for (Term t : terms) {
            if (fields.isEmpty() || fields.contains(t.field())) {
                termcounts[i++] = new TermFreqValueSource(t.field(),
                        t.text(), t.field(), t.bytes());
            }
        }
        return new SumFloatFunction(termcounts);
    }
}


Re: getting matching term count for a query

Posted by Michael Sokolov <ms...@safaribooksonline.com>.
OK -- I did find SOLR-1298 
<https://issues.apache.org/jira/browse/SOLR-1298>, which explains how to 
request the function as a field value.  Still looking for a function 
that does what I asked for ...
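
For anyone following along, requesting a function as a field value means putting the function call into the fl parameter, optionally with an alias in front. The sketch below only builds such a request URL in plain Java; it does not talk to Solr. The host, core name, alias "matches", and search term are assumptions for illustration, and fulltext_t is the field name used elsewhere in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Builds a select URL whose fl parameter asks for a function result
// as a pseudo-field. Nothing here contacts a server.
class PseudoFieldRequestSketch {

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    static String selectUrl(String baseUrl, String q, String fl) {
        return baseUrl + "/select?q=" + encode(q) + "&fl=" + encode(fl);
    }

    public static void main(String[] args) {
        // Hypothetical host and core; "matches" is an alias for the function value.
        String url = selectUrl(
                "http://localhost:8983/solr/collection1",
                "fulltext_t:solr",
                "id,matches:termfreq(fulltext_t,'solr')");
        System.out.println(url);
    }
}
```

The same fl value could be passed through whatever client is in use; the point is only that the function expression sits alongside ordinary field names.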
