Posted to solr-user@lucene.apache.org by Michael Sokolov <ms...@safaribooksonline.com> on 2013/11/18 17:55:07 UTC
getting matching term count for a query
Some of our customers want to display a "number of matches" score next
to each search result. I think what they want is to list the number of
matches that will be displayed when the entire document is highlighted.
But this can be slow to do for every search result (some documents can
be very large), so what we'd like to do is to count the number of terms
that match the query for each document, and display that.
It looks like Solr's function query has some support for this - I see
the termfreq function, for example. My questions are:
1. Is it possible to execute the query as usual, retrieving document
stored field values, and also to run a function, and return the result
as the value of a computed "pseudo-field"?
2. Is there an existing function known to the function query parser that
counts the total number of occurrences of all terms in the query (for
the current hit document)?
-Mike
Re: NPE in function query, was: Re: getting matching term count for a query
Posted by Michael Sokolov <ms...@safaribooksonline.com>.
OK, never mind: I was the one adding the null. A working example is
below. Last question: does anybody know if it's possible to rewrite
MultiTermQueries in this context? I don't see how to get hold of an
IndexReader to do that, but if it were possible, it would enable this
function to handle wildcards, etc.
-Mike
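// Imports as of Solr/Lucene 4.x
import java.util.ArrayList;
import java.util.HashSet;

import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.valuesource.DoubleConstValueSource;
import org.apache.lucene.queries.function.valuesource.SumFloatFunction;
import org.apache.lucene.queries.function.valuesource.TermFreqValueSource;
import org.apache.lucene.search.Query;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

/**
 * Defines the Solr function hitcount([field, ...]) which returns the total
 * of termfreq(term) for all terms in the query. The arguments specify
 * fields whose terms are to be counted. If no arguments are passed, terms
 * from every field are counted.
 */
public class HitCount extends ValueSourceParser {
    @Override
    public ValueSource parse(FunctionQParser fp) throws SyntaxError {
        // Optional arguments name the fields to count. The query itself is
        // read from the q parameter; if we wanted to pass a query as an
        // argument we could call fp.parseNestedQuery().
        HashSet<String> fields = new HashSet<String>();
        while (fp.hasMoreArguments()) {
            fields.add(fp.parseArg());
        }
        Query q = fp.subQuery(fp.getParams().get("q"), "lucene").getQuery();
        HashSet<Term> terms = new HashSet<Term>();
        try {
            q.extractTerms(terms);
        } catch (UnsupportedOperationException e) {
            // Some queries cannot enumerate their terms (e.g. wildcard
            // queries that have not been rewritten); fall back to a constant.
            return new DoubleConstValueSource(1);
        }
        // Collect into a list so that filtered-out terms leave no null slots.
        ArrayList<ValueSource> termcounts = new ArrayList<ValueSource>();
        for (Term t : terms) {
            if (fields.isEmpty() || fields.contains(t.field())) {
                termcounts.add(new TermFreqValueSource(t.field(), t.text(), t.field(), t.bytes()));
            }
        }
        return new SumFloatFunction(termcounts.toArray(new ValueSource[termcounts.size()]));
    }
}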
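To make the arithmetic concrete, here is a minimal plain-Java model of what hitcount([field, ...]) is meant to add up. It is only a sketch and uses no Solr or Lucene classes: FieldTerm, termFreq, and hitCount are hypothetical stand-ins for Term, termfreq, and the function above, and title_t is an invented field name. Unlike the real function, which gathers terms into a set, the model takes the query terms as a list.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Plain-Java model of the sum that hitcount([field, ...]) computes. */
class HitCountModel {

    /** A query term: the field it targets and its text (stand-in for a Lucene Term). */
    static final class FieldTerm {
        final String field;
        final String text;

        FieldTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    /** Counts how often text occurs among a field's tokens (stand-in for termfreq). */
    static int termFreq(List<String> tokens, String text) {
        int n = 0;
        for (String tok : tokens) {
            if (tok.equals(text)) {
                n++;
            }
        }
        return n;
    }

    /**
     * Sums termFreq over the query terms. An empty fields set means
     * "count terms from every field", mirroring the no-argument call.
     */
    static int hitCount(Map<String, List<String>> doc, List<FieldTerm> queryTerms, Set<String> fields) {
        int total = 0;
        for (FieldTerm t : queryTerms) {
            if (!fields.isEmpty() && !fields.contains(t.field)) {
                continue; // term belongs to a field that was not requested
            }
            List<String> tokens = doc.get(t.field);
            if (tokens != null) {
                total += termFreq(tokens, t.text);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new HashMap<>();
        doc.put("title_t", Arrays.asList("solr", "functions"));
        doc.put("fulltext_t", Arrays.asList("solr", "counts", "terms", "solr", "counts"));

        List<FieldTerm> query = Arrays.asList(
                new FieldTerm("title_t", "solr"),
                new FieldTerm("fulltext_t", "solr"),
                new FieldTerm("fulltext_t", "counts"));

        // Like hitcount(): every field is counted.
        System.out.println(hitCount(doc, query, new HashSet<String>()));
        // Like hitcount('fulltext_t'): only terms on fulltext_t are counted.
        System.out.println(hitCount(doc, query, new HashSet<>(Arrays.asList("fulltext_t"))));
    }
}
```

With this sample document the two calls print 5 and 4: one match in title_t plus four in fulltext_t, then the four fulltext_t matches alone.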
On 11/18/13 8:38 PM, Michael Sokolov wrote:
> So for posterity, what I ended up doing is below. But I have a
> problem I don't understand; when I use fl=*,hitcount(), I get the
> results I expect, but when I use
> fl=*,hitcount(),hitcount('fulltext_t'), I get an NPE in Solr. This is
> with Solr 4.2.0. Is there a known bug? I googled a bit but couldn't
> find any reference to it.
NPE in function query, was: Re: getting matching term count for a query
Posted by Michael Sokolov <ms...@safaribooksonline.com>.
So for posterity, what I ended up doing is below. But I have a problem
I don't understand: when I use fl=*,hitcount(), I get the results I
expect, but when I use fl=*,hitcount(),hitcount('fulltext_t'), I get an
NPE in Solr. This is with Solr 4.2.0. Is there a known bug? I googled a
bit but couldn't find any reference to it.
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.solr.response.BinaryResponseWriter.getParsedResponse(BinaryResponseWriter.java:252)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.getParsedResponse(EmbeddedSolrServer.java:241)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:213)
    ... 37 more
Caused by: java.lang.NullPointerException
    at org.apache.lucene.queries.function.valuesource.MultiFloatFunction.createWeight(MultiFloatFunction.java:95)
    at org.apache.solr.response.transform.ValueSourceAugmenter.setContext(ValueSourceAugmenter.java:71)
    at org.apache.solr.response.transform.DocTransformers.setContext(DocTransformers.java:70)
    at org.apache.solr.response.BinaryResponseWriter$Resolver.writeResultsBody(BinaryResponseWriter.java:139)
    at org.apache.solr.response.BinaryResponseWriter$Resolver.writeResults(BinaryResponseWriter.java:173)
    at org.apache.solr.response.BinaryResponseWriter$Resolver.resolve(BinaryResponseWriter.java:86)
    at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:154)
    at org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:144)
    at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:234)
    at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:149)
    at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:92)
    at org.apache.solr.response.BinaryResponseWriter.getParsedResponse(BinaryResponseWriter.java:246)
    ... 39 more
/**
 * Defines the Solr function hitcount([field, ...]) which returns the total
 * of termfreq(term) for all terms in the query. The arguments specify
 * fields whose terms are to be counted. If no arguments are passed, terms
 * from every field are counted.
 */
public class HitCount extends ValueSourceParser {
    @Override
    public ValueSource parse(FunctionQParser fp) throws SyntaxError {
        HashSet<String> fields = new HashSet<String>();
        while (fp.hasMoreArguments()) {
            fields.add(fp.parseArg());
        }
        Query q = fp.subQuery(fp.getParams().get("q"), "lucene").getQuery();
        HashSet<Term> terms = new HashSet<Term>();
        q.extractTerms(terms);
        ValueSource[] termcounts = new ValueSource[terms.size()];
        int i = 0;
        for (Term t : terms) {
            if (fields.isEmpty() || fields.contains(t.field())) {
                termcounts[i++] = new TermFreqValueSource(t.field(), t.text(), t.field(), t.bytes());
            }
        }
        return new SumFloatFunction(termcounts);
    }
}
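A likely reading of the trace, consistent with the follow-up reply saying the null came from the function itself: the termcounts array is sized to terms.size() before the field filter runs, so any term on a field that was not requested leaves a trailing null slot, and MultiFloatFunction.createWeight presumably dereferences each source in turn. That would also explain why hitcount() alone works, since with no field arguments nothing is filtered out. The plain-Java sketch below reproduces only the array pattern; FilteredArrayDemo and its methods are hypothetical names, and strings stand in for the ValueSource objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Shows how sizing an array before filtering leaves null slots behind. */
class FilteredArrayDemo {

    /** Sizes the array up front, as the version above does; skipped items leave trailing nulls. */
    static String[] presized(List<String> termFields, String wanted) {
        String[] out = new String[termFields.size()];
        int i = 0;
        for (String f : termFields) {
            if (f.equals(wanted)) {
                out[i++] = f;
            }
        }
        return out;
    }

    /** Collects into a list first, so the array is exactly as long as the kept items. */
    static String[] collected(List<String> termFields, String wanted) {
        List<String> kept = new ArrayList<>();
        for (String f : termFields) {
            if (f.equals(wanted)) {
                kept.add(f);
            }
        }
        return kept.toArray(new String[kept.size()]);
    }

    /** Counts null elements in an array. */
    static int countNulls(Object[] a) {
        int n = 0;
        for (Object o : a) {
            if (o == null) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Fields of three query terms; only fulltext_t is requested.
        List<String> termFields = Arrays.asList("title_t", "fulltext_t", "fulltext_t");
        System.out.println(countNulls(presized(termFields, "fulltext_t")));
        System.out.println(countNulls(collected(termFields, "fulltext_t")));
    }
}
```

The pre-sized array ends up with one null (three slots, two kept), while the collected array has none, which is the shape of the change made in the revised version posted in the follow-up.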
On 11/18/13 2:19 PM, Michael Sokolov wrote:
> OK -- I did find SOLR-1298
> <https://issues.apache.org/jira/browse/SOLR-1298>, which explains how to
> request the function as a field value. Still looking for a function
> that does what I asked for ...
Re: getting matching term count for a query
Posted by Michael Sokolov <ms...@safaribooksonline.com>.
OK -- I did find SOLR-1298
<https://issues.apache.org/jira/browse/SOLR-1298>, which explains how to
request the function as a field value. Still looking for a function
that does what I asked for ...
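For reference, requesting a function as a field value comes down to putting the function call in the fl parameter, optionally with an alias in front. The sketch below only builds such a request URL with the JDK. The base URL, the alias "hits", the query string, and the class name are assumptions made for illustration; termfreq and the fulltext_t field are the ones mentioned in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds a Solr select URL whose fl parameter asks for a function as a pseudo-field. */
class PseudoFieldUrl {

    /** URL-encodes one parameter value as UTF-8. */
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }

    /** Assembles base + /select with the q and fl parameters. */
    static String selectUrl(String base, String q, String fl) {
        return base + "/select?q=" + enc(q) + "&fl=" + enc(fl);
    }

    public static void main(String[] args) {
        // Hypothetical core URL and query; adjust to the actual installation.
        String base = "http://localhost:8983/solr/collection1";
        // "*" keeps the stored fields; "hits:" names the computed pseudo-field.
        String fl = "*,hits:termfreq(fulltext_t,'solr')";
        System.out.println(selectUrl(base, "fulltext_t:solr", fl));
    }
}
```

Each returned document should then carry a hits value next to its stored fields. Note that termfreq counts a single term, so it does not by itself answer the second question.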