Posted to solr-user@lucene.apache.org by Peter Keegan <pe...@gmail.com> on 2013/11/07 14:48:40 UTC

Function query matching

Why does this function query return docs that don't match the embedded
query?
select?qq=text:news&q={!func}sum(query($qq),0)

Re: Function query matching

Posted by Peter Keegan <pe...@gmail.com>.

  >But for your specific goal Peter: Yes, if the whole point of a function
  >you have is to generate a "scaled" score of your base $qq, ...

Thanks for the confirmation, Chris. So, to do this efficiently, I think I
need to implement a custom Collector that performs the scaling (and other
math) after collecting the matching dismax query docs. I started a separate
thread asking about the state of configurable collectors.

Thanks,
Peter



Re: Function query matching

Posted by Peter Keegan <pe...@gmail.com>.

: The bottom line for Peter is still the same: using scale() wrapped around
: a function/query does involve computing the results for every document,
: and that is going to scale linearly as the size of the index grows -- but
: it is *only* because of the scale function.

Another problem with this approach is that the scale() function will likely
generate incorrect values because it occurs before any filters. If the
filters drop high scoring docs, the scaled values will never include the
'maxTarget' value (and may not include the 'minTarget' value, either).
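
A toy illustration of this point, in Python rather than Solr code. The scores, document ids, and the `scale` helper are all made up for the example; it only mimics a linear min/max rescale and is not the Solr implementation.

```python
def scale(values, min_target, max_target):
    """Linearly map values so the smallest becomes min_target and the largest max_target."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    if span == 0:
        return {doc: min_target for doc in values}
    return {
        doc: min_target + (v - lo) * (max_target - min_target) / span
        for doc, v in values.items()
    }

# Hypothetical raw scores for five docs (made-up numbers).
raw_scores = {"d1": 8.0, "d2": 4.0, "d3": 2.0, "d4": 1.0, "d5": 0.0}

# Hypothetical filter that happens to drop the highest- and lowest-scoring docs.
passes_filter = {"d2", "d3", "d4"}

# Scale over every doc first, then filter: the survivors never reach 1.0 or 0.0.
scaled_all = scale(raw_scores, 0.0, 1.0)
scaled_then_filtered = {d: s for d, s in scaled_all.items() if d in passes_filter}

# Filter first, then scale only the survivors: the full 0..1 range is used.
filtered_then_scaled = scale({d: raw_scores[d] for d in passes_filter}, 0.0, 1.0)

print(sorted(scaled_then_filtered.items()))
print(sorted(filtered_then_scaled.items()))
```

In the first ordering the best surviving doc ends up at 0.5 instead of 1.0, which is the effect described above.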

Peter



Re: Function query matching

Posted by Chris Hostetter <ho...@fucit.org>.

(This is why i shouldn't send emails just before going to bed.)

I woke up this morning realizing that of course I was completely wrong
when i said this...

: I want to be clear for 99% of the people reading this, if you find
: yourself writing a query structure like this...
:
:   q={!func}..functions involving wrapping $qq ...
        ...
: ...Try to restructure the match you want to do into the form of a
: multiplier
        ...
: Because the latter case is much more efficient and Solr will only compute
: the function values for the docs it needs to (that match the wrapped $qq
: query)

The reason i was wrong...

Even though function queries do by default match all documents, and even 
if the main query is a function query (ie: "q={!func}..."), if there is 
an "fq" that filters down the set of documents, then the (main) function 
query will only be calculated for the documents that match the filter.
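
A rough Python model of that behaviour may help (it is not Solr code). The function `run_function_query`, the doc ids, and the score 1.5 are hypothetical; the model assumes that `query($qq)` yields a default of 0 for docs that do not match, and it simply counts how many docs the function is evaluated for with and without a filter.

```python
def run_function_query(num_docs, func, filter_docs=None):
    """Toy model: score every candidate doc with func and count the evaluations."""
    candidates = range(num_docs) if filter_docs is None else sorted(filter_docs)
    evaluations = 0
    results = {}
    for doc in candidates:
        results[doc] = func(doc)
        evaluations += 1
    return results, evaluations


matching = {3, 7, 42}  # hypothetical docs that match the wrapped $qq


def wrapped_score(doc):
    # Stand-in for sum(query($qq),0): non-matching docs still get a value (0.0 assumed).
    return 1.5 if doc in matching else 0.0


# No fq: the function query matches, and is evaluated for, every doc in the index.
hits_no_fq, evals_no_fq = run_function_query(1000, wrapped_score)

# fq={!query v=$qq}: only docs passing the filter are evaluated and returned.
hits_fq, evals_fq = run_function_query(1000, wrapped_score, filter_docs=matching)

print(len(hits_no_fq), evals_no_fq)
print(len(hits_fq), evals_fq)
```

Under these assumptions the unfiltered run returns all 1000 docs, most with a score of 0, which is also why the query at the top of this thread returned docs that do not match the embedded query.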

It was trivial to amend the test i mentioned last night to show this (and
i feel silly for not doing that last night and stopping myself from saying
something foolish)...

  https://svn.apache.org/viewvc?view=revision&revision=r1548955

The bottom line for Peter is still the same: using scale() wrapped around
a function/query does involve computing the results for every document,
and that is going to scale linearly as the size of the index grows -- but
it is *only* because of the scale function.



-Hoss
http://www.lucidworks.com/

Re: Function query matching

Posted by Chris Hostetter <ho...@fucit.org>.

I had to do a double take when i read this sentence...

: Even with any improvements to 'scale', all function queries will add a
: linear increase to the Qtime as index size increases, since they match all
: docs.

...because that smelled like either a bug in your methodology, or a bug in
Solr.  To convince myself there wasn't a bug in Solr, i wrote a test case
(i'll commit tomorrow, bunch of churn in svn right now making "ant
precommit" unhappy) to prove that when wrapping boost functions around
queries, Solr will only evaluate the functions for docs matching the
wrapped query -- so there is no linear increase as the index size
increases, just the (necessary) linear increase as the number of
*matching* docs grows. (for most functions anyway -- as mentioned "scale"
is special).

BUT! ... then i remembered how this thread started, and your goal of
"scaling" the scores from a wrapped query.

I want to be clear for 99% of the people reading this, if you find
yourself writing a query structure like this...

  q={!func}..functions involving wrapping $qq ...
  qq={!edismax ...lots of stuff but still only matching subset of the index...}
  fq={!query v=$qq}

...Try to restructure the match you want to do into the form of a
multiplier

  q={!boost b=$b v=$qq}
  b=...functions producing a score multiplier...
  qq={!edismax ...lots of stuff but still only matching subset of the index...}

Because the latter case is much more efficient and Solr will only compute
the function values for the docs it needs to (that match the wrapped $qq
query)
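
For readers who want to try both shapes, here is a small Python sketch that only builds the two request strings; it does not talk to Solr. The edismax parameters and `field(myfield)` are borrowed from the example queries elsewhere in this thread, and the choice of `product(...)` for the first form is an assumption made so that both forms express the same multiplication. Note that the follow-up message in this thread retracts the efficiency claim for the case where an fq is present.

```python
from urllib.parse import urlencode, parse_qs

# Base query borrowed from the thread's example; adjust to your own schema.
base_query = "{!edismax qf='title^2 body' v='news'}"

# Form 1: a function query as the main query, restricted by a filter on $qq.
func_form = {
    "q": "{!func}product(field(myfield),query($qq))",
    "qq": base_query,
    "fq": "{!query v=$qq}",
}

# Form 2: a boost query; $b is a multiplier applied to the score of $qq.
boost_form = {
    "q": "{!boost b=$b v=$qq}",
    "b": "field(myfield)",
    "qq": base_query,
}

# urlencode takes care of escaping the braces, quotes, spaces and dollar signs.
func_url = "select?" + urlencode(func_form)
boost_url = "select?" + urlencode(boost_form)

print(func_url)
print(boost_url)
```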

But for your specific goal Peter: Yes, if the whole point of a function
you have is to generate a "scaled" score of your base $qq, then the
function (wrapping the scale(), wrapping the query()) is going to have to
be evaluated for every doc -- that will definitely be linear based on the
size of the index.



-Hoss
http://www.lucidworks.com/

Re: Function query matching

Posted by Peter Keegan <pe...@gmail.com>.

In my previous posting, I said:

  "Subsequent calls to ScaleFloatFunction.getValues bypassed
'createScaleInfo' and added ~0 time."

These subsequent calls are for the remaining segments in the index reader
(21 segments).

Peter




Re: Function query matching

Posted by Peter Keegan <pe...@gmail.com>.

I added some timing logging to IndexSearcher and ScaleFloatFunction and
compared a simple DisMax query with a DisMax query wrapped in the scale
function. The index size was 500K docs, 61K docs match the DisMax query.
The simple DisMax query took 33 ms, the function query took 89 ms. What I
found was:

1. The scale query only normalized the scores once (in
ScaleInfo.createScaleInfo) and added 33 ms to the Qtime.  Subsequent calls
to ScaleFloatFunction.getValues bypassed 'createScaleInfo' and added ~0 time.

2. The FunctionQuery 'nextDoc' iterations added 16 ms over the DisMax
'nextDoc' iterations.

Here's the breakdown:

Simple DisMax query:
weight.scorer: 3 ms (get term enum)
scorer.score: 23 ms (nextDoc iterations)
other: 3 ms
Total: 33 ms

DisMax wrapped in ScaleFloatFunction:
weight.scorer: 39 ms (get scaled values)
scorer.score: 39 ms (nextDoc iterations)
other: 11 ms
Total: 89 ms
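
The per-phase differences can be worked out directly from the two listings. The short Python sketch below only does arithmetic on the numbers quoted above; the dictionary names are chosen here for illustration. It also shows that the listed components of the simple query add up to 29 ms rather than the reported 33 ms; the message does not say where the remaining time went.

```python
# Timings in milliseconds, copied from the breakdown above.
simple = {"weight.scorer": 3, "scorer.score": 23, "other": 3}
scaled = {"weight.scorer": 39, "scorer.score": 39, "other": 11}
reported_total = {"simple": 33, "scaled": 89}

# Extra time per phase when the DisMax query is wrapped in the scale function.
delta = {phase: scaled[phase] - simple[phase] for phase in simple}

# Compare the sum of the listed components with the reported totals.
component_sum = {"simple": sum(simple.values()), "scaled": sum(scaled.values())}
overall_increase = reported_total["scaled"] - reported_total["simple"]

for phase, extra in delta.items():
    print(f"{phase}: +{extra} ms")
print(f"component sums: {component_sum}, reported totals: {reported_total}")
print(f"overall increase: {overall_increase} ms")
```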

Even with any improvements to 'scale', all function queries will add a
linear increase to the Qtime as index size increases, since they match all
docs.

Trey: I'd be happy to test any patch that you find improves the speed.




Re: Function query matching

Posted by Trey Grainger <so...@gmail.com>.

We're working on the same problem with the scale(query(...)) combination,
so I'd like to share a bit more information that may be useful.

*On the scale function:*
Even though the scale query has to calculate the scores for all documents,
it is actually doing this work twice for each ValueSource (once to
calculate the min and max values, and then again when actually scoring the
documents), which is inefficient.

To solve the problem, we're in the process of putting a cache inside the
scale function to remember the values for each document when they are
initially computed (to find the min and max) so that the second pass can
just use the previously computed values for each document.  Our theory is
that most of the extra time due to the scale function is really just the
result of doing duplicate work.
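
The idea can be sketched in a few lines of Python. This is not the patch under discussion, just a toy comparison: `CountingSource`, `scale_two_pass`, and `scale_cached` are hypothetical names, and the "expensive" value source is simulated by a list lookup that counts how often it is called.

```python
class CountingSource:
    """Stand-in for an expensive per-document value source (hypothetical)."""

    def __init__(self, values):
        self.values = values
        self.calls = 0

    def value(self, doc):
        self.calls += 1
        return self.values[doc]


def scale_two_pass(source, num_docs, lo_t, hi_t):
    # Pass 1 finds min and max; pass 2 asks the source again for every doc.
    raw = [source.value(d) for d in range(num_docs)]
    lo, hi = min(raw), max(raw)
    span = (hi - lo) or 1.0
    return [lo_t + (source.value(d) - lo) * (hi_t - lo_t) / span
            for d in range(num_docs)]


def scale_cached(source, num_docs, lo_t, hi_t):
    # One pass over the source; the cache costs one stored value per doc.
    cache = [source.value(d) for d in range(num_docs)]
    lo, hi = min(cache), max(cache)
    span = (hi - lo) or 1.0
    return [lo_t + (v - lo) * (hi_t - lo_t) / span for v in cache]


values = [0.0, 2.0, 5.0, 10.0]  # made-up scores
uncached, cached = CountingSource(values), CountingSource(values)
result_two_pass = scale_two_pass(uncached, len(values), 0.0, 1.0)
result_cached = scale_cached(cached, len(values), 0.0, 1.0)
print(uncached.calls, cached.calls)
```

The cached variant halves the number of value computations at the cost of holding one value per document in memory, which is the trade-off mentioned in the next paragraph.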

No promises this won't be overly costly in terms of memory utilization, but
we'll see what we get in terms of speed improvements and will share the
code if it works out well.  Alternate implementation suggestions (or
criticism of a cache like this) are also welcomed.

*On the NoOp product function: scale(product(1, query(...))):*
We do the same thing, which ultimately is just an unnecessary waste of a
loop through all documents to do an extra multiplication step.  I just
debugged the code and uncovered the problem.  There is a Map (called
context) that is passed through to each value source to store intermediate
state, and both the query and scale functions are passing the ValueSource
for the query function in as the KEY to this Map (as opposed to using some
composite key that makes sense in the current context).  Essentially, these
lines are overwriting each other:

Inside ScaleFloatFunction:
  context.put(this.source, scaleInfo);
  // this.source refers to the QueryValueSource, and scaleInfo refers to
  // a ScaleInfo object
Inside QueryValueSource:
  context.put(this, w);
  // this refers to the same QueryValueSource from above, and w refers to
  // a Weight object

As such, when the ScaleFloatFunction later goes to read the ScaleInfo from
the context Map, it unexpectedly pulls the Weight object out instead and
thus the invalid cast exception occurs.  The NoOp multiplication works
because it puts a "different" ValueSource between the query and the
ScaleFloatFunction such that this.source (in ScaleFloatFunction) != this
(in QueryValueSource).
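
The collision can be reproduced with a plain dictionary. The Python classes below are hypothetical stand-ins, not the Lucene classes, and the order in which the two writes happen is an assumption of this model; the point is only that two components storing different things under the same key will clobber each other, while an intermediate wrapper object changes the key.

```python
class Weight:
    """Stand-in for the query's Weight object."""


class ScaleInfo:
    """Stand-in for the min/max record kept by the scale function."""


class QuerySource:
    def prepare(self, context):
        context[self] = Weight()  # keyed by the query source itself


class PassThroughSource:
    """Plays the role of product(query, 1): a distinct object wrapping the query."""

    def __init__(self, inner):
        self.inner = inner

    def prepare(self, context):
        self.inner.prepare(context)


class ScaleSource:
    def __init__(self, source):
        self.source = source

    def prepare(self, context):
        context[self.source] = ScaleInfo()  # keyed by the wrapped source
        self.source.prepare(context)        # may overwrite that same key

    def read_scale_info(self, context):
        info = context[self.source]
        if not isinstance(info, ScaleInfo):
            raise TypeError(f"expected ScaleInfo, found {type(info).__name__}")
        return info


# scale(query(...)): both writes use the QuerySource as key, so the read fails.
direct, ctx = ScaleSource(QuerySource()), {}
direct.prepare(ctx)
try:
    direct.read_scale_info(ctx)
    direct_ok = True
except TypeError:
    direct_ok = False

# scale(product(query(...), 1)): the wrapper is a different key, so no clash.
wrapped, ctx2 = ScaleSource(PassThroughSource(QuerySource())), {}
wrapped.prepare(ctx2)
wrapped_ok = isinstance(wrapped.read_scale_info(ctx2), ScaleInfo)
print(direct_ok, wrapped_ok)
```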

This should be an easy fix.  I'll create a JIRA ticket to use better key
names in these functions and push up a patch.  This will eliminate the need
for the extra NoOp function.

-Trey


Re: Function query matching

Posted by Peter Keegan <pe...@gmail.com>.

I'm pursuing this possible PostFilter solution. I can see how to collect
all the hits and recompute the scores in a PostFilter, after all the hits
have been collected (for scaling). Now, I can't see how to get the custom
doc/score values back into the main query's HitQueue. Any advice?

Thanks,
Peter



Re: Function query matching

Posted by Peter Keegan <pe...@gmail.com>.

Instead of using a function query, could I use the edismax query (plus some
low-cost filters not shown in the example) and implement the
scale/sum/product computation in a PostFilter? Is the query's maxScore
available there?

Thanks,
Peter



Re: Function query matching

Posted by Peter Keegan <pe...@gmail.com>.

Although the 'scale' is a big part of it, here's a closer breakdown. Here
are 4 queries with progressively more functions, and their response times
(caching turned off in solrconfig):

100 msec:
select?q={!edismax v='news' qf='title^2 body'}

135 msec:
select?qq={!edismax v='news' qf='title^2 body'}&q={!func}product(field(myfield),query($qq))&fq={!query v=$qq}

200 msec:
select?qq={!edismax v='news' qf='title^2 body'}&q={!func}sum(product(0.75,query($qq)),product(0.25,field(myfield)))&fq={!query v=$qq}

320 msec:
select?qq={!edismax v='news' qf='title^2 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!query v=$qq}

Btw, that no-op product is necessary, else you get this exception:

org.apache.lucene.search.BooleanQuery$BooleanWeight cannot be cast to org.apache.lucene.queries.function.valuesource.ScaleFloatFunction$ScaleInfo

thanks,

peter




Re: Function query matching

Posted by Chris Hostetter <ho...@fucit.org>.

: So, this query does just what I want, but it's typically 3 times slower
: than the edismax query without the functions:

that's because the scale() function is inherently slow (it has to
compute the min & max value across every document in order to know how to
scale them)

what you are seeing is the price you have to pay to get that query with a
"normalized" 0-1 value.

(you might be able to save a little bit of time by eliminating that
no-op multiply by 1: "product(query($qq),1)" ... but i doubt you'll even
notice much of a change given that scale function.)

: Is there any way to speed this up? Would writing a custom function query
: that compiled all the function queries together be any faster?

If you can find a faster implementation for scale() then by all means let
us know, and we can fold it back into Solr.


-Hoss
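
To make the cost of scale() concrete, here is a minimal Python sketch of min-max normalization. It is a toy model under stated assumptions, not Solr's actual ScaleFloatFunction code; the function name `scale` and the sample scores are made up for illustration. The point it shows is that the min and max must be found over all values before any single value can be rescaled.

```python
def scale(values, lo=0.0, hi=1.0):
    """Min-max rescale values into [lo, hi]; a toy model of scale(x, lo, hi)."""
    values = list(values)
    if not values:
        return []
    # First pass: min and max must be known before any value can be scaled.
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # All values identical: map everything to lo (a choice made for this
        # sketch only) to avoid dividing by zero.
        return [lo for _ in values]
    # Second pass: linear map from [vmin, vmax] onto [lo, hi].
    factor = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * factor for v in values]


raw_scores = [0.0, 2.0, 4.0, 8.0]  # hypothetical raw query scores
print(scale(raw_scores))
```

In this model the extra work grows with the number of values fed into the function, which matches the explanation above of why scale() is more expensive than a plain sum or product.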

Re: Function query matching

Posted by Peter Keegan <pe...@gmail.com>.

Hi,

So, this query does just what I want, but it's typically 3 times slower
than the edismax query without the functions:

select?qq={!edismax v='news' qf='title^2 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!query v=$qq}

Is there any way to speed this up? Would writing a custom function query
that compiled all the function queries together be any faster?

Thanks,
Peter


Re: Function query matching

Posted by Peter Keegan <pe...@gmail.com>.

Thanks


On Mon, Nov 11, 2013 at 11:46 AM, Yonik Seeley <yo...@heliosearch.com>wrote:

> On Mon, Nov 11, 2013 at 11:39 AM, Peter Keegan <pe...@gmail.com>
> wrote:
> > fq=$qq
> >
> > What is the proper syntax?
>
> fq={!query v=$qq}
>
> -Yonik
> http://heliosearch.com -- making solr shine
>

Re: Function query matching

Posted by Yonik Seeley <yo...@heliosearch.com>.

On Mon, Nov 11, 2013 at 11:39 AM, Peter Keegan <pe...@gmail.com> wrote:
> fq=$qq
>
> What is the proper syntax?

fq={!query v=$qq}

-Yonik
http://heliosearch.com -- making solr shine
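
For anyone scripting these requests, here is a minimal Python sketch that assembles the parameters using the `{!query v=$qq}` filter form. The host, core name, and the helper `build_select_url` are hypothetical; only the parameter values come from this thread. Using urlencode avoids hand-escaping characters such as `{`, `$`, `^` and spaces.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_select_url(base_url, user_query, qf):
    """Assemble a select URL that scores with a function but filters on $qq."""
    params = {
        # Base relevancy query, referenced by the other parameters as $qq.
        "qq": "{!edismax v='%s' qf='%s'}" % (user_query, qf),
        # Normalized score of the base query (no-op product kept as in the thread).
        "scaledQ": "scale(product(query($qq),1),0,1)",
        # Main query: weighted blend of the scaled score and a field value.
        "q": "{!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))",
        # Filter that dereferences $qq through the query parser.
        "fq": "{!query v=$qq}",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical host and core name, for illustration only.
url = build_select_url(
    "http://localhost:8983/solr/collection1/select", "news", "title^2 body"
)
print(url)
# Round-trip check: decode the URL again and show the filter parameter.
print(parse_qs(urlsplit(url).query)["fq"])
```

The sketch does no escaping of quotes inside `user_query`, so it only suits simple terms like the one used in this thread.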

Re: Function query matching

Posted by Peter Keegan <pe...@gmail.com>.

I replaced the frange filter with the following filter and got the correct
number of results, and it was 3X faster:

select?qq={!edismax v='news' qf='title^2 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!edismax v='news' qf='title^2 body'}

Then, I tried to simplify the query with parameter substitution, but 'fq'
didn't parse correctly:

select?qq={!edismax v='news' qf='title^2 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq=$qq

What is the proper syntax?

Thanks,
Peter



Re: Function query matching

Posted by Peter Keegan <pe...@gmail.com>.

I'm trying to use a normalized score in a query, as I described in a recent
thread titled "Re: How to get similarity score between 0 and 1 not relative
score"

I'm using this query:
select?qq={!edismax v='news' qf='title^2 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!frange l=0.001}$q

Is there another way to accomplish this using dismax boosting?




Re: Function query matching

Posted by Jason Hellman <jh...@innoventsolutions.com>.

You can, of course, use a function range query:

select?q=text:news&fq={!frange l=0 u=100}sum(x,y)

http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/search/FunctionRangeQParserPlugin.html

This will give you a bit more flexibility to meet your goal.

On Nov 7, 2013, at 7:26 AM, Erik Hatcher <er...@gmail.com> wrote:

> Function queries score (all) documents, but don't filter them.  All documents effectively match a function query.   
> 
> 	Erik
> 
> On Nov 7, 2013, at 1:48 PM, Peter Keegan <pe...@gmail.com> wrote:
> 
>> Why does this function query return docs that don't match the embedded
>> query?
>> select?qq=text:news&q={!func}sum(query($qq),0)
>

Re: Function query matching

Posted by Erik Hatcher <er...@gmail.com>.

Function queries score (all) documents, but don't filter them.  All documents effectively match a function query.   

	Erik
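
A toy Python model of this behaviour, as a sketch only: it is not Solr code, and the documents and helper names are invented for illustration. The function query assigns a score to every document, with the embedded query contributing 0 for documents it does not match, so narrowing the result set takes a separate filter.

```python
# Hypothetical three-document index.
docs = {1: "news about solr", 2: "cooking recipes", 3: "evening news"}


def query_score(text, term):
    """Stand-in for query($qq): a positive score on a match, 0.0 otherwise."""
    return 1.0 if term in text.split() else 0.0


def func_query(docs, term):
    """Stand-in for {!func}sum(query($qq),0): every document gets a score."""
    return {doc_id: query_score(text, term) + 0 for doc_id, text in docs.items()}


def filtered(docs, term):
    """Same scoring, restricted to documents matching the embedded query."""
    scores = func_query(docs, term)
    return {d: s for d, s in scores.items() if query_score(docs[d], term) > 0}


print(func_query(docs, "news"))  # all three documents appear
print(filtered(docs, "news"))    # only the matching documents remain
```

In this model, document 2 still shows up in the unfiltered result with a score of 0.0, which mirrors the non-matching docs returned by the original query.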

On Nov 7, 2013, at 1:48 PM, Peter Keegan <pe...@gmail.com> wrote:

> Why does this function query return docs that don't match the embedded
> query?
> select?qq=text:news&q={!func}sum(query($qq),0)