You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Roy Ward <ro...@buymusichere.com> on 2007/10/17 08:05:38 UTC

Re: [jira] Commented: (LUCENE-997) Add search timeout support to Lucene

Hoss Man commented on LUCENE-997:
> I'm not entirely convinced it makes sense to modify these classes to
> include timeouts
> as core functionality ... would it make more sense to deal with this in
> subclasses of 
> IndexSearcher/TopDocs/Hits ?

I like the idea of timeouts as core functionality, as it makes it much
easier to deal with things like partial results.

I do have some thoughts on the patch though:

(1) You only added timeouts to:

  public TopDocs search(Weight weight, Filter filter, final int nDocs)

It's confusing if timeout functionality is not also added to:

  public TopFieldDocs search(Weight weight, Filter filter, final int nDocs, 
Sort sort)

(2) Estimating the the number of results is a good idea, however it breaks
some of the code in Hits.java when the Vector of results is not as long as
expected. This either needs more work or just returning the number or
results actually found. Perhaps a separate method for getting the estimate
in the case of partial results would be the way to go.

(3) The timer, consisting of a whole lot of millisecond pauses (if the
resolution is 1) is not accurate (certainly under load). There needs to be
at least an occasional call to an accurate timer. It would also be better to
replace getCounter() by something like getMilliseconds() so the caller does
not need to know the resolution of the timer.

-- 
View this message in context: http://www.nabble.com/-jira--Created%3A-%28LUCENE-997%29-Add-search-timeout-support-to-Lucene-tf4438431.html#a13247616
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: [jira] Commented: (LUCENE-997) Add search timeout support to Lucene

Posted by Roy Ward <ro...@buymusichere.com>.
Sean Timm wrote:

>> (2) Estimating the the number of results <snip>

>Is there a test case that shows this breakage, or can you point me to 
>the code in Hits.java that my patch causes problems with?  Sorry, I'm 
>not seeing it.

In the case of no hits at all getting returned, the following code:

if(hits.hasNext())
{
    Hit hit = (Hit)hits.next();
    float s = hit.getScore();
    ...
}

throws the following exception (line numbers are the patch against
lucene-2.2):

java.lang.ArrayIndexOutOfBoundsException: 1 >= 1
        at org.apache.lucene.search.Hits.score(Hits.java:125)
        at org.apache.lucene.search.Hit.getScore(Hit.java:68)
<snip>

which is thrown by:

    return hitDoc(n).score;

I haven't worked through this fully (and I haven't yet put this into a nice
test case I can send you - I'm testing this within a large application), but
I think it's related to the number of estimated hits being different than
the actual number of hits, so some code doesn't check that there are enough
hits (and user code might do this too). The problem does not occur when a
search that actually returns zero hits is done.

> I wouldn't expect anyone to actually use a 1 ms resolution.
<snip>
> If that isn't preempted until 1.1 seconds  have elapsed,
> I think my operations team will still be happy.

This one is no big deal, since as you point out, it's really to stop the
tiny proportion of queries that don't finish in a reasonable time, but I was
thinking of a couple of things here:

If there's going to be a timer thread running in an application, other
things may want to make use of it (such as reporting elapsed times), so some
accuracy might be good.

Also, it's nice to have it somewhat bulletproof - If someone makes the
mistake of using a resolution of 1ms (I tested it for curiosity), on a
heavily loaded system with many threads running that resulted in about 7000
ticks every minute, so it was wrong by a factor of about 8.

If I find time time to put something in the timer class to get good accuracy
without too much performance loss, I'll send you a patch (I'm sorry I
haven't done so already, but I'm patching against 2.2 rather than the cvs
trunk).

Roy Ward
-- 
View this message in context: http://www.nabble.com/-jira--Created%3A-%28LUCENE-997%29-Add-search-timeout-support-to-Lucene-tf4438431.html#a13285307
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: [jira] Commented: (LUCENE-997) Add search timeout support to Lucene

Posted by Sean Timm <ti...@aol.com>.
Roy,

Thanks for the review and comments.  My comments inline below.

Roy Ward wrote:
> (1) You only added timeouts to:
>
>   public TopDocs search(Weight weight, Filter filter, final int nDocs)
>
> It's confusing if timeout functionality is not also added to:
>
>   public TopFieldDocs search(Weight weight, Filter filter, final int nDocs, 
> Sort sort)
>   
Good catch.  That was an oversight.  The necessary changes were made to 
TopFieldDocCollector.java, but you are right the changes to

  public TopFieldDocs search(Weight weight, Filter filter, final int nDocs, 
Sort sort)

were not in the patch.  I have this fixed locally and will submit a 
patch shortly.
> (2) Estimating the the number of results is a good idea, however it breaks
> some of the code in Hits.java when the Vector of results is not as long as
> expected. This either needs more work or just returning the number or
> results actually found. Perhaps a separate method for getting the estimate
> in the case of partial results would be the way to go.
>   
Is there a test case that shows this breakage, or can you point me to 
the code in Hits.java that my patch causes problems with?  Sorry, I'm 
not seeing it.
> (3) The timer, consisting of a whole lot of millisecond pauses (if the
> resolution is 1) is not accurate (certainly under load). There needs to be
> at least an occasional call to an accurate timer. It would also be better to
> replace getCounter() by something like getMilliseconds() so the caller does
> not need to know the resolution of the timer.
I wouldn't expect anyone to actually use a 1 ms resolution.  That is in 
the provided test case simply because it almost guarantees a timeout 
occurs.  The accuracy of the timeout as long as it is reasonably close 
isn't terribly important.  The typical use case as I see it would be to 
preempt the occasional (< 1%)  queries that take an unreasonable amount 
of time to complete.  For example the timer may be configured for 10 
counts of 100 ms (1 second).  If that isn't preempted until 1.1 seconds 
have elapsed, I think my operations team will still be happy.

I do like your suggestion of getMilliseconds().  It is clearer.  I've 
made this change locally and will submit a patch shortly.

-Sean

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org