Posted to java-user@lucene.apache.org by fangz <fa...@hotmail.com> on 2008/02/26 21:49:52 UTC

Inconsistent Search Speed

Hi,

I am using a simple Java program to test search speed. The index file is
about 1.93 GB in size. I instantiated an IndexSearcher and built a query using
the query parser: parser.parse("entity:fail"). The initial run took more
than 60 seconds, but subsequent runs took only 1.5 seconds. This does
not change with or without calling indexsearcher.close(). As far as I know,
Lucene does not cache results (no filter is involved). So what is causing
such a big speed difference?

Thank you in advance!

fangz
-- 
View this message in context: http://www.nabble.com/Inconsistent-Search-Speed-tp15698325p15698325.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Inconsistent Search Speed

Posted by Grant Ingersoll <gs...@apache.org>.

The first call loads various data structures into memory.  The second  
takes advantage of those structures being in memory.  What you want to  
do is "warm" the searcher by sending some queries to it before making  
it available.

-Grant
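
A minimal sketch of the warming idea described above, using only the Java standard library. The Function<String, Integer> parameter is a stand-in for "run this query against the searcher and return the hit count"; it is not a Lucene type, and the class name, method name and the second warm-up query are hypothetical. In a real program the lambda would call into the same IndexSearcher instance that later serves user queries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Sketch only: WarmupSketch and warm() are illustrative names, not Lucene API.
class WarmupSketch {

    // Runs each warm-up query once and returns how long each one took, in
    // nanoseconds. The results of the queries are discarded; the point is
    // that the first-query cost is paid here, before real users arrive.
    static List<Long> warm(Function<String, Integer> search, List<String> warmupQueries) {
        List<Long> elapsedNanos = new ArrayList<>();
        for (String query : warmupQueries) {
            long start = System.nanoTime();
            search.apply(query);
            elapsedNanos.add(System.nanoTime() - start);
        }
        return elapsedNanos;
    }

    public static void main(String[] args) {
        // Hypothetical searcher: replace this lambda with a call into your own search code.
        Function<String, Integer> search = query -> query.length();
        List<Long> timings = warm(search, Arrays.asList("entity:fail", "entity:pass"));
        System.out.println("warm-up timings (ns): " + timings);
    }
}
```

Warm-up queries are probably most useful when they resemble real traffic (same fields, same sorting), since that decides which structures get loaded.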

On Feb 26, 2008, at 3:49 PM, fangz wrote:

> Hi,
>
> I am using a simple Java program to test search speed. The index file is
> about 1.93 GB in size. I instantiated an IndexSearcher and built a query using
> the query parser: parser.parse("entity:fail"). The initial run took more
> than 60 seconds, but subsequent runs took only 1.5 seconds. This does
> not change with or without calling indexsearcher.close(). As far as I know,
> Lucene does not cache results (no filter is involved). So what is causing
> such a big speed difference?
>
> Thank you in advance!
>
> fangz

--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

Re: Inconsistent Search Speed

Posted by Daniel Noll <da...@nuix.com>.

On Thursday 28 February 2008 01:52:27 Erick Erickson wrote:
> And don't iterate through the Hits object for more than 100 or so hits.
> Like Mark said. Really. Really don't <G>...

Is there a good trick for avoiding this?

Say you have a situation like this...
  - User searches
  - User sees first N hits, perhaps scrolls
  - User chooses to save results to a file

Clearly for the first two, using Hits is normal.  For the third step you would 
be iterating over potentially a larger number of results, so Hits is not 
recommended.  But implementing a HitCollector from scratch to get the same 
results as Hits seems silly, so what is the usual way out of this?  Do you 
re-execute the query using TopDocs?  Or do you call hitDoc(hits.length()) to 
force Hits itself to load the remainder, and then go back to the start and 
iterate through?

Using TopDocs up-front would be desirable but it turns out it tries to 
allocate the maximum you pass in, up-front...

Daniel
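
One possible shape for the "save everything" step, sketched with the Java standard library only. It assumes a callback with the same collect(int doc, float score) signature quoted elsewhere in this thread; the class and method names are hypothetical and nothing here is Lucene API. Memory grows with the number of hits actually collected rather than with a maximum passed in up front, and the ranking step (descending score, ties broken by ascending document number) is an assumption about the order you want, not a statement about what Hits does internally.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: ScoredHitsSketch is an illustrative name, not a Lucene class.
class ScoredHitsSketch {

    // One collected hit: a document number and its score.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final List<Hit> hits = new ArrayList<>();

    // Called once per matching document; the list grows only as hits arrive.
    void collect(int doc, float score) {
        hits.add(new Hit(doc, score));
    }

    // Returns a copy ordered by descending score, then ascending doc number.
    List<Hit> ranked() {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort((a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
        });
        return copy;
    }

    public static void main(String[] args) {
        ScoredHitsSketch collector = new ScoredHitsSketch();
        collector.collect(7, 0.5f);
        collector.collect(2, 1.5f);
        for (Hit hit : collector.ranked()) {
            System.out.println(hit.doc + " " + hit.score);
        }
    }
}
```

With something like this, the first N hits shown on screen and the full export could come from the same collected list, at the cost of holding one small object per hit.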


Re: Inconsistent Search Speed

Posted by Erick Erickson <er...@gmail.com>.

To reinforce Grant's comment, lazy loading improved one situation for me
on the order of 10X. I wrote it up and it's somewhere in the Wiki. Your
results will vary, and unless you have a LOT of stored fields I wouldn't
necessarily expect a similar speedup, but it's sure worth looking at.

And don't iterate through the Hits object for more than 100 or so hits. Like
Mark said. Really. Really don't <G>...

Best
Erick

On Wed, Feb 27, 2008 at 7:33 AM, Grant Ingersoll <gs...@apache.org> wrote:

> You could also look at the FieldSelector when getting the Document,
> so that you only load the one field you need.
>
> -Grant
>
> On Feb 26, 2008, at 10:13 PM, Mark Miller wrote:
>
> > The Lucene prime directive: don't iterate through all of Hits! It's
> > horribly inefficient. You must use a HitCollector. Even then, getting
> > your field values will be slow no matter what if you fetch them for every
> > hit, so you don't want to do that for every hit in a search. But don't
> > loop through Hits.
> >
> > fangz wrote:
> >> Thank you for the info.  It makes sense.
> >> My search will return more than 10000 documents and I have to loop through
> >> all documents to build a list with unique field values. It seems that the
> >> looping of the hits takes the longest time in the initial run but afterwards
> >> it becomes much faster. If the hits are relatively small, I do not see the
> >> same behavior.

Re: Inconsistent Search Speed

Posted by Grant Ingersoll <gs...@apache.org>.

You could also look at the FieldSelector when getting the Document,
so that you only load the one field you need.

-Grant

On Feb 26, 2008, at 10:13 PM, Mark Miller wrote:

> The Lucene prime directive: don't iterate through all of Hits! It's
> horribly inefficient. You must use a HitCollector. Even then, getting
> your field values will be slow no matter what if you fetch them for every
> hit, so you don't want to do that for every hit in a search. But don't
> loop through Hits.
>
> fangz wrote:
>> Thank you for the info.  It makes sense.
>> My search will return more than 10000 documents and I have to loop through
>> all documents to build a list with unique field values. It seems that the
>> looping of the hits takes the longest time in the initial run but afterwards
>> it becomes much faster. If the hits are relatively small, I do not see the
>> same behavior.


Re: Inconsistent Search Speed

Posted by Mark Miller <ma...@gmail.com>.

The Lucene prime directive: don't iterate through all of Hits! It's
horribly inefficient. You must use a HitCollector. Even then, getting
your field values will be slow no matter what if you fetch them for every
hit, so you don't want to do that for every hit in a search. But don't loop
through Hits.
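
For the "list of unique field values" case discussed in this thread, a collector can be quite small. The sketch below uses only the Java standard library and assumes a callback with the collect(int doc, float score) shape; the class name and the IntFunction<String> lookup (standing in for "load this one field for document N") are hypothetical and not Lucene API. It records matching document numbers first and resolves field values afterwards in ascending document order. It does not remove the per-document lookup cost mentioned above; it only separates collecting from loading.

```java
import java.util.BitSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.IntFunction;

// Sketch only: UniqueValuesSketch is an illustrative name, not a Lucene class.
class UniqueValuesSketch {

    // One bit per matching document number.
    private final BitSet matched = new BitSet();

    // Called once per matching document; the score is ignored because only
    // membership matters when building a set of unique values.
    void collect(int doc, float score) {
        matched.set(doc);
    }

    int matchCount() {
        return matched.cardinality();
    }

    // Visits matching documents in ascending doc order and gathers the
    // distinct non-null values returned by the lookup.
    Set<String> uniqueValues(IntFunction<String> fieldLookup) {
        Set<String> values = new TreeSet<>();
        for (int doc = matched.nextSetBit(0); doc >= 0; doc = matched.nextSetBit(doc + 1)) {
            String value = fieldLookup.apply(doc);
            if (value != null) {
                values.add(value);
            }
            if (doc == Integer.MAX_VALUE) {
                break; // avoid overflow in doc + 1
            }
        }
        return values;
    }

    public static void main(String[] args) {
        UniqueValuesSketch collector = new UniqueValuesSketch();
        collector.collect(4, 1.0f);
        collector.collect(1, 0.2f);
        // Hypothetical lookup: replace with code that loads one field for the given doc.
        Set<String> values = collector.uniqueValues(doc -> "value-" + (doc % 2));
        System.out.println(values);
    }
}
```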

fangz wrote:
> Thank you for the info.  It makes sense. 
>
> My search will return more than 10000 documents and I have to loop through
> all documents to build a list with unique field values. It seems that the
> looping of the hits takes the longest time in the initial run but afterwards
> it becomes much faster. If the hits are relatively small, I do not see the
> same behavior.
>   


Re: Inconsistent Search Speed

Posted by fangz <fa...@hotmail.com>.

Thank you for the info.  It makes sense. 

My search will return more than 10000 documents and I have to loop through
all documents to build a list with unique field values. It seems that the
looping of the hits takes the longest time in the initial run but afterwards
it becomes much faster. If the hits are relatively small, I do not see the
same behavior.
-- 
View this message in context: http://www.nabble.com/Inconsistent-Search-Speed-tp15698325p15704908.html

Re: Inconsistent Search Speed

Posted by Grant Ingersoll <gs...@apache.org>.

Ah, you didn't mention term vectors.  What do you need them for?   
Perhaps a bit more background could help here.

-Grant

On Feb 27, 2008, at 1:31 PM, fangz wrote:

> I implemented HitCollector as you suggested. It improved the initial run
> significantly. However, it showed only a slight improvement in the subsequent
> runs. I don't know how to use FieldSelector in my situation. My code
> looks like this:
>
> public void collect( int doc, float score ) {
>
>    TermFreqVector vector = null;
>    vector = searcher.getIndexReader().getTermFreqVector(doc, "field");
>    ...
>
> Thank you again!
>
> fangz


Re: Inconsistent Search Speed

Posted by fangz <fa...@hotmail.com>.

I implemented HitCollector as you suggested. It improved the initial run
significantly. However, it showed only a slight improvement in the subsequent
runs. I don't know how to use FieldSelector in my situation. My code
looks like this:

public void collect( int doc, float score ) {

    TermFreqVector vector = null;
    vector = searcher.getIndexReader().getTermFreqVector(doc, "field");
    ...

Thank you again!

fangz
-- 
View this message in context: http://www.nabble.com/Inconsistent-Search-Speed-tp15698325p15719770.html

Re: Inconsistent Search Speed

Posted by h t <bl...@gmail.com>.

Did you use the same keywords in both calls?

2008/2/27, fangz <fa...@hotmail.com>:
> Hi,
>
> I am using a simple Java program to test search speed. The index file is
> about 1.93 GB in size. I instantiated an IndexSearcher and built a query using
> the query parser: parser.parse("entity:fail"). The initial run took more
> than 60 seconds, but subsequent runs took only 1.5 seconds. This does
> not change with or without calling indexsearcher.close(). As far as I know,
> Lucene does not cache results (no filter is involved). So what is causing
> such a big speed difference?
>
> Thank you in advance!
>
> fangz