Posted to java-user@lucene.apache.org by Andreas Guther <An...@markettools.com> on 2007/03/22 14:58:54 UTC

Speeding up looping over Hits

Hi,

While looking into performance enhancements for our search feature, I
noticed a significant difference in document access time while looping
over Hits.

I wrote a test application that searches for a list of search terms and
then, for each returned Hits object, loops twice over every hits.doc(i):

for (int i = 0; i < numberOfDocs; i++) {doc = hits.doc(i);}

I am seeing differences like the following:

Found 16,215 hits for 'Water or Wine' in 219 ms
Processed 16,215 docs in 53,141 ms; per single doc 3.2773 ms
Processed 16,215 docs in 2,032 ms; per single doc 0.1253 ms
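
The per-document figures follow directly from the totals, and the ratio
between them shows that the first loop is roughly 26 times slower than
the second. A small sketch of that arithmetic (the class and method
names are made up for illustration):

```java
import java.util.Locale;

class PerDocTiming {
    // Average milliseconds spent per document in one pass over the results.
    static double perDocMillis(long totalMillis, int docCount) {
        return (double) totalMillis / docCount;
    }

    public static void main(String[] args) {
        int docs = 16215;                           // hits reported above
        double first = perDocMillis(53141, docs);   // first loop
        double second = perDocMillis(2032, docs);   // second loop
        System.out.println(String.format(Locale.ROOT, "first loop:  %.4f ms/doc", first));
        System.out.println(String.format(Locale.ROOT, "second loop: %.4f ms/doc", second));
        System.out.println(String.format(Locale.ROOT, "first/second ratio: %.1f", first / second));
    }
}
```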

Interestingly, if I run the same test application a second time in my
IDE, the difference between the first and the second loop is very small.

I have no explanation for this difference, but it is a huge problem for
us: I need to extract a small set of pieces of information from each
document, and the first loop simply takes too much time.

I could not find any indication of external caching of Hits. I am
running my tests within Eclipse with the memory settings -Xms766M
-Xmx1024M.

What explains the different access speeds for the same search results?

Is there a way to speed up looping over the Hits data structure?

Andreas



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Speeding up looping over Hits

Posted by Erick Erickson <er...@gmail.com>.
Your timing differences are probably due to caching. But as has been
mentioned many times in the archive, a Hits object is intended to
allow fast, simple retrieval of the first few documents in a result
set (100, if memory serves). Every 100 or so calls to next() causes
the search to be re-issued.

See HitCollector, TopDocs, etc...
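
To show the shape of the collector approach, here is a toy sketch that
runs without an index. Everything in it is an assumption made for
illustration: ToyIndex, ToyCollector and CollectorSketch are stand-ins,
not Lucene classes, and the matching is a plain substring test. The
point is only the control flow: the search runs once, and every match
is handed to a callback instead of being pulled through Hits.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a collector callback. NOT a Lucene class.
interface ToyCollector {
    void collect(int docId, float score);
}

// Toy stand-in for a searchable index. NOT a Lucene class.
class ToyIndex {
    private final String[] docs;
    int searchesRun = 0; // how many times the whole index was scanned

    ToyIndex(String[] docs) {
        this.docs = docs;
    }

    // One scan of the index; every match is pushed to the collector.
    void search(String term, ToyCollector collector) {
        searchesRun++;
        for (int id = 0; id < docs.length; id++) {
            if (docs[id].contains(term)) {
                collector.collect(id, 1.0f); // constant score; scoring is out of scope
            }
        }
    }
}

class CollectorSketch {
    // Gather every matching document id in a single search pass.
    static List<Integer> allMatches(ToyIndex index, String term) {
        final List<Integer> ids = new ArrayList<Integer>();
        index.search(term, new ToyCollector() {
            public void collect(int docId, float score) {
                ids.add(docId);
            }
        });
        return ids;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex(
                new String[] {"water", "wine", "water and wine", "beer"});
        System.out.println("matches: " + allMatches(index, "water"));
        System.out.println("searches run: " + index.searchesRun);
    }
}
```

With the ids in hand, you can then load only the documents (or fields)
you actually need, in whatever order suits you.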

Erick


Re: Speeding up looping over Hits

Posted by Erick Erickson <er...@gmail.com>.
Oh yeah... By loading only the relevant fields, I reduced my query
times by over 90%. I actually wrote that up on the mailing list, if
you want to try to find it, but it took Andreas' message to
remind me...

Erick

On 3/22/07, Santa Clause <no...@yahoo.com> wrote:
>
> Another thing you may want to look at is the newer version 2.1.0 and
> getFieldable. I think that will lazy load the data, that way you are
> only reading the parts of the document that you need at that moment
> rather than the whole thing. Someone please correct me if I am wrong
> or point to what I really mean :)
>
>   I had a similar situation a long while back and I was able to find
> a  patch for the version of Lucene I was using that allowed the above.
> It  made a huge difference. I think something similar is now built in
> 2.1.0.

Re: Speeding up looping over Hits

Posted by Santa Clause <no...@yahoo.com>.
Another thing you may want to look at is the newer version 2.1.0 and
getFieldable. I think that will lazy-load the data, so that you are
only reading the parts of the document that you need at that moment
rather than the whole thing. Someone please correct me if I am wrong
or point to what I really mean :)

I had a similar situation a long while back, and I was able to find a
patch for the version of Lucene I was using that allowed the above. It
made a huge difference. I think something similar is now built into
2.1.0.
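
The idea of reading only the fields you need can be sketched without
Lucene. The classes below are toy stand-ins made up for illustration
(ToyStoredDoc is not Lucene's Document, and charsRead is only a crude
proxy for I/O cost); they just show why skipping a large stored field
can matter when you loop over thousands of documents.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy stand-in for one stored document. NOT Lucene's Document class.
class ToyStoredDoc {
    private final Map<String, String> stored = new LinkedHashMap<String, String>();
    int charsRead = 0; // crude proxy for the cost of reading stored fields

    void put(String field, String value) {
        stored.put(field, value);
    }

    // Eager load: every stored field is read, wanted or not.
    Map<String, String> loadAll() {
        return loadOnly(stored.keySet());
    }

    // Selective load: only the named fields are read.
    Map<String, String> loadOnly(Set<String> wanted) {
        Map<String, String> result = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> e : stored.entrySet()) {
            if (wanted.contains(e.getKey())) {
                charsRead += e.getValue().length();
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}

class SelectiveLoadSketch {
    // Build a document with two small fields and one large body field.
    static ToyStoredDoc sample() {
        ToyStoredDoc doc = new ToyStoredDoc();
        doc.put("id", "42");
        doc.put("title", "Water or Wine");
        doc.put("body", new String(new char[10000]).replace('\0', 'x'));
        return doc;
    }

    public static void main(String[] args) {
        ToyStoredDoc eager = sample();
        eager.loadAll();
        ToyStoredDoc selective = sample();
        selective.loadOnly(new HashSet<String>(Arrays.asList("id", "title")));
        System.out.println("eager read " + eager.charsRead + " chars");
        System.out.println("selective read " + selective.charsRead + " chars");
    }
}
```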
  
