You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Cyril Barlow <in...@fantasyfooty.org> on 2005/10/05 21:26:01 UTC

What is a Hits object?

Is it an actual array of full Documents or a list of reference points to
Documents? And what's the typical size in memory of a Hits object with say
1000 avg size docs?


	
	
		
___________________________________________________________ 
Yahoo! Messenger - NEW crystal clear PC to PC calling worldwide with voicemail http://uk.messenger.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: What is a Hits object?

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Oct 6, 2005, at 11:28 AM, Cyril Barlow wrote:
> How long does the Hits object stay in memory for?

As long as you keep it around.  Lucene isn't responsible for keeping  
a reference to it, your application is.

> Until the IndexSearcher closes? And if you just use 1 IndexSearcher  
> would there be a big collection
> of Hits in memory?

See above.  You're entirely in charge of the Hits object being kept  
around, or not.

Since you're obviously in a servlet environment, you probably will  
want to implement some sort of paging through Hits.  One option is to  
simply throw Hits away after every request and re-search for every  
page.  That works well enough in every case I've encountered that I  
haven't used the alternative of keeping Hits around across requests.

     Erik




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: What is a Hits object?

Posted by Cyril Barlow <in...@fantasyfooty.org>.
----- Original Message -----
From: "J.J. Larrea" <jj...@panix.com>
To: <ja...@lucene.apache.org>
Sent: Wednesday, October 05, 2005 10:31 PM
Subject: Re: What is a Hits object?


> A Hits object is essentially a cache on query results.  It caches in 2
ways:
>
> 1. When a query returning Hits is requested, only the top 100 document IDs
and scores are requested from the scoring system, and the ID/Score pairs are
stored in a list in the Hits object.  Whenever a document ID, score, or
Document object are requested that lie beyond the end of that list, the
query is reexecuted in order to grow the list to at or beyond the request,
typically 100% beyond it.
>
> 2. Returning a Document object (rather than a score or document ID)
requires reconstituting the Document from the stored fields in the index,
which is an expensive operation.  The Hits object maintains a cache of the
200 most recently requested Document objects, so it is unlikely they will
need to be reconstituted more than once.
>
> This is all optimized around typical hitlist access patterns - navigate
forward and backwards through the results pages a small number of documents
at a time.  For applications which cannot benefit from the Hits caching, for
example which employ their own hit caching layer, one can effectively use
the so-called "low level" IndexSearcher routines which return TopDocs rather
than Hits.
>
> - J.J.
>
> At 8:26 PM +0100 10/5/05, Cyril Barlow wrote:
> >Is it an actual array of full Documents or a list of reference points to
> >Documents? And what's the typical size in memory of a Hits object with
say
> >1000 avg size docs?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

How long does the Hits object stay in memory for? Until the IndexSearcher
closes? And if you just use 1 IndexSearcher would there be a big collection
of Hits in memory?


	
	
		
___________________________________________________________ 
Yahoo! Messenger - NEW crystal clear PC to PC calling worldwide with voicemail http://uk.messenger.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: What is a Hits object?

Posted by "J.J. Larrea" <jj...@panix.com>.
A Hits object is essentially a cache on query results.  It caches in 2 ways:

1. When a query returning Hits is requested, only the top 100 document IDs and scores are requested from the scoring system, and the ID/Score pairs are stored in a list in the Hits object.  Whenever a document ID, score, or Document object are requested that lie beyond the end of that list, the query is reexecuted in order to grow the list to at or beyond the request, typically 100% beyond it.

2. Returning a Document object (rather than a score or document ID) requires reconstituting the Document from the stored fields in the index, which is an expensive operation.  The Hits object maintains a cache of the 200 most recently requested Document objects, so it is unlikely they will need to be reconstituted more than once.

This is all optimized around typical hitlist access patterns - navigate forward and backwards through the results pages a small number of documents at a time.  For applications which cannot benefit from the Hits caching, for example which employ their own hit caching layer, one can effectively use the so-called "low level" IndexSearcher routines which return TopDocs rather than Hits.

- J.J.

At 8:26 PM +0100 10/5/05, Cyril Barlow wrote:
>Is it an actual array of full Documents or a list of reference points to
>Documents? And what's the typical size in memory of a Hits object with say
>1000 avg size docs?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org