Posted to java-user@lucene.apache.org by Kelvin Tan <ke...@relevanz.com> on 2005/03/08 12:38:36 UTC

Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

Mark, 

On Tue, 8 Mar 2005 09:56:37 +0000 (GMT), mark harwood wrote:
>> But I suppose for Document
>> has to be further subclassed so that the other
>> non-initialized fields can be obtained as well, or
>>
> I don't think Document would be the right place for
> this - as a design pattern it is cast as a "value
> object" or "transfer object" which is passed to
> (potentially remote) clients. It shouldn't be holding
> connections to an index so it can load extra fields on
> demand.
>

Good point.

> the offsets are not stored and you have to
> serially read through all of the document's fields on
> disk until you reach the one you want.

How about if the structure of the Document is known (as is usually the case) and a contract is made that the first, say, 3 fields are always lazy-loaded? That way, iterating through all the fields can be avoided.
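
A minimal sketch of one reading of that contract, assuming the fields needed early are always written first so a sequential reader can stop after them. The record layout here (a field count followed by length-prefixed strings) is a toy format invented for illustration, not Lucene's stored-field format, and the class and method names are hypothetical. It uses modern Java rather than the 1.4 of the time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: toy record format, NOT Lucene's on-disk layout.
class PrefixFieldReader {

    /** Serializes the field values in order: an int count, then each value. */
    static byte[] write(List<String> fieldValues) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(fieldValues.size());
            for (String value : fieldValues) {
                out.writeUTF(value);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decodes only the first k values and stops; later fields are never read. */
    static List<String> readFirst(byte[] record, int k) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
            int total = in.readInt();
            int limit = Math.min(k, total);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < limit; i++) {
                result.add(in.readUTF());
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point is only that a fixed field order turns "scan until you find the field" into "read a known prefix"; whether the index format allows stopping early like this is a separate question.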

>
> There's a price to pay for allowing clients too much
> freedom and I think lazy loading of field values is an
> example of something which is too costly.
> I personally prefer a search interface which requires
> clients to state up front what fields they want
> returned - the equivalent of SQL's "Select" clause in
> addition to the usual "where.." matching criteria.

Here's my requirement, and you may have a better idea of how to approach it: I need to compute a simple "top 10 most frequently occurring <field>" over the results of a search. Right now, the naive implementation simply goes through all the hits, gets the respective field(s) and does the counting. The requirement is not so different from sorting. Sorting loads and caches all the values for that field in the index, but that approach is probably not optimal in this case because there are multiple fields, and they are of variable length.

Ideally, I'd like to load only the fields I need from the Documents (for counting purposes). When the user actually views that hit, then I can load the rest of the fields for display.
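
To make the counting step concrete, here is a rough sketch of the naive approach, assuming the field value for each hit has already been pulled out into a collection of strings. It deliberately leaves out the Lucene calls that would fetch those values; TopValues and topN are hypothetical names, and the code uses modern Java collections.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: counts values that were already extracted from the hits.
class TopValues {

    /** Returns the n most frequent values with their counts, most frequent first. */
    static List<Map.Entry<String, Integer>> topN(Iterable<String> values, int n) {
        // Tally one count per distinct value.
        Map<String, Integer> counts = new HashMap<>();
        for (String value : values) {
            counts.merge(value, 1, Integer::sum);
        }

        // Order by descending count; ties are broken alphabetically so the
        // result is deterministic.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }
}
```

Calling TopValues.topN(values, 10) gives the top 10. The counting itself is cheap; the cost that matters is loading one stored field per hit to build the input, which is why loading only the needed fields would help.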

Any ideas?

