Posted to java-user@lucene.apache.org by Claes Holmerson <cl...@polopoly.com> on 2003/07/03 14:36:55 UTC

Understanding how indexing works

Hi,

In my job, I have become the new maintainer of a search feature that 
uses Lucene. I am trying to understand how it works by examining the 
index it produces.

When I list index fields by opening an IndexReader, looping over 
documents, then looping over fields with Document.fields(), I see a 
number of fields. However, I don't see the fields that are searchable 
within this document.

If I call IndexReader.terms() directly and then Term.field() on each term, 
I see fields that should no longer exist, since the documents containing 
them have been deleted. Is this the expected behavior?

How can I easily tell if a term is no longer searchable (or rather, can 
no longer be found by a search), apart from actually running the 
search? Is there a way to list those terms?

Thanks,
Claes

-- 
Claes Holmerson
Polopoly - Cultivating the information garden
Kungsgatan 88, SE-112 27 Stockholm, SWEDEN
Direct: +46 8 506 782 59
Mobile: +46 704 47 82 59
Fax:  +46 8 506 782 51
claes.holmerson@polopoly.com, http://www.polopoly.com



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Understanding how indexing works

Posted by Ype Kingma <yk...@xs4all.nl>.
Claes,

On Thursday 03 July 2003 05:36, Claes Holmerson wrote:
> Hi,
>
> In my job, I have become the new maintainer of a search feature that
> uses Lucene. I am trying to understand how it works by examining the
> index it produces.
>
> When I list index fields by opening an IndexReader, looping over
> documents, then looping over fields with Document.fields(), I see a
> number of fields. However, I don't see the fields that are searchable
> within this document.

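You only see the stored fields that way. Fields that are indexed but not
stored are searchable, yet they do not show up in Document.fields().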
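
To make the stored/indexed distinction concrete, here is a toy model in
plain Java. It is only a sketch: the class StoredVsIndexed and its methods
are hypothetical and are not Lucene's API. It assumes each field can be
stored, indexed, or both, and shows that a field which is only indexed
appears in the term dictionary but not among a document's stored fields.

```java
import java.util.*;

// Toy model only: these names are hypothetical and are not Lucene's API.
public class StoredVsIndexed {
    // Stored fields per document: what comes back when a document is retrieved.
    final List<Map<String, String>> storedFields = new ArrayList<>();
    // Inverted index: "field:term" -> ids of the documents containing that term.
    final Map<String, Set<Integer>> postings = new TreeMap<>();

    int addDocument() {
        storedFields.add(new LinkedHashMap<>());
        return storedFields.size() - 1;
    }

    void addField(int doc, String field, String value, boolean store, boolean index) {
        if (store) {
            storedFields.get(doc).put(field, value);
        }
        if (index) {
            // Very naive tokenizer: lowercase and split on whitespace.
            for (String token : value.toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(field + ":" + token, k -> new TreeSet<>()).add(doc);
            }
        }
    }

    // Field names visible when looping over a retrieved document.
    Set<String> storedFieldNames(int doc) {
        return storedFields.get(doc).keySet();
    }

    // Field names visible when walking the term dictionary.
    Set<String> indexedFieldNames() {
        Set<String> names = new TreeSet<>();
        for (String key : postings.keySet()) {
            names.add(key.substring(0, key.indexOf(':')));
        }
        return names;
    }

    public static void main(String[] args) {
        StoredVsIndexed idx = new StoredVsIndexed();
        int doc = idx.addDocument();
        idx.addField(doc, "id", "42", true, false);                      // stored only
        idx.addField(doc, "body", "Hello indexing world", false, true);  // indexed only
        System.out.println("stored:  " + idx.storedFieldNames(doc));
        System.out.println("indexed: " + idx.indexedFieldNames());
    }
}
```

In this model the searchable "body" field is missing from the stored
field names, which matches what Claes observed.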

> If I call IndexReader.terms() directly and then Term.field() on each term,
> I see fields that should no longer exist, since the documents containing
> them have been deleted. Is this the expected behavior?

Until you optimize the index, yes.
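
A rough way to picture this, as a toy model in plain Java: deleting a
document only sets a flag, and the term entries are physically dropped
when the index is rewritten. The class LazyDeleteIndex below is
hypothetical and does not reflect Lucene's API or file format; it only
assumes the behavior described in this thread.

```java
import java.util.*;

// Toy model only: hypothetical class, not Lucene's API or file format.
public class LazyDeleteIndex {
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();
    private final BitSet deleted = new BitSet();
    private int nextDoc = 0;

    // Adds a document with the given terms in one field; returns its id.
    int add(String field, String... terms) {
        int doc = nextDoc++;
        for (String t : terms) {
            postings.computeIfAbsent(field + ":" + t, k -> new TreeSet<>()).add(doc);
        }
        return doc;
    }

    // Deleting only sets a flag; the postings are left untouched.
    void delete(int doc) {
        deleted.set(doc);
    }

    // Everything in the term dictionary, including terms whose documents are all deleted.
    Set<String> terms() {
        return new TreeSet<>(postings.keySet());
    }

    // Rewrites the postings without deleted documents; terms left empty disappear.
    // (A real index would also renumber the surviving documents; skipped here.)
    void optimize() {
        Iterator<Map.Entry<String, SortedSet<Integer>>> it = postings.entrySet().iterator();
        while (it.hasNext()) {
            SortedSet<Integer> docs = it.next().getValue();
            docs.removeIf(deleted::get);
            if (docs.isEmpty()) {
                it.remove();
            }
        }
        deleted.clear();
    }

    public static void main(String[] args) {
        LazyDeleteIndex idx = new LazyDeleteIndex();
        int old = idx.add("legacy", "foo");
        idx.add("title", "bar");
        idx.delete(old);
        System.out.println("before optimize: " + idx.terms());
        idx.optimize();
        System.out.println("after optimize:  " + idx.terms());
    }
}
```

Before optimize() the "legacy" term is still listed even though its only
document is deleted; afterwards it is gone.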

> How can I easily tell if a term is no longer searchable (or rather
> successfully found, when searching), apart from actually doing the

Optimize before checking the index for the term.

> search? Is there a way to list those terms?

Yes, but it is slow: for each term, check whether
the associated documents are all deleted.
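
The check could be sketched as follows, again as a toy model in plain
Java rather than real Lucene code. The postings map and the deleted
BitSet are stand-ins for what the index would provide, and the name
deadTerms is made up for this example. It also shows why the approach is
slow: in the worst case it visits every posting of every term.

```java
import java.util.*;

// Toy model only: plain collections stand in for the index; not Lucene's API.
public class DeadTerms {
    // Returns the terms whose associated documents are all flagged as deleted.
    static List<String> deadTerms(Map<String, int[]> postings, BitSet deleted) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            boolean anyLive = false;
            for (int doc : e.getValue()) {
                if (!deleted.get(doc)) {
                    anyLive = true;   // one live document is enough to keep the term findable
                    break;
                }
            }
            if (!anyLive) {
                dead.add(e.getKey());
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new TreeMap<>();
        postings.put("legacy:foo", new int[] {0, 2});
        postings.put("title:bar", new int[] {1, 2});

        BitSet deleted = new BitSet();
        deleted.set(0);
        deleted.set(2);

        System.out.println(deadTerms(postings, deleted));
    }
}
```

Here "legacy:foo" is reported because documents 0 and 2 are both deleted,
while "title:bar" still has the live document 1.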

Kind regards,
Ype

