You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Walter Ferrara <wa...@ecomware.it> on 2007/05/21 23:31:12 UTC

getTermFreqVector atomicity

I'm interested in getting the term vector of a lucene doc. The point is,
it seems I have to give to the IndexReader.getTermFreqVector a doc ID,
while I would know if there is a way to get the termvector by a doc
identifier (not lucene doc id, but a my own field). I know how to get
the lucene docid for the doc I'm interested, but my concern is about the
non-atomicity of getting a id and pass it to another function.
This because I reload index time by time, and I'm worried about a loss
of consistency if the new indexreader remap docids (after deletion for
example), and I end up accessing a different doc, just because between
"get the id" and "get the termvector for that id", the reader could have
been reloaded (and doc-ids changed).

Best,
Walter



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: getTermFreqVector atomicity

Posted by Erick Erickson <er...@gmail.com>.
An IndexReader doesn't see changes in the index unless you close
and reopen it, but if there is significant time between the time you
fetch your docid and read it's vector, that could be a problem.

You can always use TermEnum/TermDocs to find the doc ID
associated with a particular field you have, but I suspect that
suffers from the same problem. In fact, *anything* you do
between fetching the doc ID and getting its termvector
has this problem, and there's no way I know of to get
termvectors by your own ID.

What might work is a "sanity check" sort of algorithm. That is,
fetch the doc ID, then fetch it's term vector, then look at your
custom field for that doc ID and see if it matches the
original. If not, do it all over again.

But that all seems too complicated to me. Why not just insure
that you use the *same* IndexReader both when you get the
original doc ID and when you get its termverctor? Even a temporary
reference should hold things open long enough to insure that
atomicity.

Best
Erick

On 5/21/07, Walter Ferrara <wa...@ecomware.it> wrote:
>
> I'm interested in getting the term vector of a lucene doc. The point is,
> it seems I have to give to the IndexReader.getTermFreqVector a doc ID,
> while I would know if there is a way to get the termvector by a doc
> identifier (not lucene doc id, but a my own field). I know how to get
> the lucene docid for the doc I'm interested, but my concern is about the
> non-atomicity of getting a id and pass it to another function.
> This because I reload index time by time, and I'm worried about a loss
> of consistency if the new indexreader remap docids (after deletion for
> example), and I end up accessing a different doc, just because between
> "get the id" and "get the termvector for that id", the reader could have
> been reloaded (and doc-ids changed).
>
> Best,
> Walter
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>