Posted to dev@lucene.apache.org by Erik Hatcher <er...@ehatchersolutions.com> on 2005/10/31 21:03:20 UTC
Re: Indexing
Taking this to java-dev: Since this is such a common issue, would it
be feasible for Lucene to have some sort of capability to be told
what field is the unique one and automatically update (delete, and
add) a document added with a duplicate of a unique field? This
would probably require that Lucene enforce this uniqueness during an
add, though, right?
On 31 Oct 2005, at 14:58, Chris Hostetter wrote:
>
> : I have 4 fields in a document, i.e. id, URL, modified date, and
> : contents. id is unique for each document. I wanted to know: if I
> : index a document with the same id again, will the previous document
> : (in the index) be overwritten, or do I have to delete the index for
> : that document first and then reindex the modified one?
>
> Lucene has no notion of a "unique field" ... you will need to delete
> the old record ... but you don't necessarily need to delete it first.
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: Indexing
Posted by Chris Hostetter <ho...@fucit.org>.
:
: Taking this to java-dev: Since this is such a common issue, would it
: be feasible for Lucene to have some sort of capability to be told
: what field is the unique one and automatically update (delete, and
: add) a document added with a duplicate of a unique field? This
: would probably require that Lucene enforce this uniqueness during an
: add, though, right?
My vote would be to NOT try to do this internally, and instead to provide
a new interface with a simplified API that wraps an IndexWriter and an
IndexReader and knows about the primary key field. A class like this
could also have a batch-based API, so it could be more efficient in
processing all of the deletes, adds, and "updates", which is also a big
issue people seem to have questions about when they want to preserve
uniqueness in their index:
"I can't delete with my reader without closing my writer, I can't
add without closing the reader I just used to delete..."
Perhaps adding functionality like this to "IndexModifier" would make
sense?
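As a rough illustration of the wrapper idea above, here is a minimal sketch. It deliberately does not use the Lucene API: ToyIndex is a hypothetical in-memory stand-in for an index that, like Lucene, has no notion of a unique field, and UniqueKeyUpdater, update, and updateAll are made-up names for the wrapper that knows the primary key field. The batch method does all of its deletes before any of its adds, on the assumption that this mirrors the "reader phase, then writer phase" split described in the quote.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an index: add() never checks for duplicates.
class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void add(Map<String, String> doc) {
        docs.add(doc);
    }

    // Deletes every document whose field equals value; returns how many.
    int delete(String field, String value) {
        int removed = 0;
        for (Iterator<Map<String, String>> it = docs.iterator(); it.hasNext();) {
            if (value.equals(it.next().get(field))) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    List<Map<String, String>> find(String field, String value) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (value.equals(doc.get(field))) {
                hits.add(doc);
            }
        }
        return hits;
    }

    int size() {
        return docs.size();
    }
}

// Hypothetical wrapper that knows the primary key field and turns every
// add into "delete any old copy, then add".
class UniqueKeyUpdater {
    private final ToyIndex index;
    private final String keyField;

    UniqueKeyUpdater(ToyIndex index, String keyField) {
        this.index = index;
        this.keyField = keyField;
    }

    void update(Map<String, String> doc) {
        updateAll(Collections.singletonList(doc));
    }

    // Batch form: one pass of deletes, then one pass of adds. If a key
    // appears more than once in the batch, the last document wins.
    void updateAll(List<Map<String, String>> batch) {
        Map<String, Map<String, String>> latest = new LinkedHashMap<>();
        for (Map<String, String> doc : batch) {
            String key = doc.get(keyField);
            if (key == null) {
                throw new IllegalArgumentException("missing key field: " + keyField);
            }
            latest.put(key, doc);
        }
        for (String key : latest.keySet()) {
            index.delete(keyField, key);
        }
        for (Map<String, String> doc : latest.values()) {
            index.add(doc);
        }
    }
}

class UniqueKeyDemo {
    // Small helper for building a two-field document.
    static Map<String, String> doc(String id, String contents) {
        Map<String, String> d = new HashMap<>();
        d.put("id", id);
        d.put("contents", contents);
        return d;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        UniqueKeyUpdater updater = new UniqueKeyUpdater(index, "id");
        updater.update(doc("42", "first version"));
        updater.update(doc("42", "second version"));
        System.out.println(index.size() + " document(s) with id 42: "
                + index.find("id", "42").get(0).get("contents"));
    }
}
```

Keeping the uniqueness logic in the wrapper leaves the index itself unchanged: callers that add directly still get duplicates, and only callers that go through the wrapper get update semantics.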
Or perhaps separating it out into another abstraction that uses an
IndexModifier for modifications, and maintains a separate IndexReader it
reuses when doing searches (which is reopened on demand, or periodically
if updates have been made) would be a good way to go.
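To make the "reopen on demand" idea a bit more concrete, here is a minimal sketch, again with no Lucene classes involved. VersionedStore is a hypothetical stand-in for the modifying side (the role IndexModifier would play), and ReopeningSearcher is a made-up name for the searching side that holds one point-in-time view and refreshes it only if the store has changed. The version counter is an assumption of this sketch, used only to detect that updates have been made.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical write side: bumps a version number on every change.
class VersionedStore {
    private final List<String> docs = new ArrayList<>();
    private long version = 0;

    synchronized void add(String doc) {
        docs.add(doc);
        version++;
    }

    synchronized long version() {
        return version;
    }

    // Returns a copy, so later writes do not show up in an open view.
    synchronized List<String> snapshot() {
        return new ArrayList<>(docs);
    }
}

// Hypothetical search side: reuses one view across searches and reopens
// it only when asked to, and only if the store has changed since.
class ReopeningSearcher {
    private final VersionedStore store;
    private List<String> view;
    private long viewVersion;
    private int reopenCount = 0;

    ReopeningSearcher(VersionedStore store) {
        this.store = store;
        // Read the version before the snapshot: a concurrent write can then
        // only cause a redundant reopen later, never a missed update.
        this.viewVersion = store.version();
        this.view = store.snapshot();
    }

    // Returns true if a fresh view was opened.
    synchronized boolean reopenIfChanged() {
        long current = store.version();
        if (current == viewVersion) {
            return false;
        }
        view = store.snapshot();
        viewVersion = current;
        reopenCount++;
        return true;
    }

    // Counts documents in the current view that contain the given text.
    synchronized int count(String needle) {
        int hits = 0;
        for (String doc : view) {
            if (doc.contains(needle)) {
                hits++;
            }
        }
        return hits;
    }

    synchronized int reopenCount() {
        return reopenCount;
    }
}

class ReopenDemo {
    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        store.add("lucene indexing");
        ReopeningSearcher searcher = new ReopeningSearcher(store);
        store.add("lucene updates");
        System.out.println("before reopen: " + searcher.count("lucene"));
        searcher.reopenIfChanged();
        System.out.println("after reopen: " + searcher.count("lucene"));
    }
}
```

A periodic variant would simply call reopenIfChanged() from a timer; searches in between keep using the existing view, so they never pay the reopen cost themselves.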
-Hoss