Posted to dev@lucene.apache.org by Erik Hatcher <er...@ehatchersolutions.com> on 2005/10/31 21:03:20 UTC

Re: Indexing

Taking this to java-dev: Since this is such a common issue, would it  
be feasible for Lucene to have some sort of capability to be told  
what field is the unique one and automatically update (delete, and  
add) a document added with a duplicate of a unique field?   This  
would probably require that Lucene enforce this uniqueness during an  
add, though, right?


On 31 Oct 2005, at 14:58, Chris Hostetter wrote:

>
> : I've 4 fields in a document, i.e. id, URL, modified date, contents. id is
> : unique for each document. I wanted to know: if I index a document with
> : the same id again, will the previous document (in the index) be
> : overwritten, or do I have to delete the index for that document first and
> : then re-index the modified one?
>
> Lucene has no notion of a "unique field" ... you will need to delete
> the old record ... but you don't necessarily need to delete it first.
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Indexing

Posted by Chris Hostetter <ho...@fucit.org>.

:
: Taking this to java-dev: Since this is such a common issue, would it
: be feasible for Lucene to have some sort of capability to be told
: what field is the unique one and automatically update (delete, and
: add) a document added with a duplicate of a unique field?   This
: would probably require that Lucene enforce this uniqueness during an
: add, though, right?

My vote would be to NOT try to do this internally, and instead provide a new
interface with a simplified API that wraps an IndexWriter and an
IndexReader and knows about the primary key field.  A class like this
could also have a batch-based API, so it could be more efficient in
processing all of the deletes, adds, and "updates", which is also a big
issue people seem to have questions about when they want to preserve
uniqueness in their index:

	"I can't delete with my reader without closing my writer, I can't
	add without closing the reader I just used to delete..."

Perhaps adding functionality like this to "IndexModifier" would make
sense?
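
As a rough illustration of the batch idea, here is a minimal sketch of a
wrapper that knows the primary key field, queues updates, and then runs all
of the deletes before all of the adds.  Every name in it (KeyedDeleter,
DocAdder, InMemoryIndex, UniqueKeyBatchUpdater) is hypothetical: the two
interfaces only stand in for the roles an IndexReader and an IndexWriter
would play, and none of this is existing Lucene API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the deleting role of an IndexReader; not Lucene API. */
interface KeyedDeleter {
    /** Deletes every document whose key field equals keyValue; returns how many were removed. */
    int deleteByKey(String keyField, String keyValue);
}

/** Hypothetical stand-in for the adding role of an IndexWriter; not Lucene API. */
interface DocAdder {
    void add(Map<String, String> doc);
}

/** Toy in-memory "index" that plays both roles so the sketch runs on its own. */
class InMemoryIndex implements KeyedDeleter, DocAdder {
    final List<Map<String, String>> docs = new ArrayList<>();

    public int deleteByKey(String keyField, String keyValue) {
        int before = docs.size();
        docs.removeIf(d -> keyValue.equals(d.get(keyField)));
        return before - docs.size();
    }

    public void add(Map<String, String> doc) {
        docs.add(doc);
    }
}

/** Wrapper that knows the primary key field and applies queued updates in two phases. */
class UniqueKeyBatchUpdater {
    private final String keyField;
    private final KeyedDeleter deleter;
    private final DocAdder adder;
    // Keyed by primary key value, so a later update in the same batch
    // replaces an earlier one for the same key.
    private final Map<String, Map<String, String>> pending = new LinkedHashMap<>();

    UniqueKeyBatchUpdater(String keyField, KeyedDeleter deleter, DocAdder adder) {
        this.keyField = keyField;
        this.deleter = deleter;
        this.adder = adder;
    }

    /** Queues a document; nothing touches the index until flush(). */
    void update(Map<String, String> doc) {
        String key = doc.get(keyField);
        if (key == null) {
            throw new IllegalArgumentException("document lacks key field: " + keyField);
        }
        pending.put(key, doc);
    }

    /** Runs all deletes first, then all adds; returns the number of old documents replaced. */
    int flush() {
        int replaced = 0;
        for (String key : pending.keySet()) {
            replaced += deleter.deleteByKey(keyField, key);
        }
        for (Map<String, String> doc : pending.values()) {
            adder.add(doc);
        }
        pending.clear();
        return replaced;
    }
}
```

Grouping the work into a delete phase and an add phase is the point of the
sketch: a real implementation would only need to switch between deleting and
adding once per batch instead of once per document.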

Or perhaps separating it out into another abstraction that uses an
IndexModifier for modifications, and maintains a separate IndexReader it
reuses when doing searches (which is reopened on demand, or periodically
if updates have been made) would be a good way to go.
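
The "reopened on demand" part could look something like the sketch below.
It is only an assumption about the shape of such a class: the name
ReopeningReaderHolder and its methods are made up for illustration, the
reader type is left generic rather than tied to IndexReader, and closing
the stale reader is deliberately left out.

```java
import java.util.function.Supplier;

/**
 * Hypothetical holder that reuses one reader for searches and reopens it
 * lazily, only after a modification has been reported. Not Lucene API.
 */
class ReopeningReaderHolder<R> {
    private final Supplier<R> opener; // how to open a fresh reader (assumed to be supplied by the caller)
    private R current;                // reader reused across searches
    private boolean dirty = true;     // true until the first open, and after each modification

    ReopeningReaderHolder(Supplier<R> opener) {
        this.opener = opener;
    }

    /** Called by whatever performs adds or deletes once its changes are visible. */
    synchronized void markModified() {
        dirty = true;
    }

    /** Returns the shared reader, opening a fresh one only if updates were made. */
    synchronized R reader() {
        if (dirty) {
            // A real version would also close the previous reader once
            // no search is still using it; that bookkeeping is omitted here.
            current = opener.get();
            dirty = false;
        }
        return current;
    }
}
```

The periodic variant would call the same reopen logic from a timer instead
of from reader(), still skipping the reopen when nothing has been modified.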




-Hoss

