You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Filip Anselm <fi...@nable.dk> on 2005/09/11 00:47:37 UTC

How do I avoid reindexing?

...well the title says it all

I index some documents - all with the same fields... One of the fields,
"id" is unique for the indexed documents. If i try to index a document
with an id, that is already indexed - the old document should be updated
or replaced with the new document, so that I avoid indexed documents
with the same id. How is the best way to do this?

thanks

- Filip


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How do I avoid reindexing?

Posted by Filip Anselm <fi...@nable.dk>.
I haven't had much time to look through the lucene documentation - but
still: How could I miss that! DOH - hehe
Much easier than starting from scratch... ;)

Thanks to all of you!


Erik Hatcher wrote:

>
> On Sep 11, 2005, at 6:03 AM, Filip Anselm wrote:
>
>> I'm completely new to lucene, but could figure out myself that I 
>> need to
>> delete the old doc and add the new... But as I see it, the only way to
>> delete a document is to create an implementation of the abstract
>> IndexReader and use its delete(Term term) method. In my case this 
>> sounds
>> like overkill... - Is there an other way?
>
>
> IndexReader is abstract, but there are some concrete subclasses under 
> the covers.  To get an instance, use one of the .open() static 
> methods on that class.
>
>     Erik
>
>
>>
>>
>> Mikko Noromaa wrote:
>>
>>
>>> Hi,
>>>
>>>
>>>
>>>
>>>> delete document with this id and then add document with the same id.
>>>>
>>>>
>>>>
>>>
>>> Yes, this is clearly the way to go. I implemented a similar 
>>> application
>>> myself. However, earlier I stored my ID as a binary field to save 
>>> space,
>>> because I only ever needed to read the value from a found  document.
>>> With the
>>> update logic added, I had to store the ID as text so that Lucene 
>>> can search
>>> for the document to delete.
>>>
>>> This was annoying because in my application updates are very rare, 
>>> but the
>>> possibility must be there. Storing numbers as text doesn't sound like
>>> something a modern application does.
>>>
>>> So, if it hasn't been asked for before, here comes: It would be 
>>> nice if
>>> Lucene could search by binary fields.
>>>
>>> -- 
>>>
>>> Mikko Noromaa (mikko@noromaa.fi) - tel. +358 40 7348034
>>> Noromaa Solutions - see http://www.nm-sol.com/
>>>
>>>
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: jian chen [mailto:chenjian1227@gmail.com]
>>>> Sent: Sunday, September 11, 2005 2:24 AM
>>>> To: java-user@lucene.apache.org
>>>> Subject: Re: How do I avoid reindexing?
>>>>
>>>>
>>>> delete document with this id and then add document with the same id.
>>>>
>>>> Jian
>>>>
>>>> On 9/10/05, Filip Anselm <fi...@nable.dk> wrote:
>>>>
>>>>
>>>>
>>>>> ...well the title says it all
>>>>>
>>>>> I index some documents - all with the same fields... One of
>>>>>
>>>>>
>>>>>
>>>> the fields,
>>>>
>>>>
>>>>
>>>>> "id" is unique for the indexed documents. If i try to index
>>>>>
>>>>>
>>>>>
>>>> a document
>>>>
>>>>
>>>>
>>>>> with an id, that is already indexed - the old document
>>>>>
>>>>>
>>>>>
>>>> should be updated
>>>>
>>>>
>>>>
>>>>> or replaced with the new document, so that I avoid indexed  documents
>>>>> with the same id. How is the best way to do this?
>>>>>
>>>>> thanks
>>>>>
>>>>> - Filip
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>> -------------------------------------------------------------------- -
>>>>
>>>>
>>>>
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How do I avoid reindexing?

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Sep 11, 2005, at 6:03 AM, Filip Anselm wrote:

> I'm completely new to lucene, but could figure out myself that I  
> need to
> delete the old doc and add the new... But as I see it, the only way to
> delete a document is to create an implementation of the abstract
> IndexReader and use its delete(Term term) method. In my case this  
> sounds
> like overkill... - Is there an other way?

IndexReader is abstract, but there are some concrete subclasses under  
the covers.  To get an instance, use one of the .open() static  
methods on that class.

     Erik


>
>
> Mikko Noromaa wrote:
>
>
>> Hi,
>>
>>
>>
>>
>>> delete document with this id and then add document with the same id.
>>>
>>>
>>>
>>
>> Yes, this is clearly the way to go. I implemented a similar  
>> application
>> myself. However, earlier I stored my ID as a binary field to save  
>> space,
>> because I only ever needed to read the value from a found  
>> document. With the
>> update logic added, I had to store the ID as text so that Lucene  
>> can search
>> for the document to delete.
>>
>> This was annoying because in my application updates are very rare,  
>> but the
>> possibility must be there. Storing numbers as text doesn't sound like
>> something a modern application does.
>>
>> So, if it hasn't been asked for before, here comes: It would be  
>> nice if
>> Lucene could search by binary fields.
>>
>> --
>>
>> Mikko Noromaa (mikko@noromaa.fi) - tel. +358 40 7348034
>> Noromaa Solutions - see http://www.nm-sol.com/
>>
>>
>>
>>
>>
>>> -----Original Message-----
>>> From: jian chen [mailto:chenjian1227@gmail.com]
>>> Sent: Sunday, September 11, 2005 2:24 AM
>>> To: java-user@lucene.apache.org
>>> Subject: Re: How do I avoid reindexing?
>>>
>>>
>>> delete document with this id and then add document with the same id.
>>>
>>> Jian
>>>
>>> On 9/10/05, Filip Anselm <fi...@nable.dk> wrote:
>>>
>>>
>>>
>>>> ...well the title says it all
>>>>
>>>> I index some documents - all with the same fields... One of
>>>>
>>>>
>>>>
>>> the fields,
>>>
>>>
>>>
>>>> "id" is unique for the indexed documents. If i try to index
>>>>
>>>>
>>>>
>>> a document
>>>
>>>
>>>
>>>> with an id, that is already indexed - the old document
>>>>
>>>>
>>>>
>>> should be updated
>>>
>>>
>>>
>>>> or replaced with the new document, so that I avoid indexed  
>>>> documents
>>>> with the same id. How is the best way to do this?
>>>>
>>>> thanks
>>>>
>>>> - Filip
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>> -------------------------------------------------------------------- 
>>> -
>>>
>>>
>>>
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How do I avoid reindexing?

Posted by Filip Anselm <fi...@nable.dk>.
I'm completely new to lucene, but could figure out myself that I need to
delete the old doc and add the new... But as I see it, the only way to
delete a document is to create an implementation of the abstract
IndexReader and use its delete(Term term) method. In my case this sounds
like overkill... - Is there an other way?


Mikko Noromaa wrote:

>Hi,
>
>  
>
>>delete document with this id and then add document with the same id.
>>    
>>
>
>Yes, this is clearly the way to go. I implemented a similar application
>myself. However, earlier I stored my ID as a binary field to save space,
>because I only ever needed to read the value from a found document. With the
>update logic added, I had to store the ID as text so that Lucene can search
>for the document to delete.
>
>This was annoying because in my application updates are very rare, but the
>possibility must be there. Storing numbers as text doesn't sound like
>something a modern application does.
>
>So, if it hasn't been asked for before, here comes: It would be nice if
>Lucene could search by binary fields.
>
>--
>
>Mikko Noromaa (mikko@noromaa.fi) - tel. +358 40 7348034
>Noromaa Solutions - see http://www.nm-sol.com/
> 
>
>  
>
>>-----Original Message-----
>>From: jian chen [mailto:chenjian1227@gmail.com] 
>>Sent: Sunday, September 11, 2005 2:24 AM
>>To: java-user@lucene.apache.org
>>Subject: Re: How do I avoid reindexing?
>>
>>
>>delete document with this id and then add document with the same id.
>>
>>Jian
>>
>>On 9/10/05, Filip Anselm <fi...@nable.dk> wrote:
>>    
>>
>>>...well the title says it all
>>>
>>>I index some documents - all with the same fields... One of 
>>>      
>>>
>>the fields,
>>    
>>
>>>"id" is unique for the indexed documents. If i try to index 
>>>      
>>>
>>a document
>>    
>>
>>>with an id, that is already indexed - the old document 
>>>      
>>>
>>should be updated
>>    
>>
>>>or replaced with the new document, so that I avoid indexed documents
>>>with the same id. How is the best way to do this?
>>>
>>>thanks
>>>
>>>- Filip
>>>
>>>
>>>
>>>      
>>>
>>---------------------------------------------------------------------
>>    
>>
>>>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>      
>>>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>  
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: How do I avoid reindexing?

Posted by Mikko Noromaa <mi...@noromaa.fi>.
Hi,

> delete document with this id and then add document with the same id.

Yes, this is clearly the way to go. I implemented a similar application
myself. However, earlier I stored my ID as a binary field to save space,
because I only ever needed to read the value from a found document. With the
update logic added, I had to store the ID as text so that Lucene can search
for the document to delete.

This was annoying because in my application updates are very rare, but the
possibility must be there. Storing numbers as text doesn't sound like
something a modern application does.

So, if it hasn't been asked for before, here comes: It would be nice if
Lucene could search by binary fields.

--

Mikko Noromaa (mikko@noromaa.fi) - tel. +358 40 7348034
Noromaa Solutions - see http://www.nm-sol.com/
 

> -----Original Message-----
> From: jian chen [mailto:chenjian1227@gmail.com] 
> Sent: Sunday, September 11, 2005 2:24 AM
> To: java-user@lucene.apache.org
> Subject: Re: How do I avoid reindexing?
> 
> 
> delete document with this id and then add document with the same id.
> 
> Jian
> 
> On 9/10/05, Filip Anselm <fi...@nable.dk> wrote:
> > 
> > ...well the title says it all
> > 
> > I index some documents - all with the same fields... One of 
> the fields,
> > "id" is unique for the indexed documents. If i try to index 
> a document
> > with an id, that is already indexed - the old document 
> should be updated
> > or replaced with the new document, so that I avoid indexed documents
> > with the same id. How is the best way to do this?
> > 
> > thanks
> > 
> > - Filip
> > 
> > 
> > 
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> > 
> >
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How do I avoid reindexing?

Posted by jian chen <ch...@gmail.com>.
delete document with this id and then add document with the same id.

Jian

On 9/10/05, Filip Anselm <fi...@nable.dk> wrote:
> 
> ...well the title says it all
> 
> I index some documents - all with the same fields... One of the fields,
> "id" is unique for the indexed documents. If i try to index a document
> with an id, that is already indexed - the old document should be updated
> or replaced with the new document, so that I avoid indexed documents
> with the same id. How is the best way to do this?
> 
> thanks
> 
> - Filip
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
>