Posted to java-user@lucene.apache.org by Clemens Wyss DEV <cl...@mysign.ch> on 2014/06/18 16:08:19 UTC

IndexWriter#updateDocument(Term, Document)

I would like to perform a batch update on an index. To avoid duplicate entries I am using IndexWriter#updateDocument(Term, Document):

open an IndexWriter;
foreach( element in elementsToBeUpdatedWhichHaveDuplicates )
{
    doc = element.toDoc();
    indexWriter.updateDocument( uniqueTermForElement, doc );
}

Unfortunately, this does not seem to work, whereas

open an IndexWriter;
foreach( element in elementsToBeUpdatedWhichHaveDuplicates )
{
    doc = element.toDoc();
    indexWriter.updateDocument( uniqueTermForElement, doc );
    indexWriter.commit(); // expensive?
}

does. How can I batch-update without committing after every document?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: IndexWriter#updateDocument(Term, Document)

Posted by Clemens Wyss DEV <cl...@mysign.ch>.
+1
There was also another issue in my indexing framework: I have the LowercaseFilter in use, so the Term only matched if its value was all lowercase.
Thanks
Clemens
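
The Term passed to updateDocument is compared with the indexed terms exactly and is not run through the analyzer, which is why a mixed-case value stops matching once a LowercaseFilter is in the chain. The following is a toy plain-Java model of that mismatch, not Lucene code: analyze() is a hypothetical stand-in for an analysis chain that lowercases tokens, and a Set of strings stands in for the indexed terms.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Toy model of exact term matching; illustrative only, not Lucene code. */
class TermCaseDemo {

    /** Hypothetical stand-in for an analysis chain that lowercases tokens at index time. */
    static String analyze(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    /** A term lookup is an exact string comparison; no analysis is applied here. */
    static boolean termMatches(Set<String> indexedTerms, String termText) {
        return indexedTerms.contains(termText);
    }

    public static void main(String[] args) {
        Set<String> indexedTerms = new HashSet<>();
        indexedTerms.add(analyze("Hello")); // the index stores "hello"

        // The raw value does not match the lowercased indexed term.
        System.out.println(termMatches(indexedTerms, "Hello"));          // prints false
        // Normalizing the value the same way before building the term does match.
        System.out.println(termMatches(indexedTerms, analyze("Hello"))); // prints true
    }
}
```

Under this assumption, the value used to build the Term has to be normalized the same way as the indexed field, or the key field has to be indexed without analysis.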

-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Thursday, 19 June 2014 18:59
To: Lucene Users
Subject: Re: IndexWriter#updateDocument(Term, Document)

There is a bug in your test: you cannot use reader.maxDoc().

It's expected this would be 2 when (*) is commented out, because you have 2 docs, one of which is deleted.

Use numDocs instead?

Mike McCandless

http://blog.mikemccandless.com




Re: IndexWriter#updateDocument(Term, Document)

Posted by Michael McCandless <lu...@mikemccandless.com>.
There is a bug in your test: you cannot use reader.maxDoc() to count live documents, because it also counts deleted ones.

It's expected that this would be 2 when (*) is commented out, because the index then holds 2 docs, one of which is deleted.

Use reader.numDocs() instead?
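
To make the difference concrete, here is a toy in-memory model of a delete-then-add update. It is only a sketch and not how Lucene stores documents: ToySegment and its fields are hypothetical names, and its maxDoc() and numDocs() merely mimic the counting behaviour described above (all slots versus live slots only).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of delete-then-add updates; illustrative only, not Lucene code. */
class ToySegment {
    private final List<String> keys = new ArrayList<>();     // key stored in each slot
    private final List<Boolean> deleted = new ArrayList<>(); // deletion mark per slot

    /** Marks every existing slot with this key as deleted, then appends a new slot. */
    void updateDocument(String key) {
        for (int i = 0; i < keys.size(); i++) {
            if (keys.get(i).equals(key)) {
                deleted.set(i, true);
            }
        }
        keys.add(key);
        deleted.add(false);
    }

    /** Counts all slots, including those marked deleted. */
    int maxDoc() {
        return keys.size();
    }

    /** Counts only the slots that are not marked deleted. */
    int numDocs() {
        int live = 0;
        for (boolean isDeleted : deleted) {
            if (!isDeleted) {
                live++;
            }
        }
        return live;
    }

    public static void main(String[] args) {
        ToySegment segment = new ToySegment();
        segment.updateDocument("hello");
        segment.updateDocument("hello"); // replaces the first entry
        System.out.println("maxDoc=" + segment.maxDoc() + " numDocs=" + segment.numDocs());
        // prints maxDoc=2 numDocs=1
    }
}
```

In this model an assertion on numDocs() holds for the test, while one on maxDoc() depends on whether the deleted slot is still present.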

Mike McCandless

http://blog.mikemccandless.com


On Thu, Jun 19, 2014 at 12:54 PM, Clemens Wyss DEV <cl...@mysign.ch> wrote:
> directory = new SimpleFSDirectory( indexLocation );
> IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_47, new WhitespaceAnalyzer( Version.LUCENE_47 ));
> indexWriter = new IndexWriter( directory, config );
> Document doc = new Document();
> String value = "hello";
> String key = "test";
> doc.add( new StringField( key, value, Store.YES ) );
> indexWriter.updateDocument( new Term( key, value ), doc );
> // (*) indexWriter.commit();
> indexWriter.updateDocument( new Term( key, value ), doc );
> indexWriter.commit();
> indexWriter.close();
> reader = DirectoryReader.open( directory );
> Assert.assertEquals( 1, reader.maxDoc() );// fails unless (*) is uncommented


Re: IndexWriter#updateDocument(Term, Document)

Posted by Clemens Wyss DEV <cl...@mysign.ch>.
directory = new SimpleFSDirectory( indexLocation );
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_47, new WhitespaceAnalyzer( Version.LUCENE_47 ));
indexWriter = new IndexWriter( directory, config );
Document doc = new Document();
String value = "hello";
String key = "test";
doc.add( new StringField( key, value, Store.YES ) );
indexWriter.updateDocument( new Term( key, value ), doc );
// (*) indexWriter.commit();
indexWriter.updateDocument( new Term( key, value ), doc );
indexWriter.commit();
indexWriter.close();
reader = DirectoryReader.open( directory );
Assert.assertEquals( 1, reader.maxDoc() );// fails unless (*) is uncommented

-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Wednesday, 18 June 2014 16:20
To: Lucene Users
Subject: Re: IndexWriter#updateDocument(Term, Document)

Your first case is supposed to work; if it doesn't it's a bad bug :)

Can you reduce it to a small example?

Mike McCandless

http://blog.mikemccandless.com




Re: IndexWriter#updateDocument(Term, Document)

Posted by Michael McCandless <lu...@mikemccandless.com>.
Your first case is supposed to work; if it doesn't it's a bad bug :)

Can you reduce it to a small example?
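
For reference, the behaviour the first case is expected to have can be sketched with a toy plain-Java model. This is not Lucene code: ToyBatchWriter, its maps, and committedView() are hypothetical names, and the model only assumes that updates replace earlier entries with the same key and that a reader opened from the directory sees the state as of the last commit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of batched updates with a single commit; illustrative only, not Lucene code. */
class ToyBatchWriter {
    private final Map<String, String> writerState = new LinkedHashMap<>();
    private Map<String, String> committed = new LinkedHashMap<>();

    /** Replaces any earlier entry with the same key (delete-then-add by key). */
    void updateDocument(String key, String doc) {
        writerState.put(key, doc);
    }

    /** Publishes the writer's current state as one new snapshot. */
    void commit() {
        committed = new LinkedHashMap<>(writerState);
    }

    /** What a reader opened after the last commit would see in this model. */
    Map<String, String> committedView() {
        return committed;
    }

    public static void main(String[] args) {
        ToyBatchWriter writer = new ToyBatchWriter();
        String[][] elements = {
            {"id-1", "first"}, {"id-2", "second"}, {"id-1", "first-v2"} // id-1 is a duplicate
        };
        for (String[] element : elements) {
            writer.updateDocument(element[0], element[1]); // no commit inside the loop
        }
        System.out.println(writer.committedView().size()); // prints 0
        writer.commit();                                   // one commit for the whole batch
        System.out.println(writer.committedView());        // prints {id-1=first-v2, id-2=second}
    }
}
```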

Mike McCandless

http://blog.mikemccandless.com

