Posted to java-user@lucene.apache.org by Clemens Wyss DEV <cl...@mysign.ch> on 2014/06/18 16:08:19 UTC
IndexWriter#updateDocument(Term, Document)
I would like to perform a batch update on an index. To avoid duplicate entries, I am using IndexWriter#updateDocument(Term, Document):
open an IndexWriter;
foreach( element in elementsToBeUpdatedWhichHaveDuplicates )
{
    doc = element.toDoc();
    indexWriter.updateDocument( uniqueTermForElement, doc );
}
Unfortunately this does not seem to work, whereas
open an IndexWriter;
foreach( element in elementsToBeUpdatedWhichHaveDuplicates )
{
    doc = element.toDoc();
    indexWriter.updateDocument( uniqueTermForElement, doc );
    indexWriter.commit(); // expensive?
}
does. How can I batch-update without committing after every document?
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: IndexWriter#updateDocument(Term, Document)
Posted by Clemens Wyss DEV <cl...@mysign.ch>.
+1
There was also another issue in my indexing framework: I have the LowerCaseFilter in use, so the Term only matched if its value was all lowercase.
Thanks,
Clemens
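The lowercasing pitfall mentioned above can be sketched without Lucene. The value passed in the Term is compared against the indexed tokens as-is, so it has to be normalized the same way the analysis chain normalized the field at index time. The class TermCaseSketch and its methods are hypothetical stand-ins for illustration, not Lucene API; the sketch assumes a single lowercased token per document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy model (NOT Lucene code): indexed tokens are lowercased, lookup terms are not analyzed. */
public class TermCaseSketch {
    private final List<String> indexedTokens = new ArrayList<>();

    /** Stand-in for what a lowercasing filter does to a token at index time. */
    static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    /** Deletes documents whose indexed token equals the term value exactly, then adds the new one. */
    public void updateDocument(String termValue, String fieldValue) {
        indexedTokens.removeIf(token -> token.equals(termValue)); // exact match, no analysis
        indexedTokens.add(normalize(fieldValue));                  // the field goes through "analysis"
    }

    public int numDocs() {
        return indexedTokens.size();
    }

    public static void main(String[] args) {
        TermCaseSketch raw = new TermCaseSketch();
        raw.updateDocument("Hello", "Hello");
        raw.updateDocument("Hello", "Hello"); // term "Hello" never matches token "hello"
        System.out.println("raw term: " + raw.numDocs());               // prints "raw term: 2"

        TermCaseSketch normalized = new TermCaseSketch();
        normalized.updateDocument(normalize("Hello"), "Hello");
        normalized.updateDocument(normalize("Hello"), "Hello");
        System.out.println("normalized term: " + normalized.numDocs()); // prints "normalized term: 1"
    }
}
```

With the raw term the old document is never matched, so every update leaves a duplicate behind; normalizing the term value first restores the replace-on-update behaviour.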
-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Thursday, 19 June 2014 18:59
To: Lucene Users
Subject: Re: IndexWriter#updateDocument(Term, Document)
There is a bug in your test: you cannot use reader.maxDoc().
It's expected this would be 2 when (*) is commented out, because you have 2 docs, one of which is deleted.
Use numDocs instead?
Mike McCandless
http://blog.mikemccandless.com
Re: IndexWriter#updateDocument(Term, Document)
Posted by Michael McCandless <lu...@mikemccandless.com>.
There is a bug in your test: you cannot use reader.maxDoc().
It's expected this would be 2 when (*) is commented out, because you
have 2 docs, one of which is deleted.
Use numDocs instead?
Mike McCandless
http://blog.mikemccandless.com
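To make the difference between the two counters concrete, here is a minimal, Lucene-free sketch. It assumes a simplified model in which an update marks the old document as deleted and appends the new one, and in which the deleted slot stays around until a merge reclaims it. The class ToyIndex and its merge() method are hypothetical; only the names maxDoc(), numDocs() and numDeletedDocs() mirror the real IndexReader methods.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for an index; NOT Lucene code, just a model of delete-then-add. */
public class ToyIndex {
    private final List<String> keys = new ArrayList<>();  // one slot per document ever added
    private final List<Boolean> live = new ArrayList<>(); // false once a slot is marked deleted

    /** Models updateDocument: mark matching live documents deleted, then append the new one. */
    public void updateDocument(String key) {
        for (int i = 0; i < keys.size(); i++) {
            if (live.get(i) && keys.get(i).equals(key)) {
                live.set(i, false); // marked deleted, but the slot is not reclaimed yet
            }
        }
        keys.add(key);
        live.add(true);
    }

    /** Models a merge: drops the slots of deleted documents. */
    public void merge() {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            if (live.get(i)) {
                kept.add(keys.get(i));
            }
        }
        keys.clear();
        keys.addAll(kept);
        live.clear();
        for (int i = 0; i < kept.size(); i++) {
            live.add(true);
        }
    }

    /** Counts every slot, including deleted ones. */
    public int maxDoc() {
        return keys.size();
    }

    /** Counts only live documents. */
    public int numDocs() {
        int count = 0;
        for (boolean isLive : live) {
            if (isLive) {
                count++;
            }
        }
        return count;
    }

    public int numDeletedDocs() {
        return maxDoc() - numDocs();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.updateDocument("hello");
        index.updateDocument("hello");
        // prints "maxDoc=2 numDocs=1"
        System.out.println("maxDoc=" + index.maxDoc() + " numDocs=" + index.numDocs());
    }
}
```

In the test from this thread, the corresponding change would be to assert on reader.numDocs() rather than reader.maxDoc().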
On Thu, Jun 19, 2014 at 12:54 PM, Clemens Wyss DEV <cl...@mysign.ch> wrote:
> directory = new SimpleFSDirectory( indexLocation );
> IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_47, new WhitespaceAnalyzer( Version.LUCENE_47 ));
> indexWriter = new IndexWriter( directory, config );
> Document doc = new Document();
> String value = "hello";
> String key = "test";
> doc.add( new StringField( key, value, Store.YES ) );
> indexWriter.updateDocument( new Term( key, value ), doc );
> // (*) indexWriter.commit();
> indexWriter.updateDocument( new Term( key, value ), doc );
> indexWriter.commit();
> indexWriter.close();
> reader = DirectoryReader.open( directory );
> Assert.assertEquals( 1, reader.maxDoc() );// fails unless (*) is uncommented
Re: IndexWriter#updateDocument(Term, Document)
Posted by Clemens Wyss DEV <cl...@mysign.ch>.
directory = new SimpleFSDirectory( indexLocation );
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_47, new WhitespaceAnalyzer( Version.LUCENE_47 ));
indexWriter = new IndexWriter( directory, config );
Document doc = new Document();
String value = "hello";
String key = "test";
doc.add( new StringField( key, value, Store.YES ) );
indexWriter.updateDocument( new Term( key, value ), doc );
// (*) indexWriter.commit();
indexWriter.updateDocument( new Term( key, value ), doc );
indexWriter.commit();
indexWriter.close();
reader = DirectoryReader.open( directory );
Assert.assertEquals( 1, reader.maxDoc() );// fails unless (*) is uncommented
-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Wednesday, 18 June 2014 16:20
To: Lucene Users
Subject: Re: IndexWriter#updateDocument(Term, Document)
Your first case is supposed to work; if it doesn't it's a bad bug :)
Can you reduce it to a small example?
Mike McCandless
http://blog.mikemccandless.com
Re: IndexWriter#updateDocument(Term, Document)
Posted by Michael McCandless <lu...@mikemccandless.com>.
Your first case is supposed to work; if it doesn't it's a bad bug :)
Can you reduce it to a small example?
Mike McCandless
http://blog.mikemccandless.com
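As a rough illustration of why the first case should work, the following Lucene-free sketch models a writer that buffers updates and publishes them in a single commit. It assumes that updates are keyed by a unique term and only become visible to readers opened after a commit. The class BatchUpdateSketch and its methods are hypothetical, and the map-based bookkeeping is a simplification, not how Lucene stores documents.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model (NOT Lucene code) of buffered updates that become visible on commit. */
public class BatchUpdateSketch {
    private final Map<String, String> pending = new LinkedHashMap<>(); // buffered, keyed by unique term
    private Map<String, String> committed = new LinkedHashMap<>();     // what a newly opened reader sees

    /** Buffers a document, replacing any earlier buffered document with the same key. */
    public void updateDocument(String uniqueTerm, String doc) {
        pending.put(uniqueTerm, doc);
    }

    /** Publishes all buffered updates at once; committed documents with the same key are replaced. */
    public void commit() {
        Map<String, String> next = new LinkedHashMap<>(committed);
        next.putAll(pending);
        committed = next;
        pending.clear();
    }

    /** Number of documents visible to a reader opened now. */
    public int visibleDocs() {
        return committed.size();
    }

    public static void main(String[] args) {
        BatchUpdateSketch writer = new BatchUpdateSketch();
        String[] elements = {"a", "b", "a", "c", "b"}; // contains duplicates
        for (String element : elements) {
            writer.updateDocument(element, "doc for " + element);
        }
        System.out.println("before commit: " + writer.visibleDocs()); // prints "before commit: 0"
        writer.commit(); // one commit for the whole batch
        System.out.println("after commit: " + writer.visibleDocs());  // prints "after commit: 3"
    }
}
```

The point of the sketch is that de-duplication happens per update, while the commit only controls when the result becomes visible, so a single commit at the end of the batch is enough.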