Posted to java-user@lucene.apache.org by no...@kaigrabfelder.de on 2014/02/24 11:33:13 UTC

updateDocument (sometimes) no longer deleting documents after Update to 4.6

Hi there,

we recently updated our application from Lucene 3.0 to 3.6, with the
effect that (despite using the SearcherManager functionality as described
at
http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html)
calls to searcherManager.maybeRefresh() were incredibly slow, e.g.
taking about 30 seconds after adding one document to an index of about
9000 documents. I assumed that we did something wrong with the
configuration, as 30 seconds can hardly be what is meant by NRT ;-)

Thus we migrated to the latest 4.6 version, and indexing speed is
indeed very good now (with the searcherManager.maybeRefreshBlocking()
call only taking milliseconds to complete). But after some more testing
we discovered that somehow the indexWriter.updateDocument( term,
documentToIndex ) functionality wasn't working as expected anymore - at
least sometimes. It looks like the updateDocument method no
longer reliably deletes the old document before adding a new one -
with the result that older documents are being returned by searches,
breaking our application.

Unfortunately I'm not able to reproduce the issue in a simple unit
test, but maybe one of the Lucene experts knows what we are doing
wrong here. Not sure if it is of any relevance, but we are running on
Windows with a 64-bit JDK 7, thus MMapDirectory is being used.

Our Index Writer is configured like this:

	IndexWriterConfig conf = new IndexWriterConfig( Version.LUCENE_46,
		new LimitTokenCountAnalyzer( new DefaultAnalyzer(), Integer.MAX_VALUE ) );

	conf.setOpenMode( OpenMode.APPEND );

	IndexWriter indexWriter = new IndexWriter(
		FSDirectory.open( new File( directoryPath ) ), conf );

SearcherManager is configured like this:

	searcherManager = new SearcherManager(indexWriter, true, null);

// The analyzer that we are using looks like this:

	public class DefaultAnalyzer extends Analyzer
	{
	    @Override
	    protected TokenStreamComponents createComponents( final String fieldName,
	            final Reader reader ) {
	        return new TokenStreamComponents(
	            new WhitespaceTokenizer( LuceneSearchService.LUCENE_VERSION, reader ) );
	    }
	}

The update of the index looks like this:

	// instead of 42 the unique business identifier is used
	Long myUniqueBusinessId = 42L;
	BytesRef ref = new BytesRef( NumericUtils.BUF_SIZE_LONG );
	NumericUtils.longToPrefixCoded( myUniqueBusinessId.longValue(), 0, ref );
	Term term = new Term( "MY_UNIQUE_BUSINESS_ID", ref );

	// this method may be called multiple times with the same term
	// and luceneDocumentToIndex parameter
	indexWriter.updateDocument( term, luceneDocumentToIndex );

	// After performing a couple of updates we execute
	searcherManager.maybeRefreshBlocking();
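An aside on why the term bytes matter here: updateDocument deletes by exact byte-for-byte term match, so the encoded id must be deterministic and identical every time. The sketch below is plain Java and NOT Lucene's actual prefix-coded format (NumericUtils uses its own 7-bits-per-byte scheme); it only illustrates the underlying idea that a long is turned into a fixed byte sequence whose lexicographic order matches the numeric order, and that equal ids always yield equal bytes:

```java
import java.util.Arrays;

// Conceptual sketch only, not Lucene's NumericUtils wire format: encode a
// long big-endian with the sign bit flipped, so that unsigned lexicographic
// byte order equals signed numeric order, and equal values give equal bytes.
public class LongTermEncoding {

    public static byte[] encode(long value) {
        long flipped = value ^ 0x8000000000000000L; // flip sign bit
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) (flipped & 0xFF);       // big-endian byte order
            flipped >>>= 8;
        }
        return out;
    }

    // Unsigned lexicographic comparison, as used for term bytes.
    public static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < 8; i++) {
            int x = a[i] & 0xFF, y = b[i] & 0xFF;
            if (x != y) return Integer.compare(x, y);
        }
        return 0;
    }

    public static void main(String[] args) {
        long[] values = { Long.MIN_VALUE, -42L, 0L, 42L, Long.MAX_VALUE };
        for (int i = 0; i + 1 < values.length; i++) {
            if (compareBytes(encode(values[i]), encode(values[i + 1])) >= 0)
                throw new AssertionError("byte order does not match numeric order");
        }
        // Same business id, same delete-term bytes, every time.
        System.out.println(Arrays.equals(encode(42L), encode(42L))); // prints "true"
    }
}
```

If the bytes were not deterministic, the delete term would never match the previously indexed term and the old document would survive every update.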


// For searching we are using the following code
	searcher = searcherManager.acquire();
	try {
		// luceneQuery is the query, filter is some sort of filtering
		// that we apply, luceneSort is some sorting query
		TopDocs topDocs = searcher.search( luceneQuery, filter, 1000, luceneSort );
	} finally {
		// an acquired searcher must always be given back to the manager
		searcherManager.release( searcher );
	}

// If we perform a query for MY_UNIQUE_BUSINESS_ID it will return
multiple results instead of just one - this was the case with neither
Lucene 3.0 nor 3.6


In order to fix the issue I tried a couple of things, but to no avail. It
still happens (not all the time though) that Lucene returns two
documents when querying for MY_UNIQUE_BUSINESS_ID instead of just one:
- setting setMaxBufferedDeleteTerms to 1 in the config:
	conf.setMaxBufferedDeleteTerms( 1 );
- explicitly deleting instead of just updating:
	indexWriter.deleteDocuments( term );
- ensuring that the field MY_UNIQUE_BUSINESS_ID is stored in the index
and not just analyzed
- trying to delete the document via indexWriter.tryDeleteDocument()
- calling indexWriter.maybeMerge() after the update
- calling indexWriter.commit() after the update


Sorry for the lengthy post, but I wanted to include as much information
as possible. Let me know if something is missing...

Thanks for helping in advance ;-)

Kai

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: updateDocument (sometimes) no longer deleting documents after Update to 4.6

Posted by Kai Grabfelder <no...@kaigrabfelder.de>.
Hi Uwe,

thank you very much! That indeed was the issue and did the trick!

Best Regards

Kai




RE: updateDocument (sometimes) no longer deleting documents after Update to 4.6

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

it looks like your filters are implemented incorrectly:

- First, in Lucene 3 and 4, filters are applied per segment. That means they have to calculate the DocIdSet of matched documents for each index segment separately. On an update, the document is "deleted" (hidden) in the old segment and re-added to a new index segment. This is why you see it twice in the filter.
- Second, in Lucene 4, Filters now get (Bits acceptDocs) in their getDocIdSet method. This is new: before, deleted documents were applied *after* the filters; now they are applied together with the filters. If acceptDocs is non-null, it marks the "hidden" deleted documents. If your filter does not apply those accept docs correctly to the returned DocIdSet, deleted documents suddenly reappear. In Lucene 4, deletions are just an additional filter applied while searching: a filter that marks the still-accessible documents and hides all deleted ones. If your own filter does not chain in this additional filter, the deletions are ignored. A quick fix is to use "return BitsFilteredDocIdSet.wrap(yourFilterBitSet, acceptDocs)" instead of "return yourFilterBitSet".

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
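The intersection Uwe describes can be sketched in plain Java (using java.util.BitSet rather than Lucene's Bits/DocIdSet classes, so the names below are illustrative, not Lucene's API): per segment, a filter's matches must be ANDed with that segment's live-docs set, otherwise a document deleted from the old segment matches again.

```java
import java.util.BitSet;

// Toy model, not Lucene's API: a segment has documents matched by a filter
// and (possibly) a live-docs set recording deletions. A correct filter
// intersects its matches with the live docs, mirroring
// BitsFilteredDocIdSet.wrap(filterSet, acceptDocs).
public class AcceptDocsDemo {

    // Buggy variant: ignores the live docs, so deleted documents reappear.
    public static BitSet filterIgnoringDeletes(BitSet filterMatches, BitSet liveDocs) {
        return (BitSet) filterMatches.clone();
    }

    // Correct variant: AND with liveDocs when present (null = no deletions).
    public static BitSet filterRespectingDeletes(BitSet filterMatches, BitSet liveDocs) {
        BitSet result = (BitSet) filterMatches.clone();
        if (liveDocs != null) result.and(liveDocs);
        return result;
    }

    public static void main(String[] args) {
        // Old segment: doc 3 matches the filter but was deleted by updateDocument.
        BitSet matches = new BitSet();
        matches.set(3);
        BitSet live = new BitSet();
        live.set(0, 8);  // docs 0..7 exist in the segment...
        live.clear(3);   // ...but doc 3 is marked deleted

        System.out.println(filterIgnoringDeletes(matches, live).get(3));   // prints "true": stale hit
        System.out.println(filterRespectingDeletes(matches, live).get(3)); // prints "false": hidden
    }
}
```

This is exactly the symptom in the thread: the stale hit comes from the old segment, where the document is only hidden by the live-docs bits the filter ignored.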


> -----Original Message-----
> From: nospam@kaigrabfelder.de [mailto:nospam@kaigrabfelder.de]
> Sent: Monday, February 24, 2014 7:14 PM
> To: java-user@lucene.apache.org
> Subject: Re: updateDocument (sometimes) no longer deleting documents
> after Update to 4.6
> 
> Hm it looks like this is somehow caused by the filters we are using for
> searching.
> 
> I took one of the MY_UNIQUE_BUSINESS_ID ids used in our application's
> search functionality and debugged the Lucene search a little more. If I specify
> null for the filters I only get one result (which is correct).
> If I add the two filters that we usually use in our application I notice that the
> filters are triggered twice - for two different segments - and the result is
> contained in both segments. It looks like the first segment contains all
> documents in the index, with the second segment containing only one - the
> document that should have been deleted upfront.
> 
> This can be reproduced even after restarting the application, and even after
> indexWriter.commit is triggered.
> 
> Could this be a bug? Or is this the desired behaviour?
> 
> Best Regards
> 
> Kai


Re: updateDocument (sometimes) no longer deleting documents after Update to 4.6

Posted by Erick Erickson <er...@gmail.com>.
I suspect you're finding the old doc that is simply marked
as deleted. Did you check for that?

One quick way to see if this is even in the right ballpark would be
to do a forceMerge. If the problem disappears, then this is
relevant I'd guess.

Warning: The operative word here is "guess", I haven't been
working in this layer for a long time...

Best,
Erick
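Erick's point can also be sketched in plain Java (again a toy model, not Lucene's internals): a delete only clears a live-docs bit, so the old document still occupies a slot in the segment until a merge, or forceMerge, rewrites the segment without the dead documents.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy segment model: deletes only flip a bit; the document bodies remain
// until a (force)merge copies the live docs into a fresh segment.
public class DeletesUntilMerge {
    List<String> docs = new ArrayList<>();
    BitSet live = new BitSet();

    public void add(String doc) { live.set(docs.size()); docs.add(doc); }
    public void delete(int id)  { live.clear(id); }

    public int maxDoc()  { return docs.size(); }         // counts deleted docs too
    public int numDocs() { return live.cardinality(); }  // live docs only

    // forceMerge: rewrite the "segment" keeping only live documents.
    public void forceMerge() {
        List<String> merged = new ArrayList<>();
        BitSet mergedLive = new BitSet();
        for (int i = 0; i < docs.size(); i++) {
            if (live.get(i)) { mergedLive.set(merged.size()); merged.add(docs.get(i)); }
        }
        docs = merged;
        live = mergedLive;
    }

    public static void main(String[] args) {
        DeletesUntilMerge index = new DeletesUntilMerge();
        index.add("old version of doc 42");
        index.add("some other doc");
        index.delete(0);                 // what updateDocument does internally
        index.add("new version of doc 42");

        System.out.println(index.maxDoc());  // prints "3": dead doc still occupies a slot
        System.out.println(index.numDocs()); // prints "2": but it is not a live hit
        index.forceMerge();
        System.out.println(index.maxDoc());  // prints "2": physically gone after merge
    }
}
```

This is why a forceMerge is a useful diagnostic: if the duplicate hit vanishes after merging, something in the search path (here, the filters) was surfacing documents that were only marked deleted.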


On Mon, Feb 24, 2014 at 10:14 AM, <no...@kaigrabfelder.de> wrote:

> Hm it looks like this is somehow caused by the filters we are using for
> searching.
>
> I took one of the MY_UNIQUE_BUSINESS_ID ids used in our application's
> search functionality and debugged the Lucene search a little more. If I
> specify null for the filters I only get one result (which is correct). If I
> add the two filters that we usually use in our application I notice that
> the filters are triggered twice - for two different segments - and the
> result is contained in both segments. It looks like the first segment contains
> all documents in the index, with the second segment containing only one -
> the document that should have been deleted upfront.
>
> This can be reproduced even after restarting the application and even
> after indexWriter.commit is triggered
>
> Could this be a bug? Or is this the desired behaviour?
>
> Best Regards
>
> Kai
>
>
> Am 2014-02-24 13:54, schrieb nospam@kaigrabfelder.de:
>
>> I'll see if I can dig a little bit deeper into the 3.6 behavior; for
>> now I'm trying to get it running on 4.6 (as the index file is also a
>> lot smaller - on 3.6 it was about 2 GB for about 9000 documents, with
>> 4.6 it's only about 200 MB).
>>
>> And yes, the business ID is indexed - otherwise I wouldn't be able to
>> find it at all. The problem is not that I can't find it but that I find it
>> twice. And to make matters worse, not consistently all the time but
>> only sometimes. Somehow it looks like the delete (before the update)
>> does sometimes work and sometimes not. Do you know any reason why
>> this could happen? Maybe something related to the MergePolicy (which we
>> don't set, i.e. we are using the default)?
>>
>> Best Regards
>>
>> Kai
>>
>>
>> Am 2014-02-24 12:10, schrieb Michael McCandless:
>>
>>> The 30 second turnaround time in 3.6.x is absurd; if you turn on
>>> IndexWriter's infoStream maybe it'd give a clue.  Or, capture a few
>>> stack traces and post them.
>>>
>>> How are you creating the luceneDocumentToIndex?  You must ensure that
>>> the business ID is in fact indexed as a field in the document,
>>> otherwise the update won't find it.
>>>
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>

Re: updateDocument (sometimes) no longer deleting documents after update to 4.6

Posted by no...@kaigrabfelder.de.
Hmm, it looks like this is somehow caused by the filters we are using for 
searching.

I took one of the MY_UNIQUE_BUSINESS_ID values used in our application's 
search functionality and debugged the Lucene search a little more. If I 
specify null for the filters I get only one result (which is correct). 
If I add the two filters that we usually use in our application, I notice 
that the filters are invoked twice - once for each of two different 
segments - and the result is contained in both segments. It looks like 
the first segment contains all documents in the index, while the second 
segment contains only one: the document that should have been deleted 
beforehand.

This can be reproduced even after restarting the application, and even 
after indexWriter.commit() is called.
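One thing worth checking, assuming the two filters are custom Filter 
implementations: in Lucene 4.x, Filter.getDocIdSet receives an acceptDocs 
parameter representing the segment's live documents, and a filter that 
ignores it can resurface deleted documents. A minimal sketch (the class 
and its matching logic are hypothetical; only the acceptDocs handling is 
the point):

```java
import java.io.IOException;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.BitsFilteredDocIdSet;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.FixedBitSet;

// Hypothetical custom filter; the important part is the acceptDocs wrap.
public class MyBusinessFilter extends Filter {
    @Override
    public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs)
            throws IOException {
        FixedBitSet bits = new FixedBitSet(context.reader().maxDoc());
        // ... set the bits for matching documents here ...
        // Without this wrap, documents deleted by updateDocument can still
        // be returned from segments that carry deletions:
        return BitsFilteredDocIdSet.wrap(bits, acceptDocs);
    }
}
```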

Could this be a bug? Or is this the desired behaviour?
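One way to tell whether the deletes ever made it to disk is to run 
CheckIndex against the index directory; a sketch, assuming the same 
directoryPath as in the writer setup (it prints per-segment live and 
deleted document counts):

```java
import java.io.File;

import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.FSDirectory;

public class CheckIndexSketch {
    public static void main(String[] args) throws Exception {
        String directoryPath = args[0];  // same path the IndexWriter uses
        // Read-only integrity check; reports deletion counts per segment.
        CheckIndex checker = new CheckIndex(FSDirectory.open(new File(directoryPath)));
        checker.setInfoStream(System.out);
        CheckIndex.Status status = checker.checkIndex();
        System.out.println("index clean: " + status.clean);
    }
}
```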

Best Regards

Kai


On 2014-02-24 13:54, nospam@kaigrabfelder.de wrote:
> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: updateDocument (sometimes) no longer deleting documents after update to 4.6

Posted by no...@kaigrabfelder.de.
I'll see if I can dig a little deeper into the 3.6 behavior; for now I'm 
trying to get it running on 4.6 (the index is also a lot smaller - on 3.6 
it was about 2 GB for about 9000 documents, with 4.6 it's only about 
200 MB).

And yes, the business ID is indexed - otherwise I wouldn't be able to 
find it at all. The problem is not that I can't find it but that I find it 
twice. And to make matters worse, not consistently but only 
sometimes. Somehow it looks like the delete (before the update) 
sometimes works and sometimes doesn't. Do you know of any reason why this 
could happen? Maybe something related to the MergePolicy (which we don't 
set, i.e. we are using the default)?
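Mike suggested turning on IndexWriter's infoStream; a sketch of what that 
would look like with our config (assuming Lucene 4.6's 
IndexWriterConfig.setInfoStream; it logs flushes, applied deletes and 
merge decisions to the given stream):

```java
import org.apache.lucene.analysis.miscellaneous.LimitTokenCountAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.PrintStreamInfoStream;
import org.apache.lucene.util.Version;

// Same config as in the original post, plus diagnostics on stdout.
IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_46,
        new LimitTokenCountAnalyzer(new DefaultAnalyzer(), Integer.MAX_VALUE));
conf.setInfoStream(new PrintStreamInfoStream(System.out));
```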

Best Regards

Kai


On 2014-02-24 12:10, Michael McCandless wrote:
> [...]






Re: updateDocument (sometimes) no longer deleting documents after update to 4.6

Posted by Michael McCandless <lu...@mikemccandless.com>.
The 30 second turnaround time in 3.6.x is absurd; if you turn on
IndexWriter's infoStream maybe it'd give a clue.  Or, capture a few
stack traces and post them.

How are you creating the luceneDocumentToIndex?  You must ensure that
the business ID is in fact indexed as a field in the document,
otherwise the update won't find it.
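For a numeric ID, that means the document must contain the ID as an 
indexed numeric field whose full-precision (shift 0) term matches the one 
built with NumericUtils; a sketch, assuming a LongField is used:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;

// An indexed LongField produces the same shift-0 prefix-coded term that
// NumericUtils.longToPrefixCoded(id, 0, ref) builds, so updateDocument
// can locate and delete the previous version of the document.
Document doc = new Document();
doc.add(new LongField("MY_UNIQUE_BUSINESS_ID", 42L, Field.Store.YES));
```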


Mike McCandless

http://blog.mikemccandless.com


On Mon, Feb 24, 2014 at 5:33 AM,  <no...@kaigrabfelder.de> wrote:
> Hi there,
>
> we recently updated our application from Lucene 3.0 to 3.6, with the effect
> that (albeit using the SearcherManager functionality as described on
> http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html)
> calls to searcherManager.maybeRefresh() were incredibly slow, e.g. taking
> about 30 seconds after adding one document to an index of
> about 9000 documents. I assumed that we did something wrong with the
> configuration, as 30 seconds can hardly be what is meant by NRT ;-)
>
> Thus we migrated to the latest 4.6 version, and indexing speed was indeed
> very good (with the searcherManager.maybeRefreshBlocking() call only
> taking milliseconds to complete). But after some more testing we discovered
> that somehow the indexWriter.updateDocument( term, documentToIndex )
> functionality was no longer working as expected - at least sometimes. It
> looks like the updateDocument method no longer reliably deletes
> the old document before adding the new one - with the result that older
> documents are being returned by searches, breaking our application.
>
> Unfortunately I'm not able to reproduce the issue in a simple unit test, but
> maybe one of the Lucene experts knows what we are doing wrong here. Not
> sure if it is of any relevance, but we are running on Windows with a 64-bit
> JDK 7, thus MMapDirectory is being used.
>
> Our Index Writer is configured like this:
>
>         IndexWriterConfig conf = new IndexWriterConfig( Version.LUCENE_46,
> new LimitTokenCountAnalyzer( new DefaultAnalyzer(), Integer.MAX_VALUE ) );
>
>
>         conf.setOpenMode( OpenMode.APPEND );
>
>         IndexWriter indexWriter = new IndexWriter( FSDirectory.open( new
> File( directoryPath )), conf );
>
> SearcherManager is configured like this:
>
>         searcherManager = new SearcherManager(indexWriter, true, null);
>
> // The analyzer that we are using looks like this:
>
>         public class DefaultAnalyzer extends Analyzer
>         {
>            @Override
>            protected TokenStreamComponents createComponents(final String
> fieldName,
>                    final Reader reader) {
>                  return new TokenStreamComponents(new
> WhitespaceTokenizer(LuceneSearchService.LUCENE_VERSION, reader));
>            }
>         }
>
> The update of the index looks like this:
>
>         // instead of 42 the unique business identifier is used
>         Long myUniqueBusinessId = 42L;
>         BytesRef ref = new BytesRef(NumericUtils.BUF_SIZE_LONG);
>         NumericUtils.longToPrefixCoded( myUniqueBusinessId.longValue(), 0,
> ref );
>         Term term = new Term( "MY_UNIQUE_BUSINESS_ID", ref );
>
>         // this method may be called multiple times with the same term and
> luceneDocumentToIndex parameter
>         indexWriter.updateDocument( term, luceneDocumentToIndex);
>
>         // After performing a couple of updates we execute
>         searcherManager.maybeRefreshBlocking();
>
>
> // For searching we are using the following code
>         searcher = searcherManager.acquire();
>         // luceneQuery is the query, filter is some sort of filtering that
> we apply, luceneSort is some sorting query
>         TopDocs topDocs = searcher.search( luceneQuery, filter, 1000,
> luceneSort );
>
> // If we perform a query for MY_UNIQUE_BUSINESS_ID it will return multiple
> results instead of just one - this was neither the case with lucene 3.0 nor
> 3.6
>
>
> In order to fix the issue I tried a couple of things, but to no avail. It
> still happens (not all the time, though) that Lucene returns two
> documents when querying for MY_UNIQUE_BUSINESS_ID instead of just one:
> - setting setMaxBufferedDeleteTerms to 1 in the config:
>         conf.setMaxBufferedDeleteTerms( 1 );
> - explicitly deleting instead of just updating:
>         indexWriter.deleteDocuments( term );
> - ensuring that the field MY_UNIQUE_BUSINESS_ID is stored in the index and
> not just analysed
> - trying to delete the document via indexWriter.tryDeleteDocument()
> - calling indexWriter.maybeMerge() after the update
> - calling indexWriter.commit() after the update
>
>
> Sorry for the lengthy post but I wanted to include as much information as
> possible. Let me know if something is missing...
>
> Thanks for helping in advance ;-)
>
> Kai
>
>
