Posted to java-user@lucene.apache.org by no...@kaigrabfelder.de on 2014/02/24 11:33:13 UTC
updateDocument (sometimes) no longer deleting documents after Update to 4.6
Hi there,
we recently updated our application from Lucene 3.0 to 3.6, with the
effect that calls to searcherManager.maybeRefresh() were incredibly
slow (despite using the SearcherManager functionality as described on
http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html),
e.g. taking about 30 seconds after adding a single document to an
index of about 9000 documents. I assumed that we did something wrong
with the configuration, as 30 seconds can hardly be what is meant by NRT ;-)
Thus we migrated to the latest 4.6 version, and indexing speed was
indeed very good (with the searcherManager.maybeRefreshBlocking()
call only taking milliseconds to complete). But after some more testing
we discovered that the indexWriter.updateDocument( term,
documentToIndex ) functionality was no longer working as expected - at
least sometimes. It looks like the updateDocument method no
longer reliably deletes the old document before adding the new one,
with the result that older documents are being returned by searches,
breaking our application.
Unfortunately I'm not able to reproduce the issue in a simple unit
test, but maybe one of the Lucene experts knows what we are doing
wrong here. Not sure if it is of any relevance, but we are running on
Windows with a 64-bit JDK 7, thus MMapDirectory is being used.
Our Index Writer is configured like this:
IndexWriterConfig conf = new IndexWriterConfig( Version.LUCENE_46,
    new LimitTokenCountAnalyzer( new DefaultAnalyzer(), Integer.MAX_VALUE ) );
conf.setOpenMode( OpenMode.APPEND );
IndexWriter indexWriter = new IndexWriter(
    FSDirectory.open( new File( directoryPath ) ), conf );
SearcherManager is configured like this:
searcherManager = new SearcherManager(indexWriter, true, null);
// The analyzer that we are using looks like this:
public class DefaultAnalyzer extends Analyzer
{
    @Override
    protected TokenStreamComponents createComponents(final String fieldName,
            final Reader reader) {
        return new TokenStreamComponents(
                new WhitespaceTokenizer(LuceneSearchService.LUCENE_VERSION, reader));
    }
}
The update of the index looks like this:
// instead of 42 the unique business identifier is used
Long myUniqueBusinessId = 42L;
BytesRef ref = new BytesRef(NumericUtils.BUF_SIZE_LONG);
NumericUtils.longToPrefixCoded( myUniqueBusinessId.longValue(), 0, ref );
Term term = new Term( "MY_UNIQUE_BUSINESS_ID", ref );
// this method may be called multiple times with the same term and luceneDocumentToIndex parameter
indexWriter.updateDocument( term, luceneDocumentToIndex );
// After performing a couple of updates we execute
searcherManager.maybeRefreshBlocking();
// For searching we are using the following code
searcher = searcherManager.acquire();
try {
    // luceneQuery is the query, filter is some sort of filtering that we apply,
    // luceneSort is some sorting query
    TopDocs topDocs = searcher.search( luceneQuery, filter, 1000, luceneSort );
} finally {
    // every acquired searcher must be given back
    searcherManager.release( searcher );
}
// If we perform a query for MY_UNIQUE_BUSINESS_ID it will return multiple
// results instead of just one - this was the case with neither Lucene 3.0 nor 3.6
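As an aside on how the refresh step relates to search visibility: the maybeRefreshBlocking()/acquire() contract can be illustrated with a small self-contained sketch (plain Java, no Lucene dependency; ToySearcherManager and all names below are invented for illustration). Writers accumulate changes in a pending state, and searchers only ever see the snapshot published by the last refresh:

```java
import java.util.ArrayList;
import java.util.List;

public class RefreshDemo {
    static class ToySearcherManager {
        private final List<String> pending = new ArrayList<>(); // writer-side state
        private List<String> snapshot = new ArrayList<>();      // what searchers see

        void index(String doc) { pending.add(doc); }

        // like maybeRefreshBlocking(): publish the current writer state
        void refresh() { snapshot = new ArrayList<>(pending); }

        // like acquire(): hand out the snapshot taken at the last refresh
        List<String> acquire() { return snapshot; }
    }

    public static void main(String[] args) {
        ToySearcherManager mgr = new ToySearcherManager();
        mgr.index("doc1");
        System.out.println(mgr.acquire().size()); // 0 - not yet refreshed
        mgr.refresh();
        System.out.println(mgr.acquire().size()); // 1 - visible after refresh
    }
}
```

The real SearcherManager additionally reference-counts the acquired IndexSearcher, which is why every acquire() must be paired with a release().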
In order to fix the issue I tried a couple of things, but to no avail. It
still happens (not all the time though) that Lucene returns two
documents when querying for MY_UNIQUE_BUSINESS_ID instead of just one:
- setting setMaxBufferedDeleteTerms to 1 in the config:
conf.setMaxBufferedDeleteTerms( 1 );
- explicitly deleting instead of just updating:
indexWriter.deleteDocuments( term );
- ensuring that the field MY_UNIQUE_BUSINESS_ID is stored in the index
and not just analyzed
- trying to delete the document via indexWriter.tryDeleteDocument()
- calling indexWriter.maybeMerge() after the update
- calling indexWriter.commit() after the update
Sorry for the lengthy post, but I wanted to include as much information
as possible. Let me know if something is missing...
Thanks for helping in advance ;-)
Kai
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: updateDocument (sometimes) no longer deleting documents after Update to 4.6
Posted by Kai Grabfelder <no...@kaigrabfelder.de>.
Hi Uwe,
thank you very much! That indeed was the issue and did the trick!
Best Regards
Kai
--- Original message ---
Sender: Uwe Schindler
Date: 24.02.14 20:42
> Hi,
>
> it looks like your filters are implemented incorrectly:
>
> - First, in Lucene 3 and 4, filters are applied per segment. That means they have to calculate the DocIdSet of matching documents for each index segment separately. On an update, the document is "deleted" (hidden) in the old segment and re-added to a new segment. This is why you see it twice in the filter.
> - Second, in Lucene 4, filters now get (Bits acceptDocs) in their getDocIdSet method. This is new: before, the deleted documents were applied *after* the filters; now they are applied together with the filters. If acceptDocs is non-null, it marks the "hidden" deleted documents. If your filter does not apply those acceptDocs correctly to the returned DocIdSet, deleted documents suddenly reappear. In Lucene 4, deletions are just an additional filter applied while searching: a filter that marks the still-accessible documents and hides all deleted ones. If your own filter does not chain in this additional filter, the deletions are ignored. A quick fix is to use "return BitsFilteredDocIdSet.wrap(yourFilterBitSet, acceptDocs)" instead of "return yourFilterBitSet".
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>> On 2014-02-24 13:54, nospam@kaigrabfelder.de wrote:
>> > I'll see if I can dig a little bit deeper into the 3.6 behavior; for
>> > now I'm trying to get it running on 4.6 (as the index file is also a
>> > lot smaller - on 3.6 it was about 2 GB for about 9000 documents, with
>> > 4.6 it's only about 200 MB).
>> >
>> > And yes, the business ID is indexed - otherwise I wouldn't be able to
>> > find it at all. The problem is not that I can't find it but that I find
>> > it twice. And to make matters worse, not consistently all the time but
>> > only sometimes. Somehow it looks like the delete (before the update)
>> > sometimes works and sometimes doesn't. Do you know any reasons why
>> > this could happen? Maybe something related to the MergePolicy (which we
>> > don't set, i.e. we are using the default)?
>> >
>> > Best Regards
>> >
>> > Kai
>> >
>> >
>> > On 2014-02-24 12:10, Michael McCandless wrote:
>> >> The 30 second turnaround time in 3.6.x is absurd; if you turn on
>> >> IndexWriter's infoStream maybe it'd give a clue. Or, capture a few
>> >> stack traces and post them.
>> >>
>> >> How are you creating the luceneDocumentToIndex? You must ensure that
>> >> the business ID is in fact indexed as a field in the document,
>> >> otherwise the update won't find it.
>> >>
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >>
RE: updateDocument (sometimes) no longer deleting documents after Update to 4.6
Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,
it looks like your filters are implemented incorrectly:
- First, in Lucene 3 and 4, filters are applied per segment. That means they have to calculate the DocIdSet of matching documents for each index segment separately. On an update, the document is "deleted" (hidden) in the old segment and re-added to a new segment. This is why you see it twice in the filter.
- Second, in Lucene 4, filters now get (Bits acceptDocs) in their getDocIdSet method. This is new: before, the deleted documents were applied *after* the filters; now they are applied together with the filters. If acceptDocs is non-null, it marks the "hidden" deleted documents. If your filter does not apply those acceptDocs correctly to the returned DocIdSet, deleted documents suddenly reappear. In Lucene 4, deletions are just an additional filter applied while searching: a filter that marks the still-accessible documents and hides all deleted ones. If your own filter does not chain in this additional filter, the deletions are ignored. A quick fix is to use "return BitsFilteredDocIdSet.wrap(yourFilterBitSet, acceptDocs)" instead of "return yourFilterBitSet".
Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
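The idea behind this fix can be sketched in plain Java with no Lucene dependency (the Bits interface below is an invented stand-in for org.apache.lucene.util.Bits, and wrap() only mimics what BitsFilteredDocIdSet.wrap is described to do above): a filter's raw matches must be intersected with acceptDocs, otherwise deleted documents leak into the result.

```java
import java.util.BitSet;

public class AcceptDocsDemo {
    // minimal stand-in for Lucene's read-only bit interface
    interface Bits {
        boolean get(int index);
        int length();
    }

    // keep only matches that acceptDocs marks as live; null means "no deletions"
    static BitSet wrap(BitSet filterMatches, Bits acceptDocs) {
        if (acceptDocs == null) return filterMatches;
        BitSet result = (BitSet) filterMatches.clone();
        for (int doc = result.nextSetBit(0); doc >= 0; doc = result.nextSetBit(doc + 1)) {
            if (!acceptDocs.get(doc)) result.clear(doc); // hide deleted doc
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet matches = new BitSet();
        matches.set(0); matches.set(3); matches.set(7); // raw filter matches

        // doc 3 is the old, tombstoned copy in this segment
        boolean[] live = {true, true, true, false, true, true, true, true};
        Bits acceptDocs = new Bits() {
            public boolean get(int i) { return live[i]; }
            public int length() { return live.length; }
        };

        BitSet visible = wrap(matches, acceptDocs);
        System.out.println(visible.get(0));        // true
        System.out.println(visible.get(3));        // false - deleted doc hidden
        System.out.println(visible.cardinality()); // 2
    }
}
```

A filter that returns its matches without this intersection behaves exactly as described in the thread: the tombstoned copy of an updated document reappears in search results.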
Re: updateDocument (sometimes) no longer deleting documents after Update to 4.6
Posted by Erick Erickson <er...@gmail.com>.
I suspect you're finding the old doc that is simply marked
as deleted. Did you check for that?
One quick way to see if this is even in the right ballpark would be
to do a forceMerge. If the problem disappears, then this is
relevant I'd guess.
Warning: The operative word here is "guess", I haven't been
working in this layer for a long time...
Best,
Erick
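Erick's point - an update leaves a tombstoned copy of the old document behind until a merge purges it - can be modeled with a small self-contained sketch (plain Java, no Lucene; ToyIndex and its methods are invented for illustration). A search that forgets to skip tombstones sees both the stale and the fresh copy:

```java
import java.util.ArrayList;
import java.util.List;

public class TombstoneDemo {
    static class Doc {
        final long businessId;
        boolean deleted; // the tombstone flag
        Doc(long id) { this.businessId = id; }
    }

    static class Segment {
        final List<Doc> docs = new ArrayList<>();
    }

    static class ToyIndex {
        final List<Segment> segments = new ArrayList<>();

        // updateDocument semantics: tombstone old copies, add the new copy to a fresh segment
        void update(long businessId) {
            for (Segment s : segments)
                for (Doc d : s.docs)
                    if (d.businessId == businessId) d.deleted = true;
            Segment fresh = new Segment();
            fresh.docs.add(new Doc(businessId));
            segments.add(fresh);
        }

        // count matches; a search that ignores tombstones also counts the stale copy
        int count(long businessId, boolean respectDeletes) {
            int n = 0;
            for (Segment s : segments)
                for (Doc d : s.docs)
                    if (d.businessId == businessId && (!respectDeletes || !d.deleted)) n++;
            return n;
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.update(42L);
        index.update(42L); // second update tombstones the first copy
        System.out.println(index.count(42L, false)); // 2 - stale copy still visible
        System.out.println(index.count(42L, true));  // 1 - tombstone respected
    }
}
```

A forceMerge in real Lucene physically drops the tombstoned documents, which is why Erick suggests it as a diagnostic: if the duplicate disappears after merging, the search path was seeing deleted documents.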
Re: updateDocument (sometimes) no longer deleting documents after update to 4.6
Posted by no...@kaigrabfelder.de.
Hmm, it looks like this is somehow caused by the filters we are using for
searching.
I took one of the MY_UNIQUE_BUSINESS_ID values used in our application's
search functionality and debugged the Lucene search a little more. If I
specify null for the filter, I get only one result (which is correct).
If I add the two filters that we usually use in our application, I notice
that the filters are invoked twice - once for each of two segments - and
the result is contained in both segments. It looks like the first segment
contains all documents in the index, while the second segment contains
only one - the document that should have been deleted beforehand.
This can be reproduced even after restarting the application, and even
after indexWriter.commit() is called.
Could this be a bug? Or is this the desired behaviour?
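One possible explanation (an assumption; it is not confirmed anywhere in this
thread) is a custom Filter that ignores the acceptDocs bitset that Lucene 4.x
passes for each segment. The following self-contained sketch - plain Java, not
the real Lucene API, with all names hypothetical - shows how ignoring
per-segment live docs surfaces the stale copy of an updated document:

```java
import java.util.*;

// Self-contained analogue (NOT real Lucene code): each segment keeps a
// liveDocs bitmask; updateDocument() marks the old copy deleted rather
// than physically removing it. A per-segment filter that ignores the
// mask will still match the deleted copy.
class Segment {
    final long[] businessIds;   // one business id per doc slot
    final boolean[] liveDocs;   // false == deleted
    Segment(long[] ids, boolean[] live) { businessIds = ids; liveDocs = live; }
}

public class FilterDemo {
    // honorLiveDocs == true models a filter that respects the acceptDocs
    // parameter of Filter.getDocIdSet(); false models one that ignores it.
    static List<Integer> search(Segment s, long id, boolean honorLiveDocs) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < s.businessIds.length; doc++) {
            if (s.businessIds[doc] != id) continue;
            if (honorLiveDocs && !s.liveDocs[doc]) continue; // skip deleted
            hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Segment 1: the old copy of id 42, marked deleted by the update.
        Segment seg1 = new Segment(new long[]{42L}, new boolean[]{false});
        // Segment 2: the fresh copy.
        Segment seg2 = new Segment(new long[]{42L}, new boolean[]{true});

        int buggy = search(seg1, 42L, false).size() + search(seg2, 42L, false).size();
        int correct = search(seg1, 42L, true).size() + search(seg2, 42L, true).size();
        System.out.println(buggy + " " + correct);
    }
}
```

Running it prints "2 1": the filter that ignores the live-docs mask sees both
the stale and the fresh copy of id 42, while the one that honors it sees only
the live document - matching the "one result per segment" symptom above.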
Best Regards
Kai
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: updateDocument (sometimes) no longer deleting documents after update to 4.6
Posted by no...@kaigrabfelder.de.
I'll see if I can dig a little deeper into the 3.6 behavior; for now I'm
trying to get it running on 4.6 (the index is also a lot smaller - on 3.6 it
was about 2 GB for roughly 9000 documents, with 4.6 it's only about 200 MB).
And yes, the business ID is indexed - otherwise I wouldn't be able to find
it at all. The problem is not that I can't find it, but that I find it
twice. And to make matters worse, not consistently all the time but only
sometimes. It looks like the delete (before the update) sometimes works and
sometimes doesn't. Do you know of any reason why this could happen? Maybe
something related to the MergePolicy (which we don't set, i.e. we are using
the default)?
Best Regards
Kai
Re: updateDocument (sometimes) no longer deleting documents after update to 4.6
Posted by Michael McCandless <lu...@mikemccandless.com>.
The 30-second turnaround time in 3.6.x is absurd; if you turn on
IndexWriter's infoStream, maybe it'd give a clue. Or capture a few stack
traces and post them.
How are you creating the luceneDocumentToIndex? You must ensure that
the business ID is in fact indexed as a field in the document,
otherwise the update won't find it.
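The point about the business ID being indexed can be taken one step further:
updateDocument() deletes old copies only when the given Term byte-for-byte
matches a token that was actually indexed for that field (for a numeric field,
the same encoding and shift). A self-contained sketch of that delete-by-term
step - plain Java, not the Lucene API, with hypothetical names - under those
assumptions:

```java
import java.util.*;

// Self-contained analogue (NOT real Lucene code) of updateDocument's
// delete-by-term step: an inverted index keyed by "field:token".
// A delete term that encodes the same logical id differently matches
// nothing, so the old document silently survives.
public class UpdateByTermDemo {
    static Map<String, List<Long>> invertedIndex = new HashMap<>();

    // Index docId under the exact token produced by analysis/encoding.
    static void addDoc(long docId, String field, String token) {
        invertedIndex.computeIfAbsent(field + ":" + token, k -> new ArrayList<>())
                     .add(docId);
    }

    // Returns how many postings the delete term actually removed.
    static int deleteByTerm(String field, String token) {
        List<Long> postings = invertedIndex.remove(field + ":" + token);
        return postings == null ? 0 : postings.size();
    }

    public static void main(String[] args) {
        // Doc 1 indexes its business id as the token "42".
        addDoc(1L, "MY_UNIQUE_BUSINESS_ID", "42");
        // A term with a different encoding of the same id deletes nothing...
        System.out.println(deleteByTerm("MY_UNIQUE_BUSINESS_ID", "0042"));
        // ...while the byte-identical term removes the old posting.
        System.out.println(deleteByTerm("MY_UNIQUE_BUSINESS_ID", "42"));
    }
}
```

The first delete prints 0 because "0042" was never indexed as a token; the
byte-identical "42" prints 1. In real Lucene terms this is why the
NumericUtils encoding used to build the delete Term must match exactly how
the field was indexed.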
Mike McCandless
http://blog.mikemccandless.com
On Mon, Feb 24, 2014 at 5:33 AM, <no...@kaigrabfelder.de> wrote:
> Hi there,
>
> we recently updated our application from Lucene 3.0 to 3.6, with the effect
> that (despite using the SearcherManager functionality as described on
> http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html)
> calls to searcherManager.maybeRefresh() were incredibly slow, e.g. taking
> about 30 seconds after adding one document to an index of about 9000
> documents. I assumed that we did something wrong with the configuration,
> as 30 seconds cannot be what is meant by NRT ;-)
>
> Thus we migrated to the latest 4.6 version, and indexing speed was indeed
> very good (with the searcherManager.maybeRefreshBlocking() call only
> taking milliseconds to complete). But after some more testing we discovered
> that the indexWriter.updateDocument( term, documentToIndex ) functionality
> was no longer working as expected - at least sometimes. It looks like the
> updateDocument method no longer reliably deletes the old document before
> adding a new one, with the result that older documents are being returned
> by searches, breaking our application.
>
> Unfortunately I'm not able to reproduce the issue in a simple unit test,
> but maybe one of the Lucene experts knows what we are doing wrong here.
> Not sure if it is of any relevance, but we are running on Windows with a
> 64-bit JDK 7, thus MMapDirectory is being used.
>
> Our IndexWriter is configured like this:
>
>         IndexWriterConfig conf = new IndexWriterConfig( Version.LUCENE_46,
>             new LimitTokenCountAnalyzer( new DefaultAnalyzer(), Integer.MAX_VALUE ) );
>
>         conf.setOpenMode( OpenMode.APPEND );
>
>         IndexWriter indexWriter = new IndexWriter(
>             FSDirectory.open( new File( directoryPath ) ), conf );
>
> SearcherManager is configured like this:
>
>         searcherManager = new SearcherManager( indexWriter, true, null );
>
>         // The analyzer that we are using looks like this:
>         public class DefaultAnalyzer extends Analyzer
>         {
>             @Override
>             protected TokenStreamComponents createComponents( final String fieldName,
>                     final Reader reader ) {
>                 return new TokenStreamComponents(
>                     new WhitespaceTokenizer( LuceneSearchService.LUCENE_VERSION, reader ) );
>             }
>         }
>
> The update of the index looks like this:
>
>         // instead of 42 the unique business identifier is used
>         Long myUniqueBusinessId = 42L;
>         BytesRef ref = new BytesRef( NumericUtils.BUF_SIZE_LONG );
>         NumericUtils.longToPrefixCoded( myUniqueBusinessId.longValue(), 0, ref );
>         Term term = new Term( "MY_UNIQUE_BUSINESS_ID", ref );
>
>         // this method may be called multiple times with the same term and
>         // luceneDocumentToIndex parameter
>         indexWriter.updateDocument( term, luceneDocumentToIndex );
>
>         // After performing a couple of updates we execute
>         searcherManager.maybeRefreshBlocking();
>
>         // For searching we are using the following code
>         searcher = searcherManager.acquire();
>         // luceneQuery is the query, filter is some filtering that we apply,
>         // luceneSort is some sorting
>         TopDocs topDocs = searcher.search( luceneQuery, filter, 1000, luceneSort );
>
> // If we perform a query for MY_UNIQUE_BUSINESS_ID it will return multiple
> // results instead of just one - this was the case with neither Lucene 3.0
> // nor 3.6
>
> In order to fix the issue I tried a couple of things, but to no avail. It
> still happens (not all the time, though) that Lucene returns two documents
> when querying for MY_UNIQUE_BUSINESS_ID instead of just one:
> - setting setMaxBufferedDeleteTerms to 1 in the config:
>   conf.setMaxBufferedDeleteTerms( 1 );
> - explicitly deleting instead of just updating:
>   indexWriter.deleteDocuments( term );
> - ensuring that the field MY_UNIQUE_BUSINESS_ID is stored in the index and
>   not just analysed
> - trying to delete the document via indexWriter.tryDeleteDocument()
> - calling indexWriter.maybeMerge() after the update
> - calling indexWriter.commit() after the update
>
> Sorry for the lengthy post, but I wanted to include as much information as
> possible. Let me know if something is missing...
>
> Thanks for helping in advance ;-)
>
> Kai
>