You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Todd Feak (JIRA)" <ji...@apache.org> on 2008/06/25 17:58:44 UTC

[jira] Created: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
------------------------------------------------------------------------

                 Key: LUCENE-1316
                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
             Project: Lucene - Java
          Issue Type: Bug
          Components: Query/Scoring
    Affects Versions: 2.3
         Environment: All
            Reporter: Todd Feak
            Priority: Minor


The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).

In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.

By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.

I modified line 67 in the MatchAllDocsQuery
FROM:
  if (!reader.isDeleted(id)) {
TO:
  if (!reader.hasDeletions() || !reader.isDeleted(id)) {

In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 

Please consider making this adjustment for a future bug fix release.






-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: [jira] Resolved: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by Michael McCandless <lu...@mikemccandless.com>.
Indeed!

I'll open an issue...

Thanks Peter.

Mike

Peter Keegan wrote:

> I just ran into the same bottleneck with ValueSourceScorer.
> Here's a stack trace:
>
> Name: QueryThread group 1,#4
> State: BLOCKED on org.apache.lucene.index.SegmentReader@49d7fb83  
> owned by: QueryThread group 1,#8
> Total blocked: 1,535,881  Total waited: 1,080
>
> Stack trace:
> org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java: 
> 663)
> org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:221)
> org.apache.lucene.search.function.ValueSourceQuery 
> $ValueSourceScorer.next(ValueSourceQuery.java:138)
> org.apache.lucene.search.function.ValueSourceQuery 
> $ValueSourceScorer.skipTo(ValueSourceQuery.java:159)
> org.apache.lucene.search.function.CustomScoreQuery 
> $CustomScorer.skipTo(CustomScoreQuery.java:399)
> org 
> .apache 
> .lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:59)
> org 
> .apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java: 
> 51)
> org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:319)
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
> org.apache.lucene.search.Searcher.search(Searcher.java:118)
>
> Peter
>
>
> On Sun, Jan 25, 2009 at 9:41 AM, Michael McCandless (JIRA) <jira@apache.org 
> > wrote:
>
>     [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel 
>  ]
>
> Michael McCandless resolved LUCENE-1316.
> ----------------------------------------
>
>    Resolution: Fixed
>
> Committed revision 737513.  Thanks everyone!
>
> > Avoidable synchronization bottleneck in MatchAlldocsQuery 
> $MatchAllScorer
> >  
> ------------------------------------------------------------------------
> >
> >                 Key: LUCENE-1316
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
> >             Project: Lucene - Java
> >          Issue Type: Bug
> >          Components: Query/Scoring
> >    Affects Versions: 2.3
> >         Environment: All
> >            Reporter: Todd Feak
> >            Assignee: Michael McCandless
> >            Priority: Minor
> >             Fix For: 2.9
> >
> >         Attachments: LUCENE-1316.patch, LUCENE-1316.patch,  
> LUCENE-1316.patch, LUCENE_1316.patch, LUCENE_1316.patch,  
> LUCENE_1316.patch, MatchAllDocsQuery.java
> >
> >   Original Estimate: 1h
> >  Remaining Estimate: 1h
> >
> > The isDeleted() method on IndexReader has been mentioned a number  
> of times as a potential synchronization bottleneck. However, the  
> reason this  bottleneck occurs is actually at a higher level that  
> wasn't focused on (at least in the threads I read).
> > In every case I saw where a stack trace was provided to show the  
> lock/block, higher in the stack you see the MatchAllScorer.next()  
> method. In Solr paricularly, this scorer is used for "NOT" queries.  
> We saw incredibly poor performance (order of magnitude) on our load  
> tests for NOT queries, due to this bottleneck. The problem is that  
> every single document is run through this isDeleted() method, which  
> is synchronized. Having an optimized index exacerbates this issues,  
> as there is only a single SegmentReader to synchronize on, causing a  
> major thread pileup waiting for the lock.
> > By simply having the MatchAllScorer see if there have been any  
> deletions in the reader, much of this can be avoided. Especially in  
> a read-only environment for production where you have slaves doing  
> all the high load searching.
> > I modified line 67 in the MatchAllDocsQuery
> > FROM:
> >   if (!reader.isDeleted(id)) {
> > TO:
> >   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> > In our micro load test for NOT queries only, this was a major  
> performance improvement.  We also got the same query results. I  
> don't believe this will improve the situation for indexes that have  
> deletions.
> > Please consider making this adjustment for a future bug fix release.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: [jira] Resolved: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by Michael McCandless <lu...@mikemccandless.com>.
Super, thanks Peter!

Mike

Peter Keegan wrote:

> Ok, I applied LUCENE-1316. (I had to patch MutliSegmentReader  
> manually, probably because I'm running 2.3.2 and patch is for 2.4?)
> The patch fixes the sychronization bottleneck in ValueSourceQuery.
>
> Thank you Mike!
>
> Peter
>
> On Sat, Feb 7, 2009 at 10:38 AM, Michael McCandless <lucene@mikemccandless.com 
> > wrote:
>
> Hi Peter,
>
> You'll definitely need to start from the full patch in LUCENE-1316,  
> which fixes SegmentReader & Multi*Reader to use AllTermDocs when  
> given a null Term.
>
> Mike
>
>
> Peter Keegan wrote:
>
> Hi Mike,
>
> Is this patch self contained or do I need the other patches from  
> LUCENE-1316?
> I tried just this patch, but I get a NPE in MultiSegment Reader  
> (current  = null):
>
>    at org.apache.lucene.index.MultiSegmentReader 
> $MultiTermDocs.doc(MultiSegmentReader.java:531)
>    at org.apache.lucene.search.function.ValueSourceQuery 
> $ValueSourceScorer.score(ValueSourceQuery.java:142)
>    at org.apache.lucene.search.function.CustomScoreQuery 
> $CustomScorer.score(CustomScoreQuery.java:389)
>    at org.apache.lucene.search.Scorer.score(Scorer.java:58)
>    at  
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
>    at org.apache.lucene.search.Searcher.search(Searcher.java:118)
>
> Peter
>
>
> On Fri, Feb 6, 2009 at 7:13 PM, Michael McCandless <lucene@mikemccandless.com 
> > wrote:
>
> OK I just put the patch on there -- let me know.  Thanks,
>
> Mike
>
>
> Peter Keegan wrote:
>
> Thanks, Mike. I'll be happy to help test the patch.
>
> Peter
>
> On Fri, Feb 6, 2009 at 6:44 PM, Michael McCandless <lucene@mikemccandless.com 
> > wrote:
>
> OK I created https://issues.apache.org/jira/browse/LUCENE-1538.
>
> Mike
>
>
> Peter Keegan wrote:
>
> I just ran into the same bottleneck with ValueSourceScorer.
> Here's a stack trace:
>
> Name: QueryThread group 1,#4
> State: BLOCKED on org.apache.lucene.index.SegmentReader@49d7fb83  
> owned by: QueryThread group 1,#8
> Total blocked: 1,535,881  Total waited: 1,080
>
> Stack trace:
> org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java: 
> 663)
> org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:221)
> org.apache.lucene.search.function.ValueSourceQuery 
> $ValueSourceScorer.next(ValueSourceQuery.java:138)
> org.apache.lucene.search.function.ValueSourceQuery 
> $ValueSourceScorer.skipTo(ValueSourceQuery.java:159)
> org.apache.lucene.search.function.CustomScoreQuery 
> $CustomScorer.skipTo(CustomScoreQuery.java:399)
> org 
> .apache 
> .lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:59)
> org 
> .apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java: 
> 51)
> org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:319)
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
> org.apache.lucene.search.Searcher.search(Searcher.java:118)
>
> Peter
>
>
> On Sun, Jan 25, 2009 at 9:41 AM, Michael McCandless (JIRA) <jira@apache.org 
> > wrote:
>
>  [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel 
>  ]
>
> Michael McCandless resolved LUCENE-1316.
> ----------------------------------------
>
>  Resolution: Fixed
>
> Committed revision 737513.  Thanks everyone!
>
> > Avoidable synchronization bottleneck in MatchAlldocsQuery 
> $MatchAllScorer
> >  
> ------------------------------------------------------------------------
> >
> >                 Key: LUCENE-1316
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
> >             Project: Lucene - Java
> >          Issue Type: Bug
> >          Components: Query/Scoring
> >    Affects Versions: 2.3
> >         Environment: All
> >            Reporter: Todd Feak
> >            Assignee: Michael McCandless
> >            Priority: Minor
> >             Fix For: 2.9
> >
> >         Attachments: LUCENE-1316.patch, LUCENE-1316.patch,  
> LUCENE-1316.patch, LUCENE_1316.patch, LUCENE_1316.patch,  
> LUCENE_1316.patch, MatchAllDocsQuery.java
> >
> >   Original Estimate: 1h
> >  Remaining Estimate: 1h
> >
> > The isDeleted() method on IndexReader has been mentioned a number  
> of times as a potential synchronization bottleneck. However, the  
> reason this  bottleneck occurs is actually at a higher level that  
> wasn't focused on (at least in the threads I read).
> > In every case I saw where a stack trace was provided to show the  
> lock/block, higher in the stack you see the MatchAllScorer.next()  
> method. In Solr paricularly, this scorer is used for "NOT" queries.  
> We saw incredibly poor performance (order of magnitude) on our load  
> tests for NOT queries, due to this bottleneck. The problem is that  
> every single document is run through this isDeleted() method, which  
> is synchronized. Having an optimized index exacerbates this issues,  
> as there is only a single SegmentReader to synchronize on, causing a  
> major thread pileup waiting for the lock.
> > By simply having the MatchAllScorer see if there have been any  
> deletions in the reader, much of this can be avoided. Especially in  
> a read-only environment for production where you have slaves doing  
> all the high load searching.
> > I modified line 67 in the MatchAllDocsQuery
> > FROM:
> >   if (!reader.isDeleted(id)) {
> > TO:
> >   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> > In our micro load test for NOT queries only, this was a major  
> performance improvement.  We also got the same query results. I  
> don't believe this will improve the situation for indexes that have  
> deletions.
> > Please consider making this adjustment for a future bug fix release.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: [jira] Resolved: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by Peter Keegan <pe...@gmail.com>.
Ok, I applied LUCENE-1316. (I had to patch MutliSegmentReader manually,
probably because I'm running 2.3.2 and patch is for 2.4?)
The patch fixes the sychronization bottleneck in ValueSourceQuery.

Thank you Mike!

Peter

On Sat, Feb 7, 2009 at 10:38 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

>
> Hi Peter,
>
> You'll definitely need to start from the full patch in LUCENE-1316, which
> fixes SegmentReader & Multi*Reader to use AllTermDocs when given a null
> Term.
>
> Mike
>
>
> Peter Keegan wrote:
>
>  Hi Mike,
>>
>> Is this patch self contained or do I need the other patches from
>> LUCENE-1316?
>> I tried just this patch, but I get a NPE in MultiSegment Reader (current
>>  = null):
>>
>>    at
>> org.apache.lucene.index.MultiSegmentReader$MultiTermDocs.doc(MultiSegmentReader.java:531)
>>    at
>> org.apache.lucene.search.function.ValueSourceQuery$ValueSourceScorer.score(ValueSourceQuery.java:142)
>>    at
>> org.apache.lucene.search.function.CustomScoreQuery$CustomScorer.score(CustomScoreQuery.java:389)
>>    at org.apache.lucene.search.Scorer.score(Scorer.java:58)
>>    at
>> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
>>    at org.apache.lucene.search.Searcher.search(Searcher.java:118)
>>
>> Peter
>>
>>
>> On Fri, Feb 6, 2009 at 7:13 PM, Michael McCandless <
>> lucene@mikemccandless.com> wrote:
>>
>> OK I just put the patch on there -- let me know.  Thanks,
>>
>> Mike
>>
>>
>> Peter Keegan wrote:
>>
>> Thanks, Mike. I'll be happy to help test the patch.
>>
>> Peter
>>
>> On Fri, Feb 6, 2009 at 6:44 PM, Michael McCandless <
>> lucene@mikemccandless.com> wrote:
>>
>> OK I created https://issues.apache.org/jira/browse/LUCENE-1538.
>>
>> Mike
>>
>>
>> Peter Keegan wrote:
>>
>> I just ran into the same bottleneck with ValueSourceScorer.
>> Here's a stack trace:
>>
>> Name: QueryThread group 1,#4
>> State: BLOCKED on org.apache.lucene.index.SegmentReader@49d7fb83 owned
>> by: QueryThread group 1,#8
>> Total blocked: 1,535,881  Total waited: 1,080
>>
>> Stack trace:
>> org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java:663)
>> org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:221)
>>
>> org.apache.lucene.search.function.ValueSourceQuery$ValueSourceScorer.next(ValueSourceQuery.java:138)
>>
>> org.apache.lucene.search.function.ValueSourceQuery$ValueSourceScorer.skipTo(ValueSourceQuery.java:159)
>>
>> org.apache.lucene.search.function.CustomScoreQuery$CustomScorer.skipTo(CustomScoreQuery.java:399)
>>
>> org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:59)
>> org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:51)
>> org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:319)
>> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
>> org.apache.lucene.search.Searcher.search(Searcher.java:118)
>>
>> Peter
>>
>>
>> On Sun, Jan 25, 2009 at 9:41 AM, Michael McCandless (JIRA) <
>> jira@apache.org> wrote:
>>
>>  [
>> https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>  ]
>>
>> Michael McCandless resolved LUCENE-1316.
>> ----------------------------------------
>>
>>  Resolution: Fixed
>>
>> Committed revision 737513.  Thanks everyone!
>>
>> > Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
>> > ------------------------------------------------------------------------
>> >
>> >                 Key: LUCENE-1316
>> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>> >             Project: Lucene - Java
>> >          Issue Type: Bug
>> >          Components: Query/Scoring
>> >    Affects Versions: 2.3
>> >         Environment: All
>> >            Reporter: Todd Feak
>> >            Assignee: Michael McCandless
>> >            Priority: Minor
>> >             Fix For: 2.9
>> >
>> >         Attachments: LUCENE-1316.patch, LUCENE-1316.patch,
>> LUCENE-1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch,
>> MatchAllDocsQuery.java
>> >
>> >   Original Estimate: 1h
>> >  Remaining Estimate: 1h
>> >
>> > The isDeleted() method on IndexReader has been mentioned a number of
>> times as a potential synchronization bottleneck. However, the reason this
>>  bottleneck occurs is actually at a higher level that wasn't focused on (at
>> least in the threads I read).
>> > In every case I saw where a stack trace was provided to show the
>> lock/block, higher in the stack you see the MatchAllScorer.next() method. In
>> Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly
>> poor performance (order of magnitude) on our load tests for NOT queries, due
>> to this bottleneck. The problem is that every single document is run through
>> this isDeleted() method, which is synchronized. Having an optimized index
>> exacerbates this issues, as there is only a single SegmentReader to
>> synchronize on, causing a major thread pileup waiting for the lock.
>> > By simply having the MatchAllScorer see if there have been any deletions
>> in the reader, much of this can be avoided. Especially in a read-only
>> environment for production where you have slaves doing all the high load
>> searching.
>> > I modified line 67 in the MatchAllDocsQuery
>> > FROM:
>> >   if (!reader.isDeleted(id)) {
>> > TO:
>> >   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
>> > In our micro load test for NOT queries only, this was a major
>> performance improvement.  We also got the same query results. I don't
>> believe this will improve the situation for indexes that have deletions.
>> > Please consider making this adjustment for a future bug fix release.
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

Re: [jira] Resolved: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by Michael McCandless <lu...@mikemccandless.com>.
Hi Peter,

You'll definitely need to start from the full patch in LUCENE-1316,  
which fixes SegmentReader & Multi*Reader to use AllTermDocs when given  
a null Term.

Mike

Peter Keegan wrote:

> Hi Mike,
>
> Is this patch self contained or do I need the other patches from  
> LUCENE-1316?
> I tried just this patch, but I get a NPE in MultiSegment Reader  
> (current  = null):
>
>     at org.apache.lucene.index.MultiSegmentReader 
> $MultiTermDocs.doc(MultiSegmentReader.java:531)
>     at org.apache.lucene.search.function.ValueSourceQuery 
> $ValueSourceScorer.score(ValueSourceQuery.java:142)
>     at org.apache.lucene.search.function.CustomScoreQuery 
> $CustomScorer.score(CustomScoreQuery.java:389)
>     at org.apache.lucene.search.Scorer.score(Scorer.java:58)
>     at  
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
>     at org.apache.lucene.search.Searcher.search(Searcher.java:118)
>
> Peter
>
>
> On Fri, Feb 6, 2009 at 7:13 PM, Michael McCandless <lucene@mikemccandless.com 
> > wrote:
>
> OK I just put the patch on there -- let me know.  Thanks,
>
> Mike
>
>
> Peter Keegan wrote:
>
> Thanks, Mike. I'll be happy to help test the patch.
>
> Peter
>
> On Fri, Feb 6, 2009 at 6:44 PM, Michael McCandless <lucene@mikemccandless.com 
> > wrote:
>
> OK I created https://issues.apache.org/jira/browse/LUCENE-1538.
>
> Mike
>
>
> Peter Keegan wrote:
>
> I just ran into the same bottleneck with ValueSourceScorer.
> Here's a stack trace:
>
> Name: QueryThread group 1,#4
> State: BLOCKED on org.apache.lucene.index.SegmentReader@49d7fb83  
> owned by: QueryThread group 1,#8
> Total blocked: 1,535,881  Total waited: 1,080
>
> Stack trace:
> org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java: 
> 663)
> org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:221)
> org.apache.lucene.search.function.ValueSourceQuery 
> $ValueSourceScorer.next(ValueSourceQuery.java:138)
> org.apache.lucene.search.function.ValueSourceQuery 
> $ValueSourceScorer.skipTo(ValueSourceQuery.java:159)
> org.apache.lucene.search.function.CustomScoreQuery 
> $CustomScorer.skipTo(CustomScoreQuery.java:399)
> org 
> .apache 
> .lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:59)
> org 
> .apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java: 
> 51)
> org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:319)
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
> org.apache.lucene.search.Searcher.search(Searcher.java:118)
>
> Peter
>
>
> On Sun, Jan 25, 2009 at 9:41 AM, Michael McCandless (JIRA) <jira@apache.org 
> > wrote:
>
>   [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel 
>  ]
>
> Michael McCandless resolved LUCENE-1316.
> ----------------------------------------
>
>  Resolution: Fixed
>
> Committed revision 737513.  Thanks everyone!
>
> > Avoidable synchronization bottleneck in MatchAlldocsQuery 
> $MatchAllScorer
> >  
> ------------------------------------------------------------------------
> >
> >                 Key: LUCENE-1316
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
> >             Project: Lucene - Java
> >          Issue Type: Bug
> >          Components: Query/Scoring
> >    Affects Versions: 2.3
> >         Environment: All
> >            Reporter: Todd Feak
> >            Assignee: Michael McCandless
> >            Priority: Minor
> >             Fix For: 2.9
> >
> >         Attachments: LUCENE-1316.patch, LUCENE-1316.patch,  
> LUCENE-1316.patch, LUCENE_1316.patch, LUCENE_1316.patch,  
> LUCENE_1316.patch, MatchAllDocsQuery.java
> >
> >   Original Estimate: 1h
> >  Remaining Estimate: 1h
> >
> > The isDeleted() method on IndexReader has been mentioned a number  
> of times as a potential synchronization bottleneck. However, the  
> reason this  bottleneck occurs is actually at a higher level that  
> wasn't focused on (at least in the threads I read).
> > In every case I saw where a stack trace was provided to show the  
> lock/block, higher in the stack you see the MatchAllScorer.next()  
> method. In Solr paricularly, this scorer is used for "NOT" queries.  
> We saw incredibly poor performance (order of magnitude) on our load  
> tests for NOT queries, due to this bottleneck. The problem is that  
> every single document is run through this isDeleted() method, which  
> is synchronized. Having an optimized index exacerbates this issues,  
> as there is only a single SegmentReader to synchronize on, causing a  
> major thread pileup waiting for the lock.
> > By simply having the MatchAllScorer see if there have been any  
> deletions in the reader, much of this can be avoided. Especially in  
> a read-only environment for production where you have slaves doing  
> all the high load searching.
> > I modified line 67 in the MatchAllDocsQuery
> > FROM:
> >   if (!reader.isDeleted(id)) {
> > TO:
> >   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> > In our micro load test for NOT queries only, this was a major  
> performance improvement.  We also got the same query results. I  
> don't believe this will improve the situation for indexes that have  
> deletions.
> > Please consider making this adjustment for a future bug fix release.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: [jira] Resolved: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by Peter Keegan <pe...@gmail.com>.
Hi Mike,

Is this patch self contained or do I need the other patches from
LUCENE-1316?
I tried just this patch, but I get a NPE in MultiSegment Reader (current  =
null):

    at
org.apache.lucene.index.MultiSegmentReader$MultiTermDocs.doc(MultiSegmentReader.java:531)
    at
org.apache.lucene.search.function.ValueSourceQuery$ValueSourceScorer.score(ValueSourceQuery.java:142)
    at
org.apache.lucene.search.function.CustomScoreQuery$CustomScorer.score(CustomScoreQuery.java:389)
    at org.apache.lucene.search.Scorer.score(Scorer.java:58)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
    at org.apache.lucene.search.Searcher.search(Searcher.java:118)

Peter


On Fri, Feb 6, 2009 at 7:13 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

>
> OK I just put the patch on there -- let me know.  Thanks,
>
> Mike
>
>
> Peter Keegan wrote:
>
>  Thanks, Mike. I'll be happy to help test the patch.
>>
>> Peter
>>
>> On Fri, Feb 6, 2009 at 6:44 PM, Michael McCandless <
>> lucene@mikemccandless.com> wrote:
>>
>> OK I created https://issues.apache.org/jira/browse/LUCENE-1538.
>>
>> Mike
>>
>>
>> Peter Keegan wrote:
>>
>> I just ran into the same bottleneck with ValueSourceScorer.
>> Here's a stack trace:
>>
>> Name: QueryThread group 1,#4
>> State: BLOCKED on org.apache.lucene.index.SegmentReader@49d7fb83 owned
>> by: QueryThread group 1,#8
>> Total blocked: 1,535,881  Total waited: 1,080
>>
>> Stack trace:
>> org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java:663)
>> org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:221)
>>
>> org.apache.lucene.search.function.ValueSourceQuery$ValueSourceScorer.next(ValueSourceQuery.java:138)
>>
>> org.apache.lucene.search.function.ValueSourceQuery$ValueSourceScorer.skipTo(ValueSourceQuery.java:159)
>>
>> org.apache.lucene.search.function.CustomScoreQuery$CustomScorer.skipTo(CustomScoreQuery.java:399)
>>
>> org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:59)
>> org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:51)
>> org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:319)
>> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
>> org.apache.lucene.search.Searcher.search(Searcher.java:118)
>>
>> Peter
>>
>>
>> On Sun, Jan 25, 2009 at 9:41 AM, Michael McCandless (JIRA) <
>> jira@apache.org> wrote:
>>
>>   [
>> https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>  ]
>>
>> Michael McCandless resolved LUCENE-1316.
>> ----------------------------------------
>>
>>  Resolution: Fixed
>>
>> Committed revision 737513.  Thanks everyone!
>>
>> > Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
>> > ------------------------------------------------------------------------
>> >
>> >                 Key: LUCENE-1316
>> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>> >             Project: Lucene - Java
>> >          Issue Type: Bug
>> >          Components: Query/Scoring
>> >    Affects Versions: 2.3
>> >         Environment: All
>> >            Reporter: Todd Feak
>> >            Assignee: Michael McCandless
>> >            Priority: Minor
>> >             Fix For: 2.9
>> >
>> >         Attachments: LUCENE-1316.patch, LUCENE-1316.patch,
>> LUCENE-1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch,
>> MatchAllDocsQuery.java
>> >
>> >   Original Estimate: 1h
>> >  Remaining Estimate: 1h
>> >
>> > The isDeleted() method on IndexReader has been mentioned a number of
>> times as a potential synchronization bottleneck. However, the reason this
>>  bottleneck occurs is actually at a higher level that wasn't focused on (at
>> least in the threads I read).
>> > In every case I saw where a stack trace was provided to show the
>> lock/block, higher in the stack you see the MatchAllScorer.next() method. In
>> Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly
>> poor performance (order of magnitude) on our load tests for NOT queries, due
>> to this bottleneck. The problem is that every single document is run through
>> this isDeleted() method, which is synchronized. Having an optimized index
>> exacerbates this issues, as there is only a single SegmentReader to
>> synchronize on, causing a major thread pileup waiting for the lock.
>> > By simply having the MatchAllScorer see if there have been any deletions
>> in the reader, much of this can be avoided. Especially in a read-only
>> environment for production where you have slaves doing all the high load
>> searching.
>> > I modified line 67 in the MatchAllDocsQuery
>> > FROM:
>> >   if (!reader.isDeleted(id)) {
>> > TO:
>> >   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
>> > In our micro load test for NOT queries only, this was a major
>> performance improvement.  We also got the same query results. I don't
>> believe this will improve the situation for indexes that have deletions.
>> > Please consider making this adjustment for a future bug fix release.
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

Re: [jira] Resolved: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by Michael McCandless <lu...@mikemccandless.com>.
OK I just put the patch on there -- let me know.  Thanks,

Mike

Peter Keegan wrote:

> Thanks, Mike. I'll be happy to help test the patch.
>
> Peter
>
> On Fri, Feb 6, 2009 at 6:44 PM, Michael McCandless <lucene@mikemccandless.com 
> > wrote:
>
> OK I created https://issues.apache.org/jira/browse/LUCENE-1538.
>
> Mike
>
>
> Peter Keegan wrote:
>
> I just ran into the same bottleneck with ValueSourceScorer.
> Here's a stack trace:
>
> Name: QueryThread group 1,#4
> State: BLOCKED on org.apache.lucene.index.SegmentReader@49d7fb83  
> owned by: QueryThread group 1,#8
> Total blocked: 1,535,881  Total waited: 1,080
>
> Stack trace:
> org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java: 
> 663)
> org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:221)
> org.apache.lucene.search.function.ValueSourceQuery 
> $ValueSourceScorer.next(ValueSourceQuery.java:138)
> org.apache.lucene.search.function.ValueSourceQuery 
> $ValueSourceScorer.skipTo(ValueSourceQuery.java:159)
> org.apache.lucene.search.function.CustomScoreQuery 
> $CustomScorer.skipTo(CustomScoreQuery.java:399)
> org 
> .apache 
> .lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:59)
> org 
> .apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java: 
> 51)
> org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:319)
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
> org.apache.lucene.search.Searcher.search(Searcher.java:118)
>
> Peter
>
>
> On Sun, Jan 25, 2009 at 9:41 AM, Michael McCandless (JIRA) <jira@apache.org 
> > wrote:
>
>    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel 
>  ]
>
> Michael McCandless resolved LUCENE-1316.
> ----------------------------------------
>
>   Resolution: Fixed
>
> Committed revision 737513.  Thanks everyone!
>
> > Avoidable synchronization bottleneck in MatchAlldocsQuery 
> $MatchAllScorer
> >  
> ------------------------------------------------------------------------
> >
> >                 Key: LUCENE-1316
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
> >             Project: Lucene - Java
> >          Issue Type: Bug
> >          Components: Query/Scoring
> >    Affects Versions: 2.3
> >         Environment: All
> >            Reporter: Todd Feak
> >            Assignee: Michael McCandless
> >            Priority: Minor
> >             Fix For: 2.9
> >
> >         Attachments: LUCENE-1316.patch, LUCENE-1316.patch,  
> LUCENE-1316.patch, LUCENE_1316.patch, LUCENE_1316.patch,  
> LUCENE_1316.patch, MatchAllDocsQuery.java
> >
> >   Original Estimate: 1h
> >  Remaining Estimate: 1h
> >
> > The isDeleted() method on IndexReader has been mentioned a number  
> of times as a potential synchronization bottleneck. However, the  
> reason this  bottleneck occurs is actually at a higher level that  
> wasn't focused on (at least in the threads I read).
> > In every case I saw where a stack trace was provided to show the  
> lock/block, higher in the stack you see the MatchAllScorer.next()  
> method. In Solr paricularly, this scorer is used for "NOT" queries.  
> We saw incredibly poor performance (order of magnitude) on our load  
> tests for NOT queries, due to this bottleneck. The problem is that  
> every single document is run through this isDeleted() method, which  
> is synchronized. Having an optimized index exacerbates this issues,  
> as there is only a single SegmentReader to synchronize on, causing a  
> major thread pileup waiting for the lock.
> > By simply having the MatchAllScorer see if there have been any  
> deletions in the reader, much of this can be avoided. Especially in  
> a read-only environment for production where you have slaves doing  
> all the high load searching.
> > I modified line 67 in the MatchAllDocsQuery
> > FROM:
> >   if (!reader.isDeleted(id)) {
> > TO:
> >   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> > In our micro load test for NOT queries only, this was a major  
> performance improvement.  We also got the same query results. I  
> don't believe this will improve the situation for indexes that have  
> deletions.
> > Please consider making this adjustment for a future bug fix release.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: [jira] Resolved: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by Peter Keegan <pe...@gmail.com>.
Thanks, Mike. I'll be happy to help test the patch.

Peter

On Fri, Feb 6, 2009 at 6:44 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

>
> OK I created https://issues.apache.org/jira/browse/LUCENE-1538.
>
> Mike
>
>
> Peter Keegan wrote:
>
>  I just ran into the same bottleneck with ValueSourceScorer.
>> Here's a stack trace:
>>
>> Name: QueryThread group 1,#4
>> State: BLOCKED on org.apache.lucene.index.SegmentReader@49d7fb83 owned
>> by: QueryThread group 1,#8
>> Total blocked: 1,535,881  Total waited: 1,080
>>
>> Stack trace:
>> org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java:663)
>> org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:221)
>>
>> org.apache.lucene.search.function.ValueSourceQuery$ValueSourceScorer.next(ValueSourceQuery.java:138)
>>
>> org.apache.lucene.search.function.ValueSourceQuery$ValueSourceScorer.skipTo(ValueSourceQuery.java:159)
>>
>> org.apache.lucene.search.function.CustomScoreQuery$CustomScorer.skipTo(CustomScoreQuery.java:399)
>>
>> org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:59)
>> org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:51)
>> org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:319)
>> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
>> org.apache.lucene.search.Searcher.search(Searcher.java:118)
>>
>> Peter
>>
>>
>> On Sun, Jan 25, 2009 at 9:41 AM, Michael McCandless (JIRA) <
>> jira@apache.org> wrote:
>>
>>    [
>> https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>  ]
>>
>> Michael McCandless resolved LUCENE-1316.
>> ----------------------------------------
>>
>>   Resolution: Fixed
>>
>> Committed revision 737513.  Thanks everyone!
>>
>> > Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
>> > ------------------------------------------------------------------------
>> >
>> >                 Key: LUCENE-1316
>> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>> >             Project: Lucene - Java
>> >          Issue Type: Bug
>> >          Components: Query/Scoring
>> >    Affects Versions: 2.3
>> >         Environment: All
>> >            Reporter: Todd Feak
>> >            Assignee: Michael McCandless
>> >            Priority: Minor
>> >             Fix For: 2.9
>> >
>> >         Attachments: LUCENE-1316.patch, LUCENE-1316.patch,
>> LUCENE-1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch,
>> MatchAllDocsQuery.java
>> >
>> >   Original Estimate: 1h
>> >  Remaining Estimate: 1h
>> >
>> > The isDeleted() method on IndexReader has been mentioned a number of
>> times as a potential synchronization bottleneck. However, the reason this
>>  bottleneck occurs is actually at a higher level that wasn't focused on (at
>> least in the threads I read).
>> > In every case I saw where a stack trace was provided to show the
>> lock/block, higher in the stack you see the MatchAllScorer.next() method. In
>> Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly
>> poor performance (order of magnitude) on our load tests for NOT queries, due
>> to this bottleneck. The problem is that every single document is run through
>> this isDeleted() method, which is synchronized. Having an optimized index
>> exacerbates this issues, as there is only a single SegmentReader to
>> synchronize on, causing a major thread pileup waiting for the lock.
>> > By simply having the MatchAllScorer see if there have been any deletions
>> in the reader, much of this can be avoided. Especially in a read-only
>> environment for production where you have slaves doing all the high load
>> searching.
>> > I modified line 67 in the MatchAllDocsQuery
>> > FROM:
>> >   if (!reader.isDeleted(id)) {
>> > TO:
>> >   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
>> > In our micro load test for NOT queries only, this was a major
>> performance improvement.  We also got the same query results. I don't
>> believe this will improve the situation for indexes that have deletions.
>> > Please consider making this adjustment for a future bug fix release.
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

Re: [jira] Resolved: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by Michael McCandless <lu...@mikemccandless.com>.
OK I created https://issues.apache.org/jira/browse/LUCENE-1538.

Mike

Peter Keegan wrote:

> I just ran into the same bottleneck with ValueSourceScorer.
> Here's a stack trace:
>
> Name: QueryThread group 1,#4
> State: BLOCKED on org.apache.lucene.index.SegmentReader@49d7fb83  
> owned by: QueryThread group 1,#8
> Total blocked: 1,535,881  Total waited: 1,080
>
> Stack trace:
> org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java: 
> 663)
> org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:221)
> org.apache.lucene.search.function.ValueSourceQuery 
> $ValueSourceScorer.next(ValueSourceQuery.java:138)
> org.apache.lucene.search.function.ValueSourceQuery 
> $ValueSourceScorer.skipTo(ValueSourceQuery.java:159)
> org.apache.lucene.search.function.CustomScoreQuery 
> $CustomScorer.skipTo(CustomScoreQuery.java:399)
> org 
> .apache 
> .lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:59)
> org 
> .apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java: 
> 51)
> org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:319)
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
> org.apache.lucene.search.Searcher.search(Searcher.java:118)
>
> Peter
>
>
> On Sun, Jan 25, 2009 at 9:41 AM, Michael McCandless (JIRA) <jira@apache.org 
> > wrote:
>
>     [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel 
>  ]
>
> Michael McCandless resolved LUCENE-1316.
> ----------------------------------------
>
>    Resolution: Fixed
>
> Committed revision 737513.  Thanks everyone!
>
> > Avoidable synchronization bottleneck in MatchAlldocsQuery 
> $MatchAllScorer
> >  
> ------------------------------------------------------------------------
> >
> >                 Key: LUCENE-1316
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
> >             Project: Lucene - Java
> >          Issue Type: Bug
> >          Components: Query/Scoring
> >    Affects Versions: 2.3
> >         Environment: All
> >            Reporter: Todd Feak
> >            Assignee: Michael McCandless
> >            Priority: Minor
> >             Fix For: 2.9
> >
> >         Attachments: LUCENE-1316.patch, LUCENE-1316.patch,  
> LUCENE-1316.patch, LUCENE_1316.patch, LUCENE_1316.patch,  
> LUCENE_1316.patch, MatchAllDocsQuery.java
> >
> >   Original Estimate: 1h
> >  Remaining Estimate: 1h
> >
> > The isDeleted() method on IndexReader has been mentioned a number  
> of times as a potential synchronization bottleneck. However, the  
> reason this  bottleneck occurs is actually at a higher level that  
> wasn't focused on (at least in the threads I read).
> > In every case I saw where a stack trace was provided to show the  
> lock/block, higher in the stack you see the MatchAllScorer.next()  
> method. In Solr paricularly, this scorer is used for "NOT" queries.  
> We saw incredibly poor performance (order of magnitude) on our load  
> tests for NOT queries, due to this bottleneck. The problem is that  
> every single document is run through this isDeleted() method, which  
> is synchronized. Having an optimized index exacerbates this issues,  
> as there is only a single SegmentReader to synchronize on, causing a  
> major thread pileup waiting for the lock.
> > By simply having the MatchAllScorer see if there have been any  
> deletions in the reader, much of this can be avoided. Especially in  
> a read-only environment for production where you have slaves doing  
> all the high load searching.
> > I modified line 67 in the MatchAllDocsQuery
> > FROM:
> >   if (!reader.isDeleted(id)) {
> > TO:
> >   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> > In our micro load test for NOT queries only, this was a major  
> performance improvement.  We also got the same query results. I  
> don't believe this will improve the situation for indexes that have  
> deletions.
> > Please consider making this adjustment for a future bug fix release.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: [jira] Resolved: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by Peter Keegan <pe...@gmail.com>.
I just ran into the same bottleneck with ValueSourceScorer.
Here's a stack trace:

Name: QueryThread group 1,#4
State: BLOCKED on org.apache.lucene.index.SegmentReader@49d7fb83 owned by:
QueryThread group 1,#8
Total blocked: 1,535,881  Total waited: 1,080

Stack trace:
org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java:663)
org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:221)
org.apache.lucene.search.function.ValueSourceQuery$ValueSourceScorer.next(ValueSourceQuery.java:138)
org.apache.lucene.search.function.ValueSourceQuery$ValueSourceScorer.skipTo(ValueSourceQuery.java:159)
org.apache.lucene.search.function.CustomScoreQuery$CustomScorer.skipTo(CustomScoreQuery.java:399)
org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:59)
org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:51)
org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:319)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
org.apache.lucene.search.Searcher.search(Searcher.java:118)

Peter


On Sun, Jan 25, 2009 at 9:41 AM, Michael McCandless (JIRA)
<ji...@apache.org>wrote:

>
>     [
> https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Michael McCandless resolved LUCENE-1316.
> ----------------------------------------
>
>    Resolution: Fixed
>
> Committed revision 737513.  Thanks everyone!
>
> > Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> > ------------------------------------------------------------------------
> >
> >                 Key: LUCENE-1316
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
> >             Project: Lucene - Java
> >          Issue Type: Bug
> >          Components: Query/Scoring
> >    Affects Versions: 2.3
> >         Environment: All
> >            Reporter: Todd Feak
> >            Assignee: Michael McCandless
> >            Priority: Minor
> >             Fix For: 2.9
> >
> >         Attachments: LUCENE-1316.patch, LUCENE-1316.patch,
> LUCENE-1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch,
> MatchAllDocsQuery.java
> >
> >   Original Estimate: 1h
> >  Remaining Estimate: 1h
> >
> > The isDeleted() method on IndexReader has been mentioned a number of
> times as a potential synchronization bottleneck. However, the reason this
>  bottleneck occurs is actually at a higher level that wasn't focused on (at
> least in the threads I read).
> > In every case I saw where a stack trace was provided to show the
> lock/block, higher in the stack you see the MatchAllScorer.next() method. In
> Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly
> poor performance (order of magnitude) on our load tests for NOT queries, due
> to this bottleneck. The problem is that every single document is run through
> this isDeleted() method, which is synchronized. Having an optimized index
> exacerbates this issues, as there is only a single SegmentReader to
> synchronize on, causing a major thread pileup waiting for the lock.
> > By simply having the MatchAllScorer see if there have been any deletions
> in the reader, much of this can be avoided. Especially in a read-only
> environment for production where you have slaves doing all the high load
> searching.
> > I modified line 67 in the MatchAllDocsQuery
> > FROM:
> >   if (!reader.isDeleted(id)) {
> > TO:
> >   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> > In our micro load test for NOT queries only, this was a major performance
> improvement.  We also got the same query results. I don't believe this will
> improve the situation for indexes that have deletions.
> > Please consider making this adjustment for a future bug fix release.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608162#action_12608162 ] 

Mark Miller commented on LUCENE-1316:
-------------------------------------

If I remember correctly, volatile does not work correctly until java 1.5. At best I think it was implementation dependent with the old memory model.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1316:
-------------------------------------

    Attachment: LUCENE-1316.patch

Added synchronized on parent when obtaining the deleted docs in SegmentTermDocs.  

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608160#action_12608160 ] 

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

bq. declaring the deletedDocs volatile should do the trick.

Right... that would be cheaper when no docs were deleted.  But would it be more expensive when there were deleted docs (a volatile + a synchronized?)  I don't know if lock coarsening could do anything with this case...

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664276#action_12664276 ] 

Michael McCandless commented on LUCENE-1316:
--------------------------------------------

I think at least AllTermDocs.java is missing from the latest patch here.  When you apply a patch, make a change, and then "svn diff" in order to post a new patch, you have to [frustratingly] remember to first "svn add" any new files in the first patch.

I think the Subversion team is working on an "svn patch" command that would do this (and apply properties, and maybe deal with a 3-way merge, etc) but last I checked it's not done yet.  So until then we have to workaround...

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Todd Feak (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Feak updated LUCENE-1316:
------------------------------


I like Hoss' suggestion better. I'll try that fix locally and if it provides the same improvement, I will submit a patch for you.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649219#action_12649219 ] 

Jason Rutherglen commented on LUCENE-1316:
------------------------------------------

I looked at 2.4 and am surprised this patch did not make it in.  Read only IndexReader shouldn't be necessary to solve this issue.  Were there any problems in the unit tests for this patch or something?

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Todd Feak (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Feak updated LUCENE-1316:
------------------------------


Further investigation indicates that the ValueSourceQuery$ValueSourceScorer may suffer from the same issue and benefit from a similar optimization.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608849#action_12608849 ] 

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

Sigh... the problem is that things are done in a two step process by default.
A TermDocs is created, and then seek is called (by which time the impl is set already). 

{code}
  public TermDocs termDocs(Term term) throws IOException {
    ensureOpen();
    TermDocs termDocs = termDocs();
    termDocs.seek(term);
    return termDocs;
  }

  public abstract TermDocs termDocs() throws IOException;
{code}

 So a full solution down this road would be slightly more involved  (overriding termDocs(Term) in all the sublcasses rather than just termDocs())



> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1316:
---------------------------------------

    Fix Version/s: 2.9

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1316:
-------------------------------------

    Attachment: LUCENE-1316.patch

Patch includes AllTermDocs.  

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1316.patch, LUCENE-1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656944#action_12656944 ] 

Michael McCandless commented on LUCENE-1316:
--------------------------------------------

Yonik is your patch here (creating an AllTermDocs) ready to commit, once you fix SegmentTermDocs to also synchronize on parent when grabbing the deletedDocs?

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Todd Feak (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Feak updated LUCENE-1316:
------------------------------

    Attachment: MatchAllDocsQuery.java

My version of MatchAlldocsQuery.java which has the modification in it.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "robert engels (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608149#action_12608149 ] 

robert engels commented on LUCENE-1316:
---------------------------------------

The Pattern#5 referenced (cheap read-write lock) is exactly what is trying to be accomplished.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637186#action_12637186 ] 

Robert Muir commented on LUCENE-1316:
-------------------------------------

we use MatchAllDocs query also. In addition to what is described here we got very nice gains by overriding the Scorer.score() methods that take a HitCollector.

This seems pretty dumb but I guess since it has to score every doc the method call overhead actually matters? 

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608189#action_12608189 ] 

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

bq. is your point that without synchronization on the null check there's no garuntee that B will ever see the change to deletedDocs even if it does execute after delete()

Right... it's about the memory barrier.

The reality is that there is normally a need for higher level synchronization anyway.  That's why it was always silly for things like StringBuffer to be synchronized.

bq. assuming we fix that then it seems like the original issue is back to square one: synchro bottlenecks when there are no deletions.

A scorer could just check once when initialized... there's never been any guarantees about in-flight queries immediately seeing deleted docs changes - now that *really* doesn't make sense.  TermScorer grabs the whole bit vector at the start and never checks again.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Todd Feak (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Feak updated LUCENE-1316:
------------------------------


I applied that patch locally to a 2.3.0 build. Test results show this solution performs equally as the other solution did. I'd be happy with it for that reason alone. I cannot argue as to the quality of the proposed solution. I will leave that to those more knowledgeable on Lucene itself. Thank you for the time you guys have put into this issue for me.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608147#action_12608147 ] 

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

bq. why would deletes be stop being instantly visible

It's minor, but before, if thread A deleted a document, and then thread B checked if it was deleted, thread B was guaranteed to see that it was in fact deleted.

If the check for deletedDocs == null were moved outside of the synchronized, there's no guarantee when thread B will see (if ever) that deletedDocs has been set (no memory barrier).

It's not a major issue since client code shouldn't be written that way IMO, but it's worth factoring into the decision.


> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608134#action_12608134 ] 

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

> a more generalized improvements would probably be to change SegmentReader.isDeleted so that instead of the entire method being synchronized

Right, but that's not totally back compatible.  Code that depended on deletes being instantly visible across threads would no longer be guaranteed.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Todd Feak (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Feak updated LUCENE-1316:
------------------------------


10 threads in JMeter throwing load at the Tomcat as fast as possible. The Tomcat was on a separate machine with more then 10 worker threads, though only 10 were in use at any one time.

To eliminate any differences, the tests were run back to back. The only difference was the lucene-core JAR and a Tomcat bounce between the tests. Otherwise, the same exact test is run in both cases. What you have is threads all trying to synchronize on isDeleted() 4700+ times per request. Lock contention goes through the roof. At any point during the test I can take a thread stack snapshot and they are all blocked waiting for isDeleted().

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608162#action_12608162 ] 

markrmiller@gmail.com edited comment on LUCENE-1316 at 6/25/08 12:40 PM:
---------------------------------------------------------------

If I remember correctly, volatile does not work correctly until java 1.5. At best I think it was implementation dependent with the old memory model.

*edit*

maybe its ok under certain circumstances:

http://www.ibm.com/developerworks/library/j-jtp02244.html

Problem #2: Reordering volatile and nonvolatile stores


      was (Author: markrmiller@gmail.com):
    If I remember correctly, volatile does not work correctly until java 1.5. At best I think it was implementation dependent with the old memory model.
  
> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646152#action_12646152 ] 

Mark Miller commented on LUCENE-1316:
-------------------------------------

The new ReadOnly IndexReader option resolves this issue, correct?

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "robert engels (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608187#action_12608187 ] 

robert engels commented on LUCENE-1316:
---------------------------------------

Hoss, that is indeed the case, another thread would see deletedDocs as null, even though another thread has set it

hasDeletions does not need to be synchronized if deletedDocs is volatile

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Assigned: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-1316:
------------------------------------------

    Assignee: Michael McCandless

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1316.patch, LUCENE-1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667088#action_12667088 ] 

Michael McCandless commented on LUCENE-1316:
--------------------------------------------

OK, no problem Yonik -- thanks for reviewing.  I'll commit shortly.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1316.patch, LUCENE-1316.patch, LUCENE-1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "robert engels (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608148#action_12608148 ] 

robert engels commented on LUCENE-1316:
---------------------------------------

According to

http://www.ibm.com/developerworks/java/library/j-jtp06197.html

declaring the deletedDocs volatile should do the trick.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610123#action_12610123 ] 

Hoss Man commented on LUCENE-1316:
----------------------------------


bq. Right, but using TermDocs, they already have the same style of optimization and hence suffer no synchronization overhead.

Arggg.... sorry, my bad: i was thinking reader.isDeleted was used in those
cases as well -- I was totally missing that SegmentTermDocs takes care of
it directly.

but ... thinking about how SegmentTermDocs are constructed for a moment,
isn't the (unsynchronized) usage of deletedDocs in SegmentTermDocs.next
prone to the same concern you had about removing (or reducing) the
synchronization in SegmentReader.isDeleted ... "deletes aren't instantly
visible across threads" ... are they?

Is SegmentTermDocs.next too lax in how it deals with deletedDocs, or is
SegmentReader.isDeleted to paranoid?



> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated LUCENE-1316:
---------------------------------

    Attachment: LUCENE_1316.patch

Attaching prototype patch (needs javadoc + tests if approach is acceptable) that avoids all synchronization when iterating over all documents.

If IndexReader.termDocs(Term term) is called with null, a TermDocs implementation is returned that matches all documents.  This is the same approach used by TermScorer via SegmentTermDocs to avoid synchronization by grabbing the BitVector of deleted docs at instantiation.

This patch also updates MatchAllDocuments to use this TermDocs to iterate over all documents.

Advantages:
  - adds no new methods or interfaces, simply adds extra semantics to an existing method
  - works from the bottom-up... no need to instantiate a big BitVector
  - exposes the functionality to expert users for use in custom queries
  - avoids a binary search to find the correct IndexReader in a MultiReader for each call (it leverages all the TermDocs code in all IndexReader implementations such as MultiTermDocs).

Disadvantages:
  - TermDocs instance returned cannot be used to seek to a different term.  However, this is minor and not a back compatibility concern since "null" was not previously a supported value.

On balance, I think it's 10% hack, 90% useful.  Thoughts?

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610228#action_12610228 ] 

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

{quote}but ... thinking about how SegmentTermDocs are constructed for a moment,
isn't the (unsynchronized) usage of deletedDocs in SegmentTermDocs.next
prone to the same concern you had about removing (or reducing) the
synchronization in SegmentReader.isDeleted ... "deletes aren't instantly
visible across threads" ... are they?
{quote}

No, deletes aren't instantly visible across threads (when one thread has started a query and the other thread does a delete).  AFAIK it's always been that way, so I think it's probably acceptable.  On a practical level, seeking on a TermDocs crosses synchronization points, so deletes won't go unrecognized by other threads forever either.

But there is a thread-safety issue I've been mulling over since I wrote this patch.
SegmentTermDocs.next() is fine... unsynchronized reads look benign on the BitVector class since writes just change a byte at a time.  "count" could be off, but that's OK too... it will simply be a stale value since updates to it are atomic.

The issue is the possibility of grabbing a partially constructed BitVector object to begin with.  Notice how I synchronize to avoid this in AllTermDocs:
{code}
  protected AllTermDocs(SegmentReader parent) {
    synchronized (parent) {
      this.deletedDocs = parent.deletedDocs;
    }
    this.maxDoc = parent.maxDoc();
  }
{code}

That should probably be done in SegmentTermDocs too.  Without it, a null pointer exception or an array out of bounds exception could result when checking the BitVector.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667083#action_12667083 ] 

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

+1 to the latest patch.
Much thanks for picking this up! (I've been completely swamped.)

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1316.patch, LUCENE-1316.patch, LUCENE-1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "robert engels (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608146#action_12608146 ] 

robert engels commented on LUCENE-1316:
---------------------------------------

According to the java memory model, hasDeletions() would need to be synchronized as well , since if another thread did perform a deletion, it would need to be up to date.

This might work in later JVMs by declaring the deletedDocs variable volatile, but no guarantees.

Seems better to ALLOW this behavior, that a reader might not see up to date deletions made during a query, and do a single synchronized check of deletions at the start.



> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610101#action_12610101 ] 

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

bq. so essentially this approach only improves MatchAllDocsQuery correct?

It would essentially simulate a term indexed for every document, and improve any query that could use that (i.e. that needed to iterate over all documents).  For things like MatchAllDocuments and FunctionQuery on a MultiReader, it should actually be superior to the elimination of all synchronization on isDeleted() since it also removes  the binary search to find the correct reader for a document.

bq. Other use cases where lots of termDoc enumeration take place (RangeFilter and PrefixFilter type code) wouldn't' benefit from this at all.

Right, but using TermDocs, they already have the same style of optimization and hence suffer no synchronization overhead.

bq. the other approach that occurred to me along the lines of a "read only" IndexReader is that if we started providing a public method for getting the list of all deleted docs

Right... that could be more useful for someone that needs random access to isDeleted(), at the cost of greater setup cost, and greater memory usage.  However the TermDocs approach solves the use-cases we have today in lucene-core.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1316.
----------------------------------------

    Resolution: Fixed

Committed revision 737513.  Thanks everyone!

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1316.patch, LUCENE-1316.patch, LUCENE-1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608841#action_12608841 ] 

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

I'll look into it (that's why it was marked as "prototype")

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608183#action_12608183 ] 

Hoss Man commented on LUCENE-1316:
----------------------------------

bq. if thread A deleted a document, and then thread B checked if it was deleted, thread B was guaranteed to see that it was in fact deleted.

Hmmm.... i'll take your word for it, but i don't follow the rational: the current synchronization just ensured that either the isDeleted() call will complete before the delete() call started or vice versa -- but you have no guarantee that thread B would run after thread A and get true.   .... unless... is your point that without synchronization on the null check there's no garuntee that B will ever see the change to deletedDocs even if it does execute after delete() ?

either way: robert's point about hasDeletions() needing to be synchronized seems like a bigger issue -- isn't that a bug in the current implementation?  assuming we fix that then it seems like the original issue is back to square one: synchro bottlenecks when there are no deletions.





> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608128#action_12608128 ] 

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

Although this doesn't solve the general problem, this probably still makes sense to do now for the no-deletes case.
Todd, can you produce a patch?  See http://wiki.apache.org/lucene-java/HowToContribute

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Todd Feak (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Feak updated LUCENE-1316:
------------------------------


I don't think that patch provides correct functionality. I went to run the load tests this morning against an un-optimized index and the query results do not match what an unpatched version does. Simply swapping the JAR and restarting returns different results for the same query. Specifically, empty (incorrect) results.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608137#action_12608137 ] 

Hoss Man commented on LUCENE-1316:
----------------------------------

bq. Code that depended on deletes being instantly visible across threads would no longer be guaranteed.

you lost me there ... why would deletes be stop being instantly visible if we changed this...

{code}
  public synchronized boolean isDeleted(int n) {
    return (deletedDocs != null && deletedDocs.get(n));
  }
{code}

...to this...

{code}
  public boolean isDeleted(int n) {
    if (null == deletedDocs) return false;
    synchronized (this) { return (deletedDocs.get(n)); }
  }
{code}

?

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Todd Feak (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Feak updated LUCENE-1316:
------------------------------


I applied the patch to the 2.3.0 file. I ran against an optimized and non-optimized (12 segment) index with 4700 entries.

2.3.0 non-optimized index  *104 tps*
2.3.0 patched non-optimized index *482 tps*

2.3.0 optimized index *21 tps*
2.3.0 patched optimized index *718 tps*

The patch provided improvements in both optimized and unoptimized indexes. Thanks again Yonik.


> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664567#action_12664567 ] 

Jason Rutherglen commented on LUCENE-1316:
------------------------------------------

{quote} 
In a way, this is a mirror image of Jason's request in
LUCENE-1476 for a getDeletedDocs() that returned a DocIdSet...
provided it also worked on a MultiReader, etc. MatchAllDocs could be
efficiently implemented with that. 
{quote}

I think the API of reader.termDocs(null) is fine. If we move to a
DocIdSet for deleteddocs we can change the all docs implementation if
needed. 

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1316.patch, LUCENE-1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608844#action_12608844 ] 

yseeley@gmail.com edited comment on LUCENE-1316 at 6/27/08 10:15 AM:
----------------------------------------------------------------

I saw and fixed a bug that would affect multireaders.... patch attached.
I've not yet written tests, so no guarantees.
Edit: I reproduced... it still doesn't work, so hold off using this.

      was (Author: yseeley@gmail.com):
    I saw and fixed a bug that would affect multireaders.... patch attached.
I've not yet written tests, so no guarantees.
  
> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated LUCENE-1316:
---------------------------------

    Attachment: LUCENE_1316.patch

Here's an updated patch that should work correctly.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated LUCENE-1316:
---------------------------------

    Attachment: LUCENE_1316.patch

I saw and fixed a bug that would affect multireaders.... patch attached.
I've not yet written tests, so no guarantees.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1316:
---------------------------------------

    Attachment: LUCENE-1316.patch

OK, new patch attached:

  * Added more tests of *Reader.termDocs(null)

  * Changed IndexReader.termDocs javadoc to explain term=null case

  * Fixed MemoryIndex.termDocs(null) to return freq=1

  * Added CHANGES.txt entry

I think it's ready to commit.  I'll wait a day or two.


> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1316.patch, LUCENE-1316.patch, LUCENE-1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608472#action_12608472 ] 

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

20tps (queries per second?) for 4700 docs is *very* slow... how many threads were you testing with?

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608129#action_12608129 ] 

Hoss Man commented on LUCENE-1316:
----------------------------------

rather then attempting localized optimizations of individual Query classes, a more generalized improvements would probably be to change SegmentReader.isDeleted so that instead of the entire method being synchronized, it first checks if the segment has any deletions, and if not then enters a synchronized block to check deletedDocs.get(n).

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: [jira] Updated: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by robert engels <re...@ix.netcom.com>.
I think a better approach might be a specialized class  
ConcurrentBitSet designed for read lots, writes little (or just make  
a query not check deletes once it is started).

The ConcurrentHashMap in the JDK is a basis for the implementation.

Then, a SegmentDeletes that extends it (with IO functions) would  
complete the story.

On Jun 26, 2008, at 10:25 AM, Todd Feak (JIRA) wrote:

>
>      [ https://issues.apache.org/jira/browse/LUCENE-1316? 
> page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Todd Feak updated LUCENE-1316:
> ------------------------------
>
>
> I wanted to share my micro load test results with you, to make sure  
> you all understand scale of the bottleneck as we are experiencing it.
>
> For an optimized index with 4700+ documents (ie small), a NOT query  
> varies by a factor of 35 under heavy load. Using 2.3.0 release I  
> got 20 tps. With the volatile/synchronized fix suggested, I got 700  
> tps. The limiting factor on the 700 tps was the CPU on the computer  
> throwing load, so this may be even worse. Furthermore, the more  
> documents that exist in the index, the worse this may get, as it  
> synchonizes on every single iteration through the loop.
>
> An argument can be made that this is just a necessary evil, and  
> that we *must* synchronize on this for the possibility of updates  
> during reads. I have 2 questions regarding that.
>
> 1. What is the cost of a dirty read in that case? Is it stale data,  
> incorrect data, or a corrupted system?
> 2. What is more prevalent in a production system. Indexes with no  
> deletes, indexes with *some* deletes, or indexes with frequent  
> deletes?
>
> Do we need to have 1 class that does it all, or should we consider  
> 2 different implementation for 2 different uses. What about a read- 
> only SegmentReader for read-only slaves in production environments?
>
>
>
>
>> Avoidable synchronization bottleneck in MatchAlldocsQuery 
>> $MatchAllScorer
>> --------------------------------------------------------------------- 
>> ---
>>
>>                 Key: LUCENE-1316
>>                 URL: https://issues.apache.org/jira/browse/ 
>> LUCENE-1316
>>             Project: Lucene - Java
>>          Issue Type: Bug
>>          Components: Query/Scoring
>>    Affects Versions: 2.3
>>         Environment: All
>>            Reporter: Todd Feak
>>            Priority: Minor
>>         Attachments: MatchAllDocsQuery.java
>>
>>   Original Estimate: 1h
>>  Remaining Estimate: 1h
>>
>> The isDeleted() method on IndexReader has been mentioned a number  
>> of times as a potential synchronization bottleneck. However, the  
>> reason this  bottleneck occurs is actually at a higher level that  
>> wasn't focused on (at least in the threads I read).
>> In every case I saw where a stack trace was provided to show the  
>> lock/block, higher in the stack you see the MatchAllScorer.next()  
>> method. In Solr paricularly, this scorer is used for "NOT"  
>> queries. We saw incredibly poor performance (order of magnitude)  
>> on our load tests for NOT queries, due to this bottleneck. The  
>> problem is that every single document is run through this isDeleted 
>> () method, which is synchronized. Having an optimized index  
>> exacerbates this issues, as there is only a single SegmentReader  
>> to synchronize on, causing a major thread pileup waiting for the  
>> lock.
>> By simply having the MatchAllScorer see if there have been any  
>> deletions in the reader, much of this can be avoided. Especially  
>> in a read-only environment for production where you have slaves  
>> doing all the high load searching.
>> I modified line 67 in the MatchAllDocsQuery
>> FROM:
>>   if (!reader.isDeleted(id)) {
>> TO:
>>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
>> In our micro load test for NOT queries only, this was a major  
>> performance improvement.  We also got the same query results. I  
>> don't believe this will improve the situation for indexes that  
>> have deletions.
>> Please consider making this adjustment for a future bug fix release.
>
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Todd Feak (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Feak updated LUCENE-1316:
------------------------------


I wanted to share my micro load test results with you, to make sure you all understand scale of the bottleneck as we are experiencing it.

For an optimized index with 4700+ documents (ie small), a NOT query varies by a factor of 35 under heavy load. Using 2.3.0 release I got 20 tps. With the volatile/synchronized fix suggested, I got 700 tps. The limiting factor on the 700 tps was the CPU on the computer throwing load, so this may be even worse. Furthermore, the more documents that exist in the index, the worse this may get, as it synchonizes on every single iteration through the loop.

An argument can be made that this is just a necessary evil, and that we *must* synchronize on this for the possibility of updates during reads. I have 2 questions regarding that.

1. What is the cost of a dirty read in that case? Is it stale data, incorrect data, or a corrupted system?
2. What is more prevalent in a production system. Indexes with no deletes, indexes with *some* deletes, or indexes with frequent deletes?

Do we need to have 1 class that does it all, or should we consider 2 different implementation for 2 different uses. What about a read-only SegmentReader for read-only slaves in production environments?




> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608542#action_12608542 ] 

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

bq. Test results show this solution performs equally as the other solution did.

That's good news (as I assume this was for an optimized index).
Would it be possible for you to try the same test on a non-optimized index (multi-segment) with a couple deletions?


> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608956#action_12608956 ] 

Hoss Man commented on LUCENE-1316:
----------------------------------

bq. TermDocs instance returned cannot be used to seek to a different term. However, this is minor and not a back compatibility concern since "null" was not previously a supported value.

so essentially this approach only improves MatchAllDocsQuery correct? .... Other use cases where lots of termDoc enumeration take place (RangeFilter and PrefixFilter type code) wouldn't' benefit from this at all.

Assuming the genuinely eliminating the synchronization is infeasible, the other approach that occurred to me along the lines of a "read only" IndexReader is that if we started providing a public method for getting the list of all deleted docs (either as a BitVector or as a DocIdSet or something) then it would be easy to implement a SnapshotFilteredIndexReader that on construction cached the current list of all deleted docs in the IndexReader it's wrapping, used it for all isDeleted() methods, and proxied all other methods to the underlying reader.

then things like MatchAllDocs, RangeFilter, and PrefixFilter could (optionally?) construct one of those prior to their big iteration loops, and use it in place of the original IndexReader.  Trade the syncro bottleneck for deletion data as of the moment the query was started (a fair trade for most people)

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664404#action_12664404 ] 

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

In a way, this is a mirror image of Jason's request in LUCENE-1476 for a getDeletedDocs() that returned a DocIdSet... provided it also worked on a MultiReader, etc.  MatchAllDocs could be efficiently implemented with that.

It does seem like having some sort of iterator over existing docs is useful to avoid the binary search cost of associating ids with segments, but there was never any feedback on what the API should be.  Instead of adding new functionality to termDocs(), we could also add a getAllDocs() that returns either a DocIdSet or an interator.



> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org