You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2008/07/03 03:26:45 UTC
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610101#action_12610101 ] 

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

bq. so essentially this approach only improves MatchAllDocsQuery correct?

It would essentially simulate a term indexed for every document, and improve any query that could use that (i.e. that needed to iterate over all documents).  For things like MatchAllDocuments and FunctionQuery on a MultiReader, it should actually be superior to the elimination of all synchronization on isDeleted() since it also removes  the binary search to find the correct reader for a document.

bq. Other use cases where lots of termDoc enumeration take place (RangeFilter and PrefixFilter type code) wouldn't' benefit from this at all.

Right, but using TermDocs, they already have the same style of optimization and hence suffer no synchronization overhead.

bq. the other approach that occurred to me along the lines of a "read only" IndexReader is that if we started providing a public method for getting the list of all deleted docs

Right... that could be more useful for someone that needs random access to isDeleted(), at the cost of greater setup cost, and greater memory usage.  However the TermDocs approach solves the use-cases we have today in lucene-core.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.  We also got the same query results. I don't believe this will improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org