Posted to dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2005/10/13 05:46:06 UTC

[jira] Created: (LUCENE-454) lazily create SegmentMergeInfo.docMap

lazily create SegmentMergeInfo.docMap
-------------------------------------

         Key: LUCENE-454
         URL: http://issues.apache.org/jira/browse/LUCENE-454
     Project: Lucene - Java
        Type: Improvement
    Versions: CVS Nightly - Specify date in submission    
    Reporter: Yonik Seeley


Since creating the docMap is expensive, and it's only used during segment merging, not searching, defer creation until it is requested.
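
The deferred creation described above can be sketched roughly as follows. This is a minimal illustration, not the attached patch: the class name LazyDocMapSketch, the BitSet standing in for the reader's deletion information, and the buildCount counter are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical stand-in for SegmentMergeInfo; not the actual Lucene class. */
class LazyDocMapSketch {
    private final int maxDoc;
    private final BitSet deleted; // assumed stand-in for the reader's deleted docs
    private int[] docMap;         // stays null until first requested
    int buildCount = 0;           // illustration only: counts how often the map is built

    LazyDocMapSketch(int maxDoc, BitSet deleted) {
        // The constructor does no O(maxDoc) work, so callers that only
        // enumerate terms never pay for the map.
        this.maxDoc = maxDoc;
        this.deleted = deleted;
    }

    /**
     * Maps old document numbers to new ones with deletions squeezed out;
     * -1 marks a deleted document. Returns null when nothing is deleted.
     */
    int[] getDocMap() {
        if (docMap == null && !deleted.isEmpty()) {
            buildCount++;
            docMap = new int[maxDoc];
            int next = 0;
            for (int i = 0; i < maxDoc; i++) {
                docMap[i] = deleted.get(i) ? -1 : next++;
            }
        }
        return docMap;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(1);
        deleted.set(3);
        LazyDocMapSketch info = new LazyDocMapSketch(5, deleted);
        System.out.println(info.buildCount);                  // 0: nothing built yet
        System.out.println(Arrays.toString(info.getDocMap())); // [0, -1, 1, -1, 2]
        info.getDocMap();
        System.out.println(info.buildCount);                  // 1: built once, then reused
    }
}
```

The sketch is not thread-safe; it assumes a single thread uses each instance.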

SegmentMergeInfo is also used in MultiTermEnum, the term enumerator for a MultiReader.  TermEnum is used by queries such as PrefixQuery, RangeQuery, WildcardQuery, as well as RangeFilter, DateFilter, and sorting the first time (filling the FieldCache).

Performance Results:
  A simple single-field index with 555,555 documents and 1,000 random deletions was queried 1,000 times with a PrefixQuery matching a single document.

Performance Before Patch:
  indexing time = 121,656 ms
  querying time = 58,812 ms

Performance After Patch:
  indexing time = 121,000 ms
  querying time = 598 ms

Roughly a 100-fold increase in query performance (58,812 ms / 598 ms is about 98x)!

All Lucene unit tests pass.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-454) lazily create SegmentMergeInfo.docMap

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/LUCENE-454?page=all ]

Yonik Seeley updated LUCENE-454:
--------------------------------

    Attachment: docMap.txt

I also deferred creation of SegmentMergeInfo.postings (TermPositions), which cut query time by roughly another 15%.

The same index and query were used for this test, but this time with 100,000 query iterations.

defer docMap only:
  indexing time = 121,734 ms
  querying time = 18,266 ms

defer docMap and postings:
  indexing time = 120,860 ms
  querying time = 15,625 ms
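
The ratios behind these figures can be checked with a few lines of arithmetic. The sketch below is only a worked calculation on the timings reported in this thread; the class and method names are made up for the example. It gives a speedup of about 98x for the original docMap change and a reduction of about 14.5% from also deferring postings.

```java
/** Hypothetical helper for checking the reported timings; not part of Lucene. */
class QueryTimingArithmetic {
    /** How many times faster the "after" run is than the "before" run. */
    static double speedup(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }

    /** Percentage of the "before" time saved by the "after" run. */
    static double percentReduction(double beforeMs, double afterMs) {
        return 100.0 * (beforeMs - afterMs) / beforeMs;
    }

    public static void main(String[] args) {
        // 1,000 queries: before patch vs. deferred docMap
        System.out.printf("speedup: %.1fx%n", speedup(58812, 598));
        // 100,000 queries: deferred docMap only vs. deferred docMap and postings
        System.out.printf("reduction: %.1f%%%n", percentReduction(18266, 15625));
    }
}
```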




[jira] Updated: (LUCENE-454) lazily create SegmentMergeInfo.docMap

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/LUCENE-454?page=all ]

Yonik Seeley updated LUCENE-454:
--------------------------------

    Attachment: docMap.txt

Attaching patch.



[jira] Resolved: (LUCENE-454) lazily create SegmentMergeInfo.docMap

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/LUCENE-454?page=all ]
     
Yonik Seeley resolved LUCENE-454:
---------------------------------

    Fix Version: 1.9
     Resolution: Fixed
      Assign To: Yonik Seeley



[jira] Closed: (LUCENE-454) lazily create SegmentMergeInfo.docMap

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/LUCENE-454?page=all ]
     
Yonik Seeley closed LUCENE-454:
-------------------------------

