Posted to dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2005/10/13 05:46:06 UTC
[jira] Created: (LUCENE-454) lazily create SegmentMergeInfo.docMap
lazily create SegmentMergeInfo.docMap
-------------------------------------
Key: LUCENE-454
URL: http://issues.apache.org/jira/browse/LUCENE-454
Project: Lucene - Java
Type: Improvement
Versions: CVS Nightly - Specify date in submission
Reporter: Yonik Seeley
Creating the docMap is expensive, and it is used only during segment merging, not searching, so its creation should be deferred until it is requested.
SegmentMergeInfo is also used in MultiTermEnum, the term enumerator for a MultiReader, so searches pay the docMap cost as well. TermEnum is used by queries such as PrefixQuery, RangeQuery, and WildcardQuery, by RangeFilter and DateFilter, and by the first sort on a field (which fills the FieldCache).
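To illustrate the idea of deferring the docMap, here is a minimal sketch of lazy creation. It is not the attached patch: the class name LazyDocMap, the BitSet of deletions, and the isBuilt() helper are all assumptions made for illustration, and the sketch is not thread-safe. The map renumbers surviving documents around deletions and marks deleted ones with -1.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical stand-in for a per-segment holder; NOT Lucene's SegmentMergeInfo. */
class LazyDocMap {
    private final int maxDoc;
    private final BitSet deleted; // bit i set => document i is deleted (assumed representation)
    private int[] docMap;         // stays null until first requested
    private boolean built;

    LazyDocMap(int maxDoc, BitSet deleted) {
        // The constructor does no O(maxDoc) work; term enumeration
        // for searching never pays for the map.
        this.maxDoc = maxDoc;
        this.deleted = deleted;
    }

    /** Maps old document numbers to new ones; null when there are no deletions. */
    int[] getDocMap() {
        if (!built) {
            if (!deleted.isEmpty()) {
                docMap = new int[maxDoc];
                int next = 0;
                for (int i = 0; i < maxDoc; i++) {
                    // Deleted documents map to -1; the rest are renumbered densely.
                    docMap[i] = deleted.get(i) ? -1 : next++;
                }
            }
            built = true;
        }
        return docMap;
    }

    boolean isBuilt() {
        return built;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(1);
        deleted.set(3);
        LazyDocMap segment = new LazyDocMap(5, deleted);
        System.out.println(segment.isBuilt());                     // false
        System.out.println(Arrays.toString(segment.getDocMap()));  // [0, -1, 1, -1, 2]
        System.out.println(segment.isBuilt());                     // true
    }
}
```

Only a caller that actually needs the map (merging, in the issue's terms) triggers the linear pass over the segment; a caller that only enumerates terms never does.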
Performance Results:
A simple single-field index with 555,555 documents and 1,000 random deletions was queried 1,000 times with a PrefixQuery matching a single document.
Performance Before Patch:
indexing time = 121,656 ms
querying time = 58,812 ms
Performance After Patch:
indexing time = 121,000 ms
querying time = 598 ms
A roughly 100-fold increase in query performance!
All Lucene unit tests pass.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
[jira] Updated: (LUCENE-454) lazily create SegmentMergeInfo.docMap
Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/LUCENE-454?page=all ]
Yonik Seeley updated LUCENE-454:
--------------------------------
Attachment: docMap.txt
Also deferred creation of SegmentMergeInfo.postings (TermPositions), for a further gain of about 15%.
The same index and query were used for this test, but this time with 100,000 query iterations.
defer docMap only:
indexing time = 121,734 ms
querying time = 18,266 ms
defer docMap and postings:
indexing time = 120,860 ms
querying time = 15,625 ms
[jira] Updated: (LUCENE-454) lazily create SegmentMergeInfo.docMap
Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/LUCENE-454?page=all ]
Yonik Seeley updated LUCENE-454:
--------------------------------
Attachment: docMap.txt
Attaching patch.
[jira] Resolved: (LUCENE-454) lazily create SegmentMergeInfo.docMap
Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/LUCENE-454?page=all ]
Yonik Seeley resolved LUCENE-454:
---------------------------------
Fix Version: 1.9
Resolution: Fixed
Assign To: Yonik Seeley
[jira] Closed: (LUCENE-454) lazily create SegmentMergeInfo.docMap
Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/LUCENE-454?page=all ]
Yonik Seeley closed LUCENE-454:
-------------------------------