Posted to dev@lucene.apache.org by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2007/04/05 23:33:32 UTC
[jira] Updated: (LUCENE-584) Decouple Filter from BitSet
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Otis Gospodnetic updated LUCENE-584:
------------------------------------
Attachment: bench-diff.txt
Perhaps I did something wrong with the benchmark, but I didn't get any speed-up when using searcher.match(Query, MatchCollector) vs. searcher.search(Query, HitCollector).
Here are the benchmark numbers (50000 queries with each), HitCollector first, MatchCollector second:
HITCOLLECTOR:
[java] ------------> Report Sum By (any) Name (11 about 41 out of 41)
[java] Operation            round  mrg  buf  runCnt  recsPerRun     rec/s  elapsedSec  avgUsedMem  avgTotalMem
[java] Rounds_4                 0   10   10       1      808020     787.5    1,026.04   7,217,624   17,780,736
[java] Populate                 -    -    -       4        2003     129.9       61.67   9,938,986   13,821,952
[java] CreateIndex              -    -    -       4           1       4.4        0.91   3,937,522   10,916,864
[java] MAddDocs_2000            -    -    -       4        2000     138.1       57.92   9,368,584   13,821,952
[java] Optimize                 -    -    -       4           1       1.4        2.83   9,938,218   13,821,952
[java] CloseIndex               -    -    -       4           1   2,000.0        0.00   9,938,986   13,821,952
[java] OpenReader               -    -    -       4           1      24.0        0.17   9,957,592   13,821,952
[java] SearchSameRdr_50000      -    -    -       4       50000   1,070.3      186.86  10,500,146   13,821,952
[java] CloseReader              -    -    -       4           1   4,000.0        0.00   9,059,756   13,821,952
[java] WarmNewRdr_50            -    -    -       4      100000  16,237.7       24.63   9,060,268   13,821,952
[java] SrchNewRdr_50000         -    -    -       4       50000     265.9      752.02  10,800,006   13,821,952
[java] ------------> Report sum by Prefix (MAddDocs) and Round (4 about 4 out of 41)
[java] Operation            round  mrg  buf  runCnt  recsPerRun     rec/s  elapsedSec  avgUsedMem  avgTotalMem
[java] MAddDocs_2000            0   10   10       1        2000      94.6       21.15   7,844,112   10,407,936
[java] MAddDocs_2000            1  100   10       1        2000     136.7       14.63   8,968,144   11,309,056
[java] MAddDocs_2000            2   10  100       1        2000     173.2       11.55  10,528,264   15,740,928
[java] MAddDocs_2000            3  100  100       1        2000     188.7       10.60  10,133,816   17,829,888
MATCHCOLLECTOR:
[java] ------------> Report Sum By (any) Name (11 about 41 out of 41)
[java] Operation            round  mrg  buf  runCnt  recsPerRun     rec/s  elapsedSec  avgUsedMem  avgTotalMem
[java] Rounds_4                 0   10   10       1      808020     781.0    1,034.62  10,566,608   15,859,712
[java] Populate                 -    -    -       4        2003     130.9       61.23  10,963,452   14,806,016
[java] CreateIndex              -    -    -       4           1      33.9        0.12   3,616,570   11,020,288
[java] MAddDocs_2000            -    -    -       4        2000     137.3       58.29  10,445,568   14,806,016
[java] Optimize                 -    -    -       4           1       1.4        2.82  10,979,398   14,806,016
[java] CloseIndex               -    -    -       4           1   2,000.0        0.00  10,963,452   14,806,016
[java] OpenReader               -    -    -       4           1      22.0        0.18  10,982,058   14,806,016
[java] SearchSameRdr_50000      -    -    -       4       50000   1,064.7      187.84  11,060,036   14,806,016
[java] CloseReader              -    -    -       4           1   4,000.0        0.00  10,353,206   14,806,016
[java] WarmNewRdr_50            -    -    -       4      100000  16,419.0       24.36  10,431,062   14,806,016
[java] SrchNewRdr_50000         -    -    -       4       50000     263.0      760.34  11,912,358   14,806,016
[java] ------------> Report sum by Prefix (MAddDocs) and Round (4 about 4 out of 41)
[java] Operation            round  mrg  buf  runCnt  recsPerRun     rec/s  elapsedSec  avgUsedMem  avgTotalMem
[java] MAddDocs_2000            0   10   10       1        2000      92.2       21.69   7,844,112   10,407,936
[java] MAddDocs_2000            1  100   10       1        2000     136.6       14.64   7,720,352   10,407,936
[java] MAddDocs_2000            2   10  100       1        2000     167.8       11.92  11,325,952   17,571,840
[java] MAddDocs_2000            3  100  100       1        2000     199.3       10.03  14,891,856   20,836,352
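To put the two runs side by side, the relative change in throughput can be computed from the rec/s figures reported above for the two search tasks. The following is only a small sketch; the class and method names (BenchDiff, percentChange) are made up for illustration and are not part of the benchmark package.

```java
/** Illustrative helper (hypothetical name) for comparing two benchmark runs. */
final class BenchDiff {

    /** Relative change of candidate versus baseline, in percent. */
    static double percentChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // rec/s values copied from the reports above: HitCollector first, MatchCollector second.
        System.out.printf("SearchSameRdr_50000: %+.2f%%%n", percentChange(1070.3, 1064.7));
        System.out.printf("SrchNewRdr_50000:    %+.2f%%%n", percentChange(265.9, 263.0));
    }
}
```

Both search tasks come out slightly slower with MatchCollector, by roughly 0.5% and 1.1% respectively, which fits the observation that there was no speed-up.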
This is what I did for the benchmark. I used Doron's handy conf/benchmark.
I added a new .alg based on micro-standard.alg, here's the diff:
$ diff conf/micro-standard.alg conf/matcher-micro-standard.alg
60c60
< { "SearchSameRdr" Search > : 50000
---
> { "SearchSameRdr" SearchMatch > : 50000
65c65
< { "SrchNewRdr" Search > : 50000
---
> { "SrchNewRdr" SearchMatch > : 50000
Then I added two new Tasks for benchmarking the Matcher (searcher.search(Query, MatchCollector)) and modified ReadTask to call searcher.search(Query, HitCollector) instead of the method that returns Hits.
I commented out all search results traversal and doc retrieval, as I didn't care to measure that.
> Decouple Filter from BitSet
> ---------------------------
>
> Key: LUCENE-584
> URL: https://issues.apache.org/jira/browse/LUCENE-584
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: 2.0.1
> Reporter: Peter Schäfer
> Priority: Minor
> Attachments: bench-diff.txt, BitsMatcher.java, Filter-20060628.patch, HitCollector-20060628.patch, IndexSearcher-20060628.patch, MatchCollector.java, Matcher.java, Matcher20070226.patch, Scorer-20060628.patch, Searchable-20060628.patch, Searcher-20060628.patch, Some Matchers.zip, SortedVIntList.java, TestSortedVIntList.java
>
>
> {code}
> package org.apache.lucene.search;
>
> import java.io.IOException;
> import org.apache.lucene.index.IndexReader;
>
> public abstract class Filter implements java.io.Serializable
> {
>     public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
>
> public interface AbstractBitSet
> {
>     public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of memory. It would be desirable to have an alternative BitSet implementation with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation could still delegate to =java.util.BitSet=.
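The interface proposed in the issue description could be sketched as follows. This is a minimal illustration, not the contents of any attached patch: AbstractBitSet mirrors the interface quoted above, while BitSetAdapter and SortedIntBits are hypothetical names, the latter standing in for "an alternative implementation with a smaller memory footprint" by storing only the set document ids in a sorted array.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Read-only view as proposed in the issue: only membership tests are exposed. */
interface AbstractBitSet {
    boolean get(int index);
}

/** Default implementation (hypothetical name) that delegates to java.util.BitSet. */
final class BitSetAdapter implements AbstractBitSet {
    private final BitSet bits;

    BitSetAdapter(BitSet bits) {
        this.bits = bits;
    }

    @Override
    public boolean get(int index) {
        return bits.get(index);
    }
}

/** Sparse implementation (hypothetical name): keeps only the set ids, sorted. */
final class SortedIntBits implements AbstractBitSet {
    private final int[] sortedDocs;

    SortedIntBits(int[] docs) {
        int[] copy = docs.clone(); // defensive copy so the caller's array is untouched
        Arrays.sort(copy);
        this.sortedDocs = copy;
    }

    @Override
    public boolean get(int index) {
        // binarySearch returns a non-negative position only when the key is present
        return Arrays.binarySearch(sortedDocs, index) >= 0;
    }
}

final class SparseBitsDemo {
    public static void main(String[] args) {
        // Two visible documents in an index of ten million.
        BitSet dense = new BitSet(10_000_000);
        dense.set(42);
        dense.set(9_999_999);

        AbstractBitSet viaBitSet = new BitSetAdapter(dense);
        AbstractBitSet viaArray = new SortedIntBits(new int[] {9_999_999, 42});

        for (int doc : new int[] {0, 42, 43, 9_999_999}) {
            System.out.println(doc + ": " + viaBitSet.get(doc) + " / " + viaArray.get(doc));
        }
    }
}
```

In this example the dense BitSet needs on the order of a megabyte to address ten million documents, while the sorted array holds just two ints. The trade-off is that each lookup becomes a binary search instead of a single bit test, so the sparse form only pays off when few bits are set.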