Posted to dev@lucene.apache.org by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2007/04/05 23:33:32 UTC

[jira] Updated: (LUCENE-584) Decouple Filter from BitSet

     [ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated LUCENE-584:
------------------------------------

    Attachment: bench-diff.txt

Perhaps I did something wrong with the benchmark, but I didn't get any speed-up when using searcher.match(Query, MatchCollector) vs. searcher.search(Query, HitCollector).

Here are the benchmark numbers (50000 queries with each), HitCollector first, MatchCollector second:

HITCOLLECTOR:

     [java] ------------> Report Sum By (any) Name (11 about 41 out of 41)
     [java] Operation           round mrg buf   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
     [java] Rounds_4                0  10  10        1       808020        787.5    1,026.04     7,217,624     17,780,736
     [java] Populate -  -  -  -  -  - - - - - -  -   4 -  -  - 2003 -  -   129.9 -  -  61.67 -   9,938,986 -   13,821,952
     [java] CreateIndex             -   -   -        4            1          4.4        0.91     3,937,522     10,916,864
     [java] MAddDocs_2000 -  -  -   - - - - - -  -   4 -  -  - 2000 -  -   138.1 -  -  57.92 -   9,368,584 -   13,821,952
     [java] Optimize                -   -   -        4            1          1.4        2.83     9,938,218     13,821,952
     [java] CloseIndex -  -  -  -   - - - - - -  -   4 -  -  -  - 1 -  - 2,000.0 -  -   0.00 -   9,938,986 -   13,821,952
     [java] OpenReader              -   -   -        4            1         24.0        0.17     9,957,592     13,821,952
     [java] SearchSameRdr_50000 -   - - - - - -  -   4 -  -   50000 -  - 1,070.3 -  - 186.86 -  10,500,146 -   13,821,952
     [java] CloseReader             -   -   -        4            1      4,000.0        0.00     9,059,756     13,821,952
     [java] WarmNewRdr_50 -  -  -   - - - - - -  -   4 -  -  100000 -   16,237.7 -  -  24.63 -   9,060,268 -   13,821,952
     [java] SrchNewRdr_50000        -   -   -        4        50000        265.9      752.02    10,800,006     13,821,952


     [java] ------------> Report sum by Prefix (MAddDocs) and Round (4 about 4 out of 41)
     [java] Operation     round mrg buf   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
     [java] MAddDocs_2000     0  10  10        1         2000         94.6       21.15     7,844,112     10,407,936
     [java] MAddDocs_2000 -   1 100  10 -  -   1 -  -  - 2000 -  -   136.7 -  -  14.63 -   8,968,144 -   11,309,056
     [java] MAddDocs_2000     2  10 100        1         2000        173.2       11.55    10,528,264     15,740,928
     [java] MAddDocs_2000 -   3 100 100 -  -   1 -  -  - 2000 -  -   188.7 -  -  10.60 -  10,133,816 -   17,829,888


MATCHCOLLECTOR:


     [java] ------------> Report Sum By (any) Name (11 about 41 out of 41)
     [java] Operation           round mrg buf   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
     [java] Rounds_4                0  10  10        1       808020        781.0    1,034.62    10,566,608     15,859,712
     [java] Populate -  -  -  -  -  - - - - - -  -   4 -  -  - 2003 -  -   130.9 -  -  61.23 -  10,963,452 -   14,806,016
     [java] CreateIndex             -   -   -        4            1         33.9        0.12     3,616,570     11,020,288
     [java] MAddDocs_2000 -  -  -   - - - - - -  -   4 -  -  - 2000 -  -   137.3 -  -  58.29 -  10,445,568 -   14,806,016
     [java] Optimize                -   -   -        4            1          1.4        2.82    10,979,398     14,806,016
     [java] CloseIndex -  -  -  -   - - - - - -  -   4 -  -  -  - 1 -  - 2,000.0 -  -   0.00 -  10,963,452 -   14,806,016
     [java] OpenReader              -   -   -        4            1         22.0        0.18    10,982,058     14,806,016
     [java] SearchSameRdr_50000 -   - - - - - -  -   4 -  -   50000 -  - 1,064.7 -  - 187.84 -  11,060,036 -   14,806,016
     [java] CloseReader             -   -   -        4            1      4,000.0        0.00    10,353,206     14,806,016
     [java] WarmNewRdr_50 -  -  -   - - - - - -  -   4 -  -  100000 -   16,419.0 -  -  24.36 -  10,431,062 -   14,806,016
     [java] SrchNewRdr_50000        -   -   -        4        50000        263.0      760.34    11,912,358     14,806,016


     [java] ------------> Report sum by Prefix (MAddDocs) and Round (4 about 4 out of 41)
     [java] Operation     round mrg buf   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
     [java] MAddDocs_2000     0  10  10        1         2000         92.2       21.69     7,844,112     10,407,936
     [java] MAddDocs_2000 -   1 100  10 -  -   1 -  -  - 2000 -  -   136.6 -  -  14.64 -   7,720,352 -   10,407,936
     [java] MAddDocs_2000     2  10 100        1         2000        167.8       11.92    11,325,952     17,571,840
     [java] MAddDocs_2000 -   3 100 100 -  -   1 -  -  - 2000 -  -   199.3 -  -  10.03 -  14,891,856 -   20,836,352
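
For a rough side-by-side of the two search rows, the sketch below computes the relative change in rec/s from the HitCollector run to the MatchCollector run. The figures are copied from the SearchSameRdr_50000 and SrchNewRdr_50000 rows above; the class and method names are made up for illustration and are not part of the benchmark package.

```java
import java.util.Locale;

// Illustrative helper, not part of contrib/benchmark.
class BenchDiff {

  // Relative change from baseline to candidate, in percent.
  // Negative means the candidate processed fewer records per second.
  static double relativeChangePercent(double baseline, double candidate) {
    return (candidate - baseline) / baseline * 100.0;
  }

  public static void main(String[] args) {
    // rec/s values taken from the two reports (HitCollector, MatchCollector).
    double sameRdr = relativeChangePercent(1070.3, 1064.7);
    double newRdr = relativeChangePercent(265.9, 263.0);

    System.out.println(String.format(Locale.ROOT, "SearchSameRdr_50000: %.2f%%", sameRdr));
    System.out.println(String.format(Locale.ROOT, "SrchNewRdr_50000:    %.2f%%", newRdr));
  }
}
```

Both rows come out roughly 0.5% and 1.1% lower for MatchCollector, a gap small enough that it may well be run-to-run noise rather than a real difference.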



This is what I did for the benchmark.  I used Doron's handy contrib/benchmark.
I added a new .alg based on micro-standard.alg; here's the diff:


$ diff conf/micro-standard.alg conf/matcher-micro-standard.alg 
60c60
<     { "SearchSameRdr" Search > : 50000
---
>     { "SearchSameRdr" SearchMatch > : 50000
65c65
<     { "SrchNewRdr" Search > : 50000
---
>     { "SrchNewRdr" SearchMatch > : 50000


Then I added 2 new Tasks for benchmarking the Matcher (searcher.search(Query, MatchCollector)) and modified the ReadTask to call searcher.search(Query, HitCollector) instead of the method that returns Hits.

I commented out all search results traversal and doc retrieval, as I didn't care to measure that.


> Decouple Filter from BitSet
> ---------------------------
>
>                 Key: LUCENE-584
>                 URL: https://issues.apache.org/jira/browse/LUCENE-584
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.0.1
>            Reporter: Peter Schäfer
>            Priority: Minor
>         Attachments: bench-diff.txt, BitsMatcher.java, Filter-20060628.patch, HitCollector-20060628.patch, IndexSearcher-20060628.patch, MatchCollector.java, Matcher.java, Matcher20070226.patch, Scorer-20060628.patch, Searchable-20060628.patch, Searcher-20060628.patch, Some Matchers.zip, SortedVIntList.java, TestSortedVIntList.java
>
>
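> {code}
> package org.apache.lucene.search;
>
> import java.io.IOException;
> import org.apache.lucene.index.IndexReader;
>
> public abstract class Filter implements java.io.Serializable
> {
>   // Returns the bits for the given reader through the abstract interface.
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
>
> // Goes in its own source file, AbstractBitSet.java.
> public interface AbstractBitSet
> {
>   // True if the bit at the given index is set.
>   public boolean get(int index);
> }
> {code}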
> It would be useful if the method =Filter.bits()= returned an abstract interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of memory. It would be desirable to have an alternative BitSet implementation with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation could still delegate to =java.util.BitSet=.
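
The proposal in the issue description can be sketched as follows. This is a minimal illustration only: AbstractBitSet is the interface from the description, while DenseBits and SparseBits are hypothetical names, and the sorted int array is just one possible compact representation, not necessarily what any of the attached patches use.

```java
import java.util.Arrays;
import java.util.BitSet;

// The interface proposed in the issue description.
interface AbstractBitSet {
  boolean get(int index);
}

// Hypothetical default implementation: delegates to java.util.BitSet.
final class DenseBits implements AbstractBitSet {
  private final BitSet bits;

  DenseBits(BitSet bits) {
    this.bits = bits;
  }

  public boolean get(int index) {
    return bits.get(index);
  }
}

// Hypothetical sparse implementation: stores only the set document
// numbers in a sorted int array and answers get() by binary search.
final class SparseBits implements AbstractBitSet {
  private final int[] sortedDocs;

  SparseBits(int[] docs) {
    sortedDocs = docs.clone();
    Arrays.sort(sortedDocs);
  }

  public boolean get(int index) {
    return Arrays.binarySearch(sortedDocs, index) >= 0;
  }
}

class SparseBitsDemo {
  public static void main(String[] args) {
    // Three visible documents in an index of about ten million.
    int[] visible = {42, 7, 9999999};

    BitSet bitSet = new BitSet();
    for (int doc : visible) {
      bitSet.set(doc);
    }

    AbstractBitSet dense = new DenseBits(bitSet);
    AbstractBitSet sparse = new SparseBits(visible);

    // Both implementations answer the same membership questions.
    for (int doc : new int[] {7, 8, 42, 9999999}) {
      System.out.println(doc + ": dense=" + dense.get(doc) + " sparse=" + sparse.get(doc));
    }
  }
}
```

In this example the java.util.BitSet has to cover every index up to the highest set bit, while the sparse variant holds three ints; the trade-off is that get() becomes a binary search instead of a direct bit lookup.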
