Posted to dev@lucene.apache.org by "Karl Wettin (JIRA)" <ji...@apache.org> on 2007/03/13 03:37:09 UTC

[jira] Commented: (LUCENE-550) InstantiatedIndex - faster but memory consuming index

    [ https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12480241 ] 

Karl Wettin commented on LUCENE-550:
------------------------------------

A note on, and output from, contrib/benchmark:

I'm getting really poor results compared to my own tests and live environment stats. At query time I expected InstantiatedIndex to spend at most 1/6th of the time RAMDirectory does, but in the benchmarker the speed turns out to be almost the same as RAMDirectory. Retrieving documents takes about 1/5th of the time, rather than at most 1/60th as expected.

Investigated the code a bit and noticed that ReadTask creates a new instance of IndexReader and IndexSearcher for each query. Could this be the reason?
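
To make the suspicion above concrete, here is a minimal, self-contained sketch of the two usage patterns. FakeSearcher is a hypothetical stand-in (not a Lucene class, and not what ReadTask actually contains); it only counts how often a searcher gets constructed, which is the cost that would be paid once per query if a new IndexReader and IndexSearcher are created each time.

```java
public class SearcherReuseSketch {

    /** Hypothetical stand-in for an opened reader/searcher pair; NOT a Lucene class. */
    static final class FakeSearcher {
        static int opened = 0;               // how many searchers have been constructed

        FakeSearcher() {
            opened++;                        // in a real index this is where the setup cost would be paid
        }

        int search(String query) {
            return query.length();           // placeholder for real query work
        }
    }

    /** Constructs a fresh searcher for every query (the pattern suspected in ReadTask). */
    static int searchWithNewSearcherPerQuery(String[] queries) {
        int total = 0;
        for (String q : queries) {
            total += new FakeSearcher().search(q);
        }
        return total;
    }

    /** Constructs one searcher up front and reuses it for all queries. */
    static int searchWithSharedSearcher(String[] queries) {
        FakeSearcher shared = new FakeSearcher();
        int total = 0;
        for (String q : queries) {
            total += shared.search(q);
        }
        return total;
    }

    public static void main(String[] args) {
        String[] queries = {"lucene", "index", "memory"};

        FakeSearcher.opened = 0;
        searchWithNewSearcherPerQuery(queries);
        System.out.println("searchers opened, one per query: " + FakeSearcher.opened);

        FakeSearcher.opened = 0;
        searchWithSharedSearcher(queries);
        System.out.println("searchers opened, shared: " + FakeSearcher.opened);
    }
}
```

If the per-query setup dominates the measured time, both index implementations would look similar in the benchmark regardless of how fast the actual search is, which would be consistent with the numbers seen here. That is an assumption to verify, not a conclusion.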

Memory consumption is 3x that of a RAMDirectory, but half of the memory is spent on keeping the Document instances on the heap. Perhaps it would be interesting to use the same persistence for these as in the Directory implementations.

The merge factor sweet spot is around 2500, where it turns out to be a little bit faster than the RAMDirectory sweet spot. At the default of 10, InstantiatedIndex consumes about 5x more time than a RAMDirectory. If I fix the locklessness as suggested in a previous comment, it will most probably be much faster than a RAMDirectory at any setting.

  /**
   * The sweet spot for this implementation is at 2500.
   * <p/>
   * Benchmark output:
   * <pre>
   *  ------------> Report sum by Prefix (MAddDocs) and Round (8 about 8 out of 160153)
   *  Operation      round  mrg buf cmpnd   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
   *  MAddDocs_20000     0   10  10  true        1        20000         81,4      245,68   200 325 152    268 156 928
   *  MAddDocs_20000 -   1 1000  10  true -  -   1 -  -   20000 -  -   494,1 -  -  40,47 - 247 119 072 -  347 025 408
   *  MAddDocs_20000     2   10 100  true        1        20000        104,8      190,81   233 895 552    363 720 704
   *  MAddDocs_20000 -   3 2000 100  true -  -   1 -  -   20000 -  -   527,2 -  -  37,94 - 266 136 448 -  378 273 792
   *  MAddDocs_20000     4   10  10 false        1        20000        103,2      193,75   222 089 792    378 273 792
   *  MAddDocs_20000 -   5 3000  10 false -  -   1 -  -   20000 -  -   545,2 -  -  36,69 - 237 917 152 -  378 273 792
   *  MAddDocs_20000     6   10 100 false        1        20000        102,7      194,67   237 018 976    378 273 792
   *  MAddDocs_20000 -   7 4000 100 false -  -   1 -  -   20000 -  -   535,8 -  -  37,33 - 309 680 640 -  501 968 896
   * </pre>
   *
   * @see org.apache.lucene.index.IndexWriterInterface#setMergeFactor(int)
   */
  public void setMergeFactor(int mergeFactor) {


I would not pay too much attention to the numbers below until I've got the benchmarker under control, but here are the stats:

Output from InstantiatedIndex:

     [java] ------------> Report Sum By (any) Name (19 about 160153 out of 160153)
     [java] Operation                       round mrg buf cmpnd   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
     [java] Rounds_8                            0  10  10  true        1     25142792     19 842,0    1 267,15   291 055 680    377 163 776
     [java] Populate -  -  -  -  -  -  -  -  -  - - - - - -   - -  -   8 -  -   20003 -  -   148,1 -  1 080,73 - 249 711 264 -  354 926 592
     [java] CreateIndex                         -   -   -     -        8            1      1 142,9        0,01   178 670 624    322 181 120
     [java] MAddDocs_20000 -  -  -  -  -  -  -  - - - - - -   - -  -   8 -  -   20000 -  -   148,0 -  1 080,72 - 249 706 256 -  354 926 592
     [java] AddDoc                              -   -   -     -   160000            1        156,2    1 024,02   228 890 976    339 588 384
     [java] Optimize -  -  -  -  -  -  -  -  -  - - - - - -   - -  -   8 -  -  -  - 1 -  - 8 000,0 -  -   0,00 - 249 679 056 -  354 926 592
     [java] CloseIndex                          -   -   -     -        8            1      2 666,7        0,00   249 689 056    354 926 592
     [java] OpenReader -  -  -  -  -  -  -  -   - - - - - -   - -  -  16 -  -  -  - 1 -   16 000,0 -  -   0,00 - 246 507 072 -  354 926 592
     [java] SearchSameRdr_5000                  -   -   -     -        8         5000        806,6       49,59   250 121 728    354 926 592
     [java] CloseReader -  -  -  -  -  -  -  -  - - - - - -   - -  -  16 -  -  -  - 1 -   16 000,0 -  -   0,00 - 249 146 336 -  354 971 648
     [java] WarmNewRdr_50                       -   -   -     -        8      1000000  3 118 908,5        2,57   249 616 272    354 926 592
     [java] SrchNewRdr_500 -  -  -  -  -  -  -  - - - - - -   - -  -   8 -  -  -  500 -  -   806,5 -  -   4,96 - 252 762 128 -  354 926 592
     [java] SrchTrvNewRdr_300                   -   -   -     -        8       335500    135 891,9       19,75   250 484 240    354 926 592
     [java] SrchTrvRetNewRdr_100 -  -  -  -  -  - - - - - -   - -  -   8 -  -  209216 -  267 326,0 -  -   6,26 - 245 991 776 -  354 926 592
     [java] SearchSameRdr_5000_2500/sec_Par     -   -   -     -        8         5000      1 163,3       34,39   250 892 304    355 016 704
     [java] WarmNewRdr_50_25/sec_Par -  -  -  - - - - - - -   - -  -   8 -  - 1000000 -  507 872,0 -  -  15,75 - 250 855 648 -  355 016 704
     [java] SrchNewRdr_50_25/sec_Par            -   -   -     -        8           50         25,5       15,69   254 289 584    355 016 704
     [java] SrchTrvNewRdr_300_150/sec_Par -  -  - - - - - -   - -  -   8 -  -  335500 -  177 807,2 -  -  15,10 - 251 699 584 -  355 016 704
     [java] SrchTrvRetNewRdr_100_50/sec_Par     -   -   -     -        8       232076    117 106,6       15,85   252 423 376    355 016 704


Output from RAMDirectory:
     [java] ------------> Report Sum By (any) Name (19 about 160153 out of 160153)
     [java] Operation                       round mrg buf cmpnd   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
     [java] Rounds_8                            0  10  10  true        1     25142792     36 177,3      694,99   119 427 680    182 538 240
     [java] Populate -  -  -  -  -  -  -  -  -  - - - - - -   - -  -   8 -  -   20003 -  -   482,0 -  - 331,99 - 114 288 472 -  140 156 416
     [java] CreateIndex                         -   -   -     -        8            1      2 666,7        0,00    48 867 204    124 752 384
     [java] MAddDocs_20000 -  -  -  -  -  -  -  - - - - - -   - -  -   8 -  -   20000 -  -   499,2 -  - 320,51 - 111 734 320 -  135 969 280
     [java] AddDoc                              -   -   -     -   160000            1        604,9      264,49    90 860 048    130 812 488
     [java] Optimize -  -  -  -  -  -  -  -  -  - - - - - -   - -  -   8 -  -  -  - 1 -  -  -  0,7 -  -  11,48 - 123 532 104 -  140 156 416
     [java] CloseIndex                          -   -   -     -        8            1      8 000,0        0,00   114 288 472    140 156 416
     [java] OpenReader -  -  -  -  -  -  -  -   - - - - - -   - -  -  16 -  -  -  - 1 -  -   197,5 -  -   0,08 - 113 600 096 -  143 475 712
     [java] SearchSameRdr_5000                  -   -   -     -        8         5000      1 209,4       33,07   115 720 920    143 314 944
     [java] CloseReader -  -  -  -  -  -  -  -  - - - - - -   - -  -  16 -  -  -  - 1 -   16 000,0 -  -   0,00 - 102 590 368 -  145 079 552
     [java] WarmNewRdr_50                       -   -   -     -        8      1000000     65 734,9      121,70   105 734 472    143 314 944
     [java] SrchNewRdr_500 -  -  -  -  -  -  -  - - - - - -   - -  -   8 -  -  -  500 -  -   417,4 -  -   9,58 - 104 480 168 -  146 795 008
     [java] SrchTrvNewRdr_300                   -   -   -     -        8       335500    133 532,3       20,10   116 353 456    146 795 008
     [java] SrchTrvRetNewRdr_100 -  -  -  -  -  - - - - - -   - -  -   8 -  -  209216 -   60 686,3 -  -  27,58 - 124 211 040 -  146 795 008
     [java] SearchSameRdr_5000_2500/sec_Par     -   -   -     -        8         5000      1 596,0       25,06   114 145 856    146 844 160
     [java] WarmNewRdr_50_25/sec_Par -  -  -  - - - - - - -   - -  -   8 -  - 1000000 -  105 678,9 -  -  75,70 - 104 830 320 -  146 844 160
     [java] SrchNewRdr_50_25/sec_Par            -   -   -     -        8           50         25,5       15,70   107 417 728    146 844 160
     [java] SrchTrvNewRdr_300_150/sec_Par -  -  - - - - - -   - -  -   8 -  -  335500 -  178 635,6 -  -  15,02 - 116 779 312 -  146 835 968
     [java] SrchTrvRetNewRdr_100_50/sec_Par     -   -   -     -        8       232076    100 569,2       18,46   111 881 152    146 819 584
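
For reading the two reports side by side, the following sketch computes a few InstantiatedIndex-to-RAMDirectory ratios. The input values are copied from the elapsedSec and avgUsedMem columns above (decimal commas converted to points); the class and method names are made up for this illustration and are not part of contrib/benchmark.

```java
import java.util.Locale;

public class BenchmarkRatios {

    /** Returns a / b, i.e. how many times larger a is than b. */
    static double ratio(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        // elapsedSec per operation: InstantiatedIndex first, RAMDirectory second
        double iiSearch = 49.59,    ramSearch = 33.07;     // SearchSameRdr_5000
        double iiRetrieve = 6.26,   ramRetrieve = 27.58;   // SrchTrvRetNewRdr_100
        double iiAddDoc = 1024.02,  ramAddDoc = 264.49;    // AddDoc

        // avgUsedMem in bytes for the Rounds_8 summary row
        double iiMem = 291_055_680d, ramMem = 119_427_680d;

        // Locale.ROOT keeps the decimal point regardless of the machine's locale
        System.out.printf(Locale.ROOT, "search time   II/RAM: %.2f%n", ratio(iiSearch, ramSearch));
        System.out.printf(Locale.ROOT, "retrieve time II/RAM: %.2f%n", ratio(iiRetrieve, ramRetrieve));
        System.out.printf(Locale.ROOT, "addDoc time   II/RAM: %.2f%n", ratio(iiAddDoc, ramAddDoc));
        System.out.printf(Locale.ROOT, "used memory   II/RAM: %.2f%n", ratio(iiMem, ramMem));
    }
}
```

A ratio below 1 in the time rows means InstantiatedIndex was faster for that operation in this particular run; given the caveat about the benchmarker, these ratios describe this output only.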




> InstantiatedIndex - faster but memory consuming index
> -----------------------------------------------------
>
>                 Key: LUCENE-550
>                 URL: https://issues.apache.org/jira/browse/LUCENE-550
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 2.0.0
>            Reporter: Karl Wettin
>         Assigned To: Karl Wettin
>         Attachments: lucene-550.jpg, test-reports.zip, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2
>
>
> A non-file-centric, all-in-memory index. Consumes some 2x the memory of a RAMDirectory (in a term-saturated index) but is between 3x and 60x faster depending on application and how one counts. The average query is about 8x faster. IndexWriter and IndexModifier have been realized in InterfaceIndexWriter and InterfaceIndexModifier. 
> InstantiatedIndex is wrapped in a new top-layer index facade (class Index) that comes with factory methods for writers, readers and searchers for unified index handling. There are decorators with notification handling that can be used for automatically synchronizing searchers on updates, etc. 
> Index also comes with FS/RAMDirectory implementations.
