Posted to dev@lucene.apache.org by "Karl Wettin (JIRA)" <ji...@apache.org> on 2007/03/13 03:37:09 UTC
[jira] Commented: (LUCENE-550) InstantiatedIndex - faster but memory consuming index
[ https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12480241 ]
Karl Wettin commented on LUCENE-550:
------------------------------------
A note on, and output from contrib/benchmark:
I'm getting really poor results compared to my own tests and live environment stats. At query time I expected InstantiatedIndex to spend at most 1/6th of the time RAMDirectory does, but in the benchmarker the speed is almost the same as RAMDirectory. Retrieving documents takes about 1/5th of the time, rather than the 1/60th at most that I expected.
Investigated the code a bit and noticed that ReadTask creates a new instance of IndexReader and IndexSearcher for each query. Could this be the reason?
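To make the suspicion concrete, here is a toy model in plain Java of the two usage patterns. It is only a sketch: FakeSearcher is a hypothetical stand-in and not a Lucene class, and it only counts how often the setup step runs, which is the cost that would be charged to every query if a new reader and searcher were created each time.

```java
public class ReaderReuseSketch {
    /** Hypothetical stand-in for a reader/searcher pair; NOT a Lucene class. */
    static final class FakeSearcher {
        static int opened = 0;           // number of times the setup step ran
        FakeSearcher() { opened++; }     // the setup cost would be paid here
        int search(String query) { return query.length(); } // placeholder work
    }

    /** Creates a fresh searcher for every query; returns how many were opened. */
    static int runWithNewSearcherPerQuery(String[] queries) {
        FakeSearcher.opened = 0;
        for (String q : queries) {
            new FakeSearcher().search(q);
        }
        return FakeSearcher.opened;
    }

    /** Creates one searcher and reuses it; returns how many were opened. */
    static int runWithSharedSearcher(String[] queries) {
        FakeSearcher.opened = 0;
        FakeSearcher shared = new FakeSearcher();
        for (String q : queries) {
            shared.search(q);
        }
        return FakeSearcher.opened;
    }

    public static void main(String[] args) {
        String[] queries = {"a", "b", "c", "d", "e"};
        System.out.println("opens, new searcher per query: " + runWithNewSearcherPerQuery(queries));
        System.out.println("opens, shared searcher:        " + runWithSharedSearcher(queries));
    }
}
```

If the setup step is expensive relative to a single search, the first pattern measures mostly setup, which would hide any difference between the index implementations.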
Memory consumption is 3x that of a RAMDirectory, but half of the memory is spent on keeping the Document instances on the heap. Perhaps it would be interesting to use the same persistence for these as in the Directory implementations.
The merge factor sweet spot is around 2500, where it turns out to be a little bit faster than the RAMDirectory sweet spot. At the default of 10, InstantiatedIndex consumes about 5x more time than a RAMDirectory. If I fix the locklessness as suggested in the previous comment, it will most probably be much faster than a RAMDirectory at any setting.
/**
* The sweet spot for this implementation is at 2500.
* <p/>
* Benchmark output:
* <pre>
* ------------> Report sum by Prefix (MAddDocs) and Round (8 about 8 out of 160153)
* Operation       round   mrg  buf  cmpnd  runCnt  recsPerRun  rec/s  elapsedSec   avgUsedMem  avgTotalMem
* MAddDocs_20000      0    10   10   true       1       20000   81,4      245,68  200 325 152  268 156 928
* MAddDocs_20000      1  1000   10   true       1       20000  494,1       40,47  247 119 072  347 025 408
* MAddDocs_20000      2    10  100   true       1       20000  104,8      190,81  233 895 552  363 720 704
* MAddDocs_20000      3  2000  100   true       1       20000  527,2       37,94  266 136 448  378 273 792
* MAddDocs_20000      4    10   10  false       1       20000  103,2      193,75  222 089 792  378 273 792
* MAddDocs_20000      5  3000   10  false       1       20000  545,2       36,69  237 917 152  378 273 792
* MAddDocs_20000      6    10  100  false       1       20000  102,7      194,67  237 018 976  378 273 792
* MAddDocs_20000      7  4000  100  false       1       20000  535,8       37,33  309 680 640  501 968 896
* </pre>
*
* @see org.apache.lucene.index.IndexWriterInterface#setMergeFactor(int)
*/
public void setMergeFactor(int mergeFactor) {
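As a quick check of the "about 5x" figure, the rec/s values from the Javadoc table above can be paired up: each round with merge factor 10 is followed by a round with the same buf and cmpnd settings but a large merge factor. The sketch below only does that arithmetic on the numbers as reported; the class and method names are made up for illustration.

```java
public class MergeFactorSpeedup {
    /** Throughput of the high-merge-factor round divided by its merge-factor-10 partner. */
    static double speedup(double lowMergeRecsPerSec, double highMergeRecsPerSec) {
        return highMergeRecsPerSec / lowMergeRecsPerSec;
    }

    public static void main(String[] args) {
        // rec/s copied from the MAddDocs_20000 rows: {merge factor 10 round, following round}
        double[][] pairs = {
            {81.4, 494.1},   // rounds 0 and 1 (mrg 10 vs 1000)
            {104.8, 527.2},  // rounds 2 and 3 (mrg 10 vs 2000)
            {103.2, 545.2},  // rounds 4 and 5 (mrg 10 vs 3000)
            {102.7, 535.8},  // rounds 6 and 7 (mrg 10 vs 4000)
        };
        for (double[] p : pairs) {
            System.out.printf("%.1f -> %.1f rec/s: %.2fx%n", p[0], p[1], speedup(p[0], p[1]));
        }
    }
}
```

On these numbers the large merge factors add documents roughly 5x to 6x faster than the default of 10.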
I would not pay too much attention to the numbers below until I've got the benchmarker under control, but here are the stats:
Output from InstantiatedIndex:
[java] ------------> Report Sum By (any) Name (19 about 160153 out of 160153)
[java] Operation                        round  mrg  buf  cmpnd  runCnt  recsPerRun        rec/s  elapsedSec   avgUsedMem  avgTotalMem
[java] Rounds_8                             0   10   10   true       1    25142792     19 842,0    1 267,15  291 055 680  377 163 776
[java] Populate                             -    -    -      -       8       20003        148,1    1 080,73  249 711 264  354 926 592
[java] CreateIndex                          -    -    -      -       8           1      1 142,9        0,01  178 670 624  322 181 120
[java] MAddDocs_20000                       -    -    -      -       8       20000        148,0    1 080,72  249 706 256  354 926 592
[java] AddDoc                               -    -    -      -  160000           1        156,2    1 024,02  228 890 976  339 588 384
[java] Optimize                             -    -    -      -       8           1      8 000,0        0,00  249 679 056  354 926 592
[java] CloseIndex                           -    -    -      -       8           1      2 666,7        0,00  249 689 056  354 926 592
[java] OpenReader                           -    -    -      -      16           1     16 000,0        0,00  246 507 072  354 926 592
[java] SearchSameRdr_5000                   -    -    -      -       8        5000        806,6       49,59  250 121 728  354 926 592
[java] CloseReader                          -    -    -      -      16           1     16 000,0        0,00  249 146 336  354 971 648
[java] WarmNewRdr_50                        -    -    -      -       8     1000000  3 118 908,5        2,57  249 616 272  354 926 592
[java] SrchNewRdr_500                       -    -    -      -       8         500        806,5        4,96  252 762 128  354 926 592
[java] SrchTrvNewRdr_300                    -    -    -      -       8      335500    135 891,9       19,75  250 484 240  354 926 592
[java] SrchTrvRetNewRdr_100                 -    -    -      -       8      209216    267 326,0        6,26  245 991 776  354 926 592
[java] SearchSameRdr_5000_2500/sec_Par      -    -    -      -       8        5000      1 163,3       34,39  250 892 304  355 016 704
[java] WarmNewRdr_50_25/sec_Par             -    -    -      -       8     1000000    507 872,0       15,75  250 855 648  355 016 704
[java] SrchNewRdr_50_25/sec_Par             -    -    -      -       8          50         25,5       15,69  254 289 584  355 016 704
[java] SrchTrvNewRdr_300_150/sec_Par        -    -    -      -       8      335500    177 807,2       15,10  251 699 584  355 016 704
[java] SrchTrvRetNewRdr_100_50/sec_Par      -    -    -      -       8      232076    117 106,6       15,85  252 423 376  355 016 704
Output from RAMDirectory:
[java] ------------> Report Sum By (any) Name (19 about 160153 out of 160153)
[java] Operation                        round  mrg  buf  cmpnd  runCnt  recsPerRun        rec/s  elapsedSec   avgUsedMem  avgTotalMem
[java] Rounds_8                             0   10   10   true       1    25142792     36 177,3      694,99  119 427 680  182 538 240
[java] Populate                             -    -    -      -       8       20003        482,0      331,99  114 288 472  140 156 416
[java] CreateIndex                          -    -    -      -       8           1      2 666,7        0,00   48 867 204  124 752 384
[java] MAddDocs_20000                       -    -    -      -       8       20000        499,2      320,51  111 734 320  135 969 280
[java] AddDoc                               -    -    -      -  160000           1        604,9      264,49   90 860 048  130 812 488
[java] Optimize                             -    -    -      -       8           1          0,7       11,48  123 532 104  140 156 416
[java] CloseIndex                           -    -    -      -       8           1      8 000,0        0,00  114 288 472  140 156 416
[java] OpenReader                           -    -    -      -      16           1        197,5        0,08  113 600 096  143 475 712
[java] SearchSameRdr_5000                   -    -    -      -       8        5000      1 209,4       33,07  115 720 920  143 314 944
[java] CloseReader                          -    -    -      -      16           1     16 000,0        0,00  102 590 368  145 079 552
[java] WarmNewRdr_50                        -    -    -      -       8     1000000     65 734,9      121,70  105 734 472  143 314 944
[java] SrchNewRdr_500                       -    -    -      -       8         500        417,4        9,58  104 480 168  146 795 008
[java] SrchTrvNewRdr_300                    -    -    -      -       8      335500    133 532,3       20,10  116 353 456  146 795 008
[java] SrchTrvRetNewRdr_100                 -    -    -      -       8      209216     60 686,3       27,58  124 211 040  146 795 008
[java] SearchSameRdr_5000_2500/sec_Par      -    -    -      -       8        5000      1 596,0       25,06  114 145 856  146 844 160
[java] WarmNewRdr_50_25/sec_Par             -    -    -      -       8     1000000    105 678,9       75,70  104 830 320  146 844 160
[java] SrchNewRdr_50_25/sec_Par             -    -    -      -       8          50         25,5       15,70  107 417 728  146 844 160
[java] SrchTrvNewRdr_300_150/sec_Par        -    -    -      -       8      335500    178 635,6       15,02  116 779 312  146 835 968
[java] SrchTrvRetNewRdr_100_50/sec_Par      -    -    -      -       8      232076    100 569,2       18,46  111 881 152  146 819 584
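For readers who want the two reports side by side, the following sketch divides a few InstantiatedIndex figures by the matching RAMDirectory figures. The inputs are copied from the elapsedSec and avgUsedMem columns above; the class and method names are invented for this example, and the ratios say nothing beyond this one benchmark run.

```java
public class BenchmarkRatios {
    /** InstantiatedIndex value divided by RAMDirectory value (below 1.0 means less time or memory). */
    static double ratio(double instantiatedIndex, double ramDirectory) {
        return instantiatedIndex / ramDirectory;
    }

    public static void main(String[] args) {
        // elapsedSec, InstantiatedIndex vs RAMDirectory
        System.out.printf("SearchSameRdr_5000   time ratio: %.2f%n", ratio(49.59, 33.07));
        System.out.printf("SrchNewRdr_500       time ratio: %.2f%n", ratio(4.96, 9.58));
        System.out.printf("SrchTrvNewRdr_300    time ratio: %.2f%n", ratio(19.75, 20.10));
        System.out.printf("SrchTrvRetNewRdr_100 time ratio: %.2f%n", ratio(6.26, 27.58));
        // avgUsedMem of the Rounds_8 rows, in bytes
        System.out.printf("Rounds_8 avgUsedMem ratio:       %.2f%n", ratio(291055680d, 119427680d));
    }
}
```

The SrchTrvRetNewRdr_100 ratio corresponds to the roughly 1/5th retrieval time mentioned earlier, and the SrchTrvNewRdr_300 ratio to the "almost the same" search speed.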
> InstantiatedIndex - faster but memory consuming index
> -----------------------------------------------------
>
> Key: LUCENE-550
> URL: https://issues.apache.org/jira/browse/LUCENE-550
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Store
> Affects Versions: 2.0.0
> Reporter: Karl Wettin
> Assigned To: Karl Wettin
> Attachments: lucene-550.jpg, test-reports.zip, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2
>
>
> A non-file-centric, all-in-memory index. Consumes some 2x the memory of a RAMDirectory (in a term-saturated index) but is between 3x and 60x faster depending on application and how one counts. Average query is about 8x faster. IndexWriter and IndexModifier have been realized in InterfaceIndexWriter and InterfaceIndexModifier.
> InstantiatedIndex is wrapped in a new top layer index facade (class Index) that comes with factory methods for writers, readers and searchers for unison index handling. There are decorators with notification handling that can be used for automatically synchronizing searchers on updates, etc.
> Index also comes with FS/RAMDirectory implementation.