Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2008/05/10 12:35:55 UTC
[jira] Updated: (LUCENE-1283) Factor out ByteSliceWriter from DocumentsWriterFieldData
[ https://issues.apache.org/jira/browse/LUCENE-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-1283:
---------------------------------------
Attachment: LUCENE-1283.patch
Attached patch. I plan to commit in a day or two.
> Factor out ByteSliceWriter from DocumentsWriterFieldData
> --------------------------------------------------------
>
> Key: LUCENE-1283
> URL: https://issues.apache.org/jira/browse/LUCENE-1283
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: 2.3, 2.3.1
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.4
>
> Attachments: LUCENE-1283.patch
>
>
> DocumentsWriter uses byte slices into shared byte[]'s to hold the
> growing postings data for many different terms in memory. This is
> probably the trickiest (most confusing) part of DocumentsWriter.
> Right now it's not cleanly factored out and not easy to separately
> test. In working on this issue:
> http://mail-archives.apache.org/mod_mbox/lucene-java-user/200805.mbox/%3c126142c0805061426n1168421ya5594ef854fae5e4@mail.gmail.com%3e
> which eventually turned out to be a bug in Oracle JRE's JIT compiler,
> I factored out ByteSliceWriter and created a unit test to stress test
> the writing & reading of byte slices. The test just randomly writes N
> streams interleaved into shared byte[]'s, then reads them back
> verifying the results are correct.
> I created the stress test to try to find any bugs in that code. The
> test ran fine (no bugs were found) but I think the refactoring is
> still very much worthwhile.
> I expected the changes to reduce indexing throughput, so I ran a test
> indexing the first 200K Wikipedia docs using this alg:
> {code}
> analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
> doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
> docs.file=/Volumes/External/lucene/wiki.txt
> doc.stored = true
> doc.term.vector = true
> doc.add.log.step=2000
> directory=FSDirectory
> autocommit=false
> compound=true
> ram.flush.mb=256
> { "Rounds"
> ResetSystemErase
> { "BuildIndex"
> - CreateIndex
> { "AddDocs" AddDoc > : 200000
> - CloseIndex
> }
> NewRound
> } : 4
> RepSumByPrefRound BuildIndex
> {code}
> On trunk it produces these results:
> {code}
> Operation   round  runCnt  recsPerRun  rec/s  elapsedSec   avgUsedMem    avgTotalMem
> BuildIndex      0       1      200000  791.7      252.63  338,552,096  1,061,814,272
> BuildIndex      1       1      200000  793.1      252.18  605,262,080  1,061,814,272
> BuildIndex      2       1      200000  794.8      251.63  601,966,528  1,061,814,272
> BuildIndex      3       1      200000  782.5      255.58  608,699,712  1,061,814,272
> {code}
> and with the patch:
> {code}
> Operation   round  runCnt  recsPerRun  rec/s  elapsedSec   avgUsedMem    avgTotalMem
> BuildIndex      0       1      200000  745.0      268.47  338,318,784  1,061,814,272
> BuildIndex      1       1      200000  792.7      252.30  605,331,776  1,061,814,272
> BuildIndex      2       1      200000  786.7      254.24  602,915,712  1,061,814,272
> BuildIndex      3       1      200000  795.3      251.48  602,378,624  1,061,814,272
> {code}
> So it looks like the performance cost of this change is negligible (in
> the noise).
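For readers unfamiliar with the idea, the interleaved write-then-verify test described in the issue can be illustrated with a much-simplified sketch. This is not the ByteSliceWriter from the attached patch: the class SlicePoolSketch, its methods, the single growable pool, and the fixed 12-byte slice payload are all hypothetical simplifications chosen for illustration. It only shows the general technique of several logical streams sharing one byte[] through slices chained by forwarding addresses, plus a randomized round-trip check.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Random;

/** Hypothetical, simplified slice pool: many streams interleaved in one shared byte[]. */
class SlicePoolSketch {
    static final int PAYLOAD = 12;        // data bytes per slice (assumed fixed size)
    static final int SLICE = PAYLOAD + 4; // payload plus a 4-byte forwarding address

    byte[] pool = new byte[64];
    int used = 0; // next free offset in the shared pool

    /** Per-stream cursor: where the stream starts and where the next byte goes. */
    static final class Stream {
        final int first; // offset of the stream's first slice
        int slice;       // offset of the slice currently being written
        int upto;        // payload bytes already written in that slice
        int length;      // total bytes written to this stream
        Stream(int first) { this.first = first; this.slice = first; }
    }

    /** Reserves one slice at the end of the pool, growing the pool if needed. */
    int newSlice() {
        if (used + SLICE > pool.length) {
            pool = Arrays.copyOf(pool, pool.length * 2);
        }
        int start = used;
        used += SLICE;
        return start;
    }

    Stream newStream() { return new Stream(newSlice()); }

    void writeByte(Stream s, byte b) {
        if (s.upto == PAYLOAD) {
            // Current slice is full: allocate a new one and record its address
            // in the last 4 bytes of the full slice.
            int next = newSlice();
            writeInt(s.slice + PAYLOAD, next);
            s.slice = next;
            s.upto = 0;
        }
        pool[s.slice + s.upto++] = b;
        s.length++;
    }

    /** Follows the forwarding addresses to reassemble one stream's bytes. */
    byte[] readAll(Stream s) {
        byte[] out = new byte[s.length];
        int slice = s.first;
        int upto = 0;
        for (int i = 0; i < out.length; i++) {
            if (upto == PAYLOAD) {
                slice = readInt(slice + PAYLOAD);
                upto = 0;
            }
            out[i] = pool[slice + upto++];
        }
        return out;
    }

    private void writeInt(int pos, int v) {
        pool[pos] = (byte) (v >>> 24);
        pool[pos + 1] = (byte) (v >>> 16);
        pool[pos + 2] = (byte) (v >>> 8);
        pool[pos + 3] = (byte) v;
    }

    private int readInt(int pos) {
        return ((pool[pos] & 0xFF) << 24) | ((pool[pos + 1] & 0xFF) << 16)
             | ((pool[pos + 2] & 0xFF) << 8) | (pool[pos + 3] & 0xFF);
    }

    /** Randomly interleaves writes to numStreams streams, then verifies each one. */
    static boolean stress(long seed, int numStreams, int totalBytes) {
        Random r = new Random(seed);
        SlicePoolSketch p = new SlicePoolSketch();
        Stream[] streams = new Stream[numStreams];
        ByteArrayOutputStream[] expected = new ByteArrayOutputStream[numStreams];
        for (int i = 0; i < numStreams; i++) {
            streams[i] = p.newStream();
            expected[i] = new ByteArrayOutputStream();
        }
        for (int n = 0; n < totalBytes; n++) {
            int i = r.nextInt(numStreams);
            byte b = (byte) r.nextInt(256);
            p.writeByte(streams[i], b);
            expected[i].write(b);
        }
        for (int i = 0; i < numStreams; i++) {
            if (!Arrays.equals(expected[i].toByteArray(), p.readAll(streams[i]))) {
                return false;
            }
        }
        return true;
    }
}
```

Calling SlicePoolSketch.stress(seed, numStreams, totalBytes) with a few different seeds gives a quick round-trip check. A fixed slice size keeps the sketch short; the slice sizing used in the actual patch is not described in this message, so nothing here should be read as reflecting it.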