Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2008/01/10 15:08:34 UTC
[jira] Updated: (LUCENE-1125) Excessive Arrays.fill(0) in
DocumentsWriter drastically slows down small docs (3.9X slowdown!)
[ https://issues.apache.org/jira/browse/LUCENE-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-1125:
---------------------------------------
Attachment: LUCENE-1125.patch
Attached patch. I ran a test that indexes the first 6M docs from
Wikipedia, each preprocessed down to 100 bytes, using this alg:
{code}
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
work.dir=/lucene/work
doc.stored = true
doc.term.vector = true
doc.term.vector.positions = true
doc.term.vector.offsets = true
ram.flush.mb = 64
compound = false
autocommit = false
docs.file=/Volumes/External/lucene/wikifull100.txt
doc.add.log.step=10000
directory=FSDirectory
ResetSystemErase
{ "BuildIndex"
  CreateIndex
  [ { "AddDocs" AddDoc > : 1500000 ]: 4
  CloseIndex
}
RepSumByPrefRound BuildIndex
{code}
With this fix, it takes 158.5 seconds. Without it, it takes 621.8
seconds, which is 3.9X slower!
The fix is very low risk. All tests pass.
Michael, I think we should spin 2.3 RC2 to include this fix? Sorry to
only find it so late in the game :(
> Excessive Arrays.fill(0) in DocumentsWriter drastically slows down small docs (3.9X slowdown!)
> ----------------------------------------------------------------------------------------------
>
> Key: LUCENE-1125
> URL: https://issues.apache.org/jira/browse/LUCENE-1125
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.3
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 2.3
>
> Attachments: LUCENE-1125.patch
>
>
> I've been doing some "final" performance testing of 2.3RC1 and
> uncovered a fairly serious bug that adds a large fixed CPU cost when
> documents have any fields with term vectors enabled.
> The bug does not affect correctness, just performance.
> Basically, for every document, we were calling Arrays.fill(0) on a
> large (32 KB) byte array when in fact we only needed to zero a small
> part of it. This only happens if term vectors are turned on, and is
> especially devastating for small documents.
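To illustrate the kind of change described in the issue, here is a minimal
sketch in Java. It is not the attached patch and does not use the real
DocumentsWriter code: the class name PartialClear, the methods clearAll and
clearUsed, and the "used" counter are all hypothetical. It only contrasts
zeroing a whole 32 KB buffer per document with zeroing just the prefix that
a small document actually wrote.

```java
import java.util.Arrays;

public class PartialClear {
    // Assumed buffer size, taken from the 32 KB figure in the issue description.
    static final int BUFFER_SIZE = 32 * 1024;

    /** Slow variant: zeroes the entire buffer no matter how little was used. */
    static void clearAll(byte[] buffer) {
        Arrays.fill(buffer, (byte) 0);
    }

    /** Cheaper variant: zeroes only the first 'used' bytes (hypothetical counter). */
    static void clearUsed(byte[] buffer, int used) {
        Arrays.fill(buffer, 0, used, (byte) 0);
    }

    /** Helper for checking that no stale bytes remain in the buffer. */
    static boolean isAllZero(byte[] buffer) {
        for (byte b : buffer) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[BUFFER_SIZE];
        int used = 100; // a small document touches only a small prefix

        // Simulate per-document writes into the start of the buffer.
        Arrays.fill(buffer, 0, used, (byte) 1);

        // Reset only the part that was written.
        clearUsed(buffer, used);
        System.out.println("buffer clean: " + isAllZero(buffer));
    }
}
```

The ranged form of Arrays.fill touches only the bytes a document used, so
the per-document reset cost scales with document size rather than with the
fixed buffer size. This is only safe if the code tracks exactly how much of
the buffer was written.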