Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2008/01/10 15:08:34 UTC

[jira] Updated: (LUCENE-1125) Excessive Arrays.fill(0) in DocumentsWriter drastically slows down small docs (3.9X slowdown!)

     [ https://issues.apache.org/jira/browse/LUCENE-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1125:
---------------------------------------

    Attachment: LUCENE-1125.patch

Attached patch.  I ran a test that indexes the first 6M docs from
Wikipedia, each preprocessed to 100 bytes, using this alg:

{code}
  analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
  doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
  work.dir=/lucene/work
  doc.stored = true
  doc.term.vector = true
  doc.term.vector.positions = true
  doc.term.vector.offsets = true
  ram.flush.mb = 64
  compound = false
  autocommit = false
  docs.file=/Volumes/External/lucene/wikifull100.txt
  doc.add.log.step=10000
  
  directory=FSDirectory
  
  ResetSystemErase
  { "BuildIndex"
    CreateIndex
    [ { "AddDocs" AddDoc > : 1500000 ]: 4
    CloseIndex
  }
  
  RepSumByPrefRound BuildIndex
{code}

With this fix, it takes 158.5 seconds.  Without it, it takes 621.8
seconds, which is 3.9X slower!

The fix is very low risk.  All tests pass.

Michael, I think we should spin 2.3 RC2 to include this fix?  Sorry to
only find it so late in the game :(


> Excessive Arrays.fill(0) in DocumentsWriter drastically slows down small docs (3.9X slowdown!)
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1125
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1125
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 2.3
>
>         Attachments: LUCENE-1125.patch
>
>
> I've been doing some "final" performance testing of 2.3RC1 and
> uncovered a fairly serious bug that adds a large fixed CPU cost when
> documents have any term-vector-enabled fields.
> The bug does not affect correctness, just performance.
> Basically, for every document, we were calling Arrays.fill(0) on a
> large (32 KB) byte array when in fact we only needed to zero a small
> part of it.  This only happens if term vectors are turned on, and is
> especially devastating for small documents.
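
As a rough illustration of the cost described above (this is not the
code from the attached patch), the sketch below contrasts zeroing a
whole 32 KB byte array per document with zeroing only the bytes that
were written.  The class name, method names, the 100-byte "used"
size and the document count are all hypothetical, chosen only for
the demo; the real DocumentsWriter logic may look quite different.

```java
import java.util.Arrays;

// Hypothetical illustration; none of these names come from DocumentsWriter.
class PartialZeroFill {
    static final int BUFFER_SIZE = 32 * 1024; // the 32 KB array size from the issue

    // Costly variant: zero the whole buffer after every document.
    static void clearAll(byte[] buf) {
        Arrays.fill(buf, (byte) 0);
    }

    // Cheaper variant: zero only the first `used` bytes that were written.
    static void clearUsed(byte[] buf, int used) {
        Arrays.fill(buf, 0, used, (byte) 0);
    }

    // Simulate a small document touching the first `used` bytes.
    static void writeDoc(byte[] buf, int used) {
        for (int i = 0; i < used; i++) {
            buf[i] = 1;
        }
    }

    // Check that no stale bytes are left behind.
    static boolean isAllZero(byte[] buf) {
        for (byte b : buf) {
            if (b != 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[BUFFER_SIZE];
        int used = 100;      // assumed bytes touched per small document
        int docs = 200_000;  // assumed document count, for timing only

        long start = System.nanoTime();
        for (int d = 0; d < docs; d++) {
            writeDoc(buf, used);
            clearAll(buf);
        }
        long fullNanos = System.nanoTime() - start;

        start = System.nanoTime();
        for (int d = 0; d < docs; d++) {
            writeDoc(buf, used);
            clearUsed(buf, used);
        }
        long partialNanos = System.nanoTime() - start;

        System.out.println("buffer clean: " + isAllZero(buf));
        System.out.println("full fill ms:    " + fullNanos / 1_000_000);
        System.out.println("partial fill ms: " + partialNanos / 1_000_000);
    }
}
```

Both variants leave the buffer fully zeroed as long as writes stay
within the first `used` bytes, so correctness is the same and only
the per-document fixed cost differs.  Timings from a micro-benchmark
like this depend heavily on the JVM and hardware and should not be
read as a reproduction of the numbers above.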

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

