Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2008/01/10 15:04:34 UTC

[jira] Created: (LUCENE-1125) Excessive Arrays.fill(0) in DocumentsWriter drastically slows down small docs (3.9X slowdown!)

Excessive Arrays.fill(0) in DocumentsWriter drastically slows down small docs (3.9X slowdown!)
----------------------------------------------------------------------------------------------

                 Key: LUCENE-1125
                 URL: https://issues.apache.org/jira/browse/LUCENE-1125
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
    Affects Versions: 2.3
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 2.3


I've been doing some "final" performance testing of 2.3RC1 and
uncovered a fairly serious bug that adds a large fixed CPU cost when
documents have any term-vector-enabled fields.

The bug does not affect correctness, just performance.

Basically, for every document, we were calling Arrays.fill(0) on a
large (32 KB) byte array when in fact we only needed to zero a small
part of it.  This only happens if term vectors are turned on, and is
especially devastating for small documents.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-1125) Excessive Arrays.fill(0) in DocumentsWriter drastically slows down small docs (3.9X slowdown!)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1125.
----------------------------------------

    Resolution: Fixed

Fixed on trunk & 2.3 branch.



[jira] Updated: (LUCENE-1125) Excessive Arrays.fill(0) in DocumentsWriter drastically slows down small docs (3.9X slowdown!)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1125:
---------------------------------------

    Attachment: LUCENE-1125.patch

Attached patch.  I ran a test where I index the first 6M docs from
Wikipedia preprocessed to 100 bytes each, using this alg:

{code}
  analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
  doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
  work.dir=/lucene/work
  doc.stored = true
  doc.term.vector = true
  doc.term.vector.positions = true
  doc.term.vector.offsets = true
  ram.flush.mb = 64
  compound = false
  autocommit = false
  docs.file=/Volumes/External/lucene/wikifull100.txt
  doc.add.log.step=10000
  
  directory=FSDirectory
  
  ResetSystemErase
  { "BuildIndex"
    CreateIndex
    [ { "AddDocs" AddDoc > : 1500000 ]: 4
    CloseIndex
  }
  
  RepSumByPrefRound BuildIndex
{code}

With this fix, it takes 158.5 seconds.  Without it, it takes 621.8
seconds, which is 3.9X slower (621.8 / 158.5 = 3.92)!

The fix is very low risk.  All tests pass.

Michael, I think we should spin 2.3 RC2 to include this fix?  Sorry to
only find it so late in the game :(




[jira] Commented: (LUCENE-1125) Excessive Arrays.fill(0) in DocumentsWriter drastically slows down small docs (3.9X slowdown!)

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557723#action_12557723 ] 

Michael Busch commented on LUCENE-1125:
---------------------------------------

{quote}
The fix is very low risk. All tests pass.
{quote}

Yes, all contrib & core tests pass for me too. And after reading the
patch I agree that it looks good and is low risk.

{quote}
Michael, I think we should spin 2.3 RC2 to include this fix? Sorry to
only find it so late in the game :(
{quote}

OK, why don't you commit this today. Meanwhile I'll look into the
small issues Hoss pointed out and then build RC2 end of today.

Oh, and no need to be sorry! This is what the code freeze period is
for - finding and fixing problems! :-)



[jira] Commented: (LUCENE-1125) Excessive Arrays.fill(0) in DocumentsWriter drastically slows down small docs (3.9X slowdown!)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557735#action_12557735 ] 

Michael McCandless commented on LUCENE-1125:
--------------------------------------------

{quote}
OK, why don't you commit this today. Meanwhile I'll look into the
small issues Hoss pointed out and then build RC2 end of today.
{quote}
OK, will do, thanks!


