Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2011/06/28 15:49:17 UTC

[jira] [Created] (LUCENE-3254) BitVector.isSparse is sometimes wrong

BitVector.isSparse is sometimes wrong
-------------------------------------

                 Key: LUCENE-3254
                 URL: https://issues.apache.org/jira/browse/LUCENE-3254
             Project: Lucene - Java
          Issue Type: Bug
          Components: core/other
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 3.4, 4.0


While working on LUCENE-3246, I found a few problems with
BitVector.isSparse:

  * Its math can overflow int, so if there are enough deleted docs
    and maxDoc() is largish, isSparse may incorrectly return true

  * It overestimates the size of the sparse file: when estimating
    the number of bytes for the vInt dgaps, it uses bits.length
    instead of bits.length divided by the number of set bits (i.e.,
    the "average" gap between set bits)

This is relatively harmless: it only affects performance and the
size of the .del file on disk, not correctness.
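The two problems can be illustrated with a small Java sketch. This is not the Lucene source or the attached patch. The class and method names, the factor of 10 favoring the dense format, the 32-bit header cost and the "one vInt d-gap plus one data byte per set bit" cost model are all assumptions made for illustration. Here numBytes plays the role of bits.length.

```java
/**
 * Illustrative sketch only (not the Lucene source or the LUCENE-3254 patch).
 * Hypothetical cost model: the sparse format costs a 32-bit header plus, per
 * set bit, one vInt d-gap and one data byte; it is chosen only if it is at
 * least `factor` times smaller than the dense bit array.
 */
class SparseEstimateSketch {

  /** Number of bytes a non-negative value needs as a vInt (7 payload bits per byte). */
  static int vIntBytes(long value) {
    int bytes = 1;
    while (value >= 0x80) {
      value >>>= 7;
      bytes++;
    }
    return bytes;
  }

  /** Flawed estimate: int arithmetic, and every gap is sized as if it spanned the whole array. */
  static boolean isSparseIntMath(int numBytes, int setCount, int numBits) {
    int factor = 10; // hypothetical preference for the faster dense format
    int bytesPerSetBit = vIntBytes(numBytes) + 1; // d-gap bytes + the data byte
    int expectedBits = 32 + 8 * bytesPerSetBit * setCount;
    return factor * expectedBits < numBits; // factor * expectedBits can wrap negative
  }

  /** Sketch of a fix: long arithmetic, and gaps sized by the average gap numBytes / setCount. */
  static boolean isSparseLongMath(int numBytes, int setCount, int numBits) {
    if (setCount == 0) {
      return true; // nothing to write except the header
    }
    long factor = 10;
    long avgGap = numBytes / setCount;
    long bytesPerSetBit = vIntBytes(avgGap) + 1;
    long expectedBits = 32 + 8 * bytesPerSetBit * setCount;
    return factor * expectedBits < numBits;
  }

  public static void main(String[] args) {
    int numBits = 100_000_000;         // e.g. maxDoc()
    int numBytes = (numBits >> 3) + 1; // byte array holding numBits bits
    // Overflow case: int math wraps negative and wrongly reports "sparse".
    System.out.println(isSparseIntMath(numBytes, 6_000_000, numBits));  // true
    System.out.println(isSparseLongMath(numBytes, 6_000_000, numBits)); // false
    // Overestimate case: full-length gaps reject sparse; average gaps accept it.
    System.out.println(isSparseIntMath(numBytes, 500_000, numBits));    // false
    System.out.println(isSparseLongMath(numBytes, 500_000, numBits));   // true
  }
}
```

With 6,000,000 deleted docs out of 100,000,000, the int product 10 * 240,000,032 exceeds Integer.MAX_VALUE and wraps to a negative number, so the flawed check says "sparse". With 500,000 deleted docs, sizing each gap by the full 12,500,001-byte array costs 4 vInt bytes per gap, while the average gap of 25 bytes needs only 1, which is enough to flip the decision toward the smaller sparse file.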


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-3254) BitVector.isSparse is sometimes wrong

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3254.
----------------------------------------

    Resolution: Fixed



[jira] [Updated] (LUCENE-3254) BitVector.isSparse is sometimes wrong

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3254:
---------------------------------------

    Attachment: LUCENE-3254.patch

Patch fixing those two issues. It also 1) marks BitVector (BV) as
@lucene.internal, 2) removes BV.subset (we don't use it), 3) adds a
back-compat version header to the BV file (I need this for
LUCENE-3246), and makes other small changes.
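The layout of the new version header is not shown in this thread. As a generic, hypothetical illustration of the technique, a format that previously had no header can reserve a leading int value that old files never start with, follow it with a version number, and have readers treat anything else as a legacy file. The marker value -2, the version number 1 and the class name are all assumptions, not Lucene's actual .del format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch of retrofitting a version header onto an unversioned file format. */
class VersionHeaderSketch {

  // Hypothetical marker: assumes legacy files never begin with this int value.
  static final int VERSIONED_MARKER = -2;
  static final int CURRENT_VERSION = 1; // hypothetical version number

  /** Writes the marker and version that would precede the (omitted) payload. */
  static byte[] writeHeader() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(VERSIONED_MARKER);
      out.writeInt(CURRENT_VERSION);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Returns the format version, or 0 for a legacy (pre-header) file. */
  static int readVersion(byte[] file) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
      int first = in.readInt();
      if (first != VERSIONED_MARKER) {
        return 0; // legacy file: 'first' already belongs to the old payload
      }
      int version = in.readInt();
      if (version > CURRENT_VERSION) {
        throw new IllegalArgumentException("unsupported version " + version);
      }
      return version;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(readVersion(writeHeader()));             // 1
    System.out.println(readVersion(new byte[] {0, 0, 0, 100})); // 0 (legacy)
  }
}
```

The design choice is that old readers never see the new marker (they are not expected to read new files), while new readers can still open old files by checking the first int before interpreting the rest.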

