You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2010/07/10 22:46:49 UTC

[jira] Created: (LUCENE-2531) FieldComparator.TermOrdValComparator compares by value unnecessarily

FieldComparator.TermOrdValComparator compares by value unnecessarily
--------------------------------------------------------------------

                 Key: LUCENE-2531
                 URL: https://issues.apache.org/jira/browse/LUCENE-2531
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Search
            Reporter: Michael McCandless
            Priority: Minor
             Fix For: 3.1, 4.0


Digging on LUCENE-2504, I noticed that TermOrdValComparator's compareBottom method falls back on compare-by-value when it needn't.

Specifically, if we know the current bottom ord "matches" the current segment, we can skip the value comparison when the ords are the same (ie, return 0) because the ords are exactly comparable.

This is hurting string sort perf especially for optimized indices (and also unoptimized indices), and especially for highly redundant (not many unique values) fields.  This affects all releases >= 2.9.x, but trunk is likely more severely affected since looking up a value is more costly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2531) FieldComparator.TermOrdValComparator compares by value unnecessarily

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2531:
---------------------------------------

    Attachment: LUCENE-2531.patch
                LUCENE-2531-3x.patch

Another iteration on the patches with some small further optimizing.

> FieldComparator.TermOrdValComparator compares by value unnecessarily
> --------------------------------------------------------------------
>
>                 Key: LUCENE-2531
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2531
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2504-3x.patch, LUCENE-2504.patch, LUCENE-2531-3x.patch, LUCENE-2531.patch
>
>
> Digging on LUCENE-2504, I noticed that TermOrdValComparator's compareBottom method falls back on compare-by-value when it needn't.
> Specifically, if we know the current bottom ord "matches" the current segment, we can skip the value comparison when the ords are the same (ie, return 0) because the ords are exactly comparable.
> This is hurting string sort perf especially for optimized indices (and also unoptimized indices), and especially for highly redundant (not many unique values) fields.  This affects all releases >= 2.9.x, but trunk is likely more severely affected since looking up a value is more costly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-2531) FieldComparator.TermOrdValComparator compares by value unnecessarily

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2531.
----------------------------------------

    Resolution: Fixed

> FieldComparator.TermOrdValComparator compares by value unnecessarily
> --------------------------------------------------------------------
>
>                 Key: LUCENE-2531
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2531
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2504-3x.patch, LUCENE-2504.patch, LUCENE-2531-3x.patch, LUCENE-2531.patch
>
>
> Digging on LUCENE-2504, I noticed that TermOrdValComparator's compareBottom method falls back on compare-by-value when it needn't.
> Specifically, if we know the current bottom ord "matches" the current segment, we can skip the value comparison when the ords are the same (ie, return 0) because the ords are exactly comparable.
> This is hurting string sort perf especially for optimized indices (and also unoptimized indices), and especially for highly redundant (not many unique values) fields.  This affects all releases >= 2.9.x, but trunk is likely more severely affected since looking up a value is more costly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2531) FieldComparator.TermOrdValComparator compares by value unnecessarily

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887122#action_12887122 ] 

Yonik Seeley commented on LUCENE-2531:
--------------------------------------

The thing that jumps out at me is the removal of "reversed"... I'd need to apply the patch to study enough to figure that out, but any clues you could give would be helpful (yes, I'm being lazy ;-)

> FieldComparator.TermOrdValComparator compares by value unnecessarily
> --------------------------------------------------------------------
>
>                 Key: LUCENE-2531
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2531
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2504-3x.patch, LUCENE-2504.patch, LUCENE-2531-3x.patch, LUCENE-2531.patch
>
>
> Digging on LUCENE-2504, I noticed that TermOrdValComparator's compareBottom method falls back on compare-by-value when it needn't.
> Specifically, if we know the current bottom ord "matches" the current segment, we can skip the value comparison when the ords are the same (ie, return 0) because the ords are exactly comparable.
> This is hurting string sort perf especially for optimized indices (and also unoptimized indices), and especially for highly redundant (not many unique values) fields.  This affects all releases >= 2.9.x, but trunk is likely more severely affected since looking up a value is more costly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2531) FieldComparator.TermOrdValComparator compares by value unnecessarily

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2531:
---------------------------------------

    Attachment: LUCENE-2504.patch
                LUCENE-2504-3x.patch

Attached patches, for 3.1 and 4.0.

This also cleans up the String/TermOrdValComparator, absorbing the convert method directly into setBottom, and doing away with some dead code.  With this fix we no longer compare by value at all if index is a single segment (I think!).

> FieldComparator.TermOrdValComparator compares by value unnecessarily
> --------------------------------------------------------------------
>
>                 Key: LUCENE-2531
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2531
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2504-3x.patch, LUCENE-2504.patch
>
>
> Digging on LUCENE-2504, I noticed that TermOrdValComparator's compareBottom method falls back on compare-by-value when it needn't.
> Specifically, if we know the current bottom ord "matches" the current segment, we can skip the value comparison when the ords are the same (ie, return 0) because the ords are exactly comparable.
> This is hurting string sort perf especially for optimized indices (and also unoptimized indices), and especially for highly redundant (not many unique values) fields.  This affects all releases >= 2.9.x, but trunk is likely more severely affected since looking up a value is more costly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2531) FieldComparator.TermOrdValComparator compares by value unnecessarily

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887126#action_12887126 ] 

Michael McCandless commented on LUCENE-2531:
--------------------------------------------

I hear you; this stuff is kinda hairy ;)

The usage of reverse (and, sortPos) was only in dead code, inside the convert method.

The convert method used to be used more often (to do on-demand convert of the ord when its readerGen != current one), but we stopped doing that and only convert the bottom slot now, which mean this if:

{noformat}
      if (sortPos == 0 && bottomSlot != -1 && bottomSlot != slot) {
{noformat}

Can never be true (bottomSlot is *always* == slot), so that true clause was all dead code anyway.

> FieldComparator.TermOrdValComparator compares by value unnecessarily
> --------------------------------------------------------------------
>
>                 Key: LUCENE-2531
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2531
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2504-3x.patch, LUCENE-2504.patch, LUCENE-2531-3x.patch, LUCENE-2531.patch
>
>
> Digging on LUCENE-2504, I noticed that TermOrdValComparator's compareBottom method falls back on compare-by-value when it needn't.
> Specifically, if we know the current bottom ord "matches" the current segment, we can skip the value comparison when the ords are the same (ie, return 0) because the ords are exactly comparable.
> This is hurting string sort perf especially for optimized indices (and also unoptimized indices), and especially for highly redundant (not many unique values) fields.  This affects all releases >= 2.9.x, but trunk is likely more severely affected since looking up a value is more costly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org