Posted to dev@lucene.apache.org by "Uwe Schindler (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2011/12/21 11:09:31 UTC

[jira] [Issue Comment Edited] (LUCENE-3654) Optimize BytesRef comparator to use Unsafe long based comparison (when possible)

    [ https://issues.apache.org/jira/browse/LUCENE-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173973#comment-13173973 ] 

Uwe Schindler edited comment on LUCENE-3654 at 12/21/11 10:09 AM:
------------------------------------------------------------------

The SIGSEGV can be avoided by doing some safety checks at the beginning of compare: check that offset>=0 and offset+length<=bytes.length. If you use Unsafe, you have to make sure that your parameters are 1000% correct, that's all. This is why java.nio performs so many checks in its Buffer methods.
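A minimal sketch of such checks in plain Java. The names `SliceChecks`, `isValidSlice` and `checkSlice` are hypothetical, not Lucene API. Two details go slightly beyond the comment and are assumptions added here: the sketch also rejects a negative `length` and a null array, and it tests `offset <= bytes.length - length` instead of `offset + length <= bytes.length`, because the latter can overflow `int` for a large offset and wrap around to a value that passes.

```java
/**
 * Hypothetical helper: validate a (bytes, offset, length) slice before handing
 * it to code that performs no bounds checks of its own (e.g. raw Unsafe reads).
 */
final class SliceChecks {
    private SliceChecks() {}

    /**
     * True if bytes[offset .. offset + length) lies entirely inside the array.
     * Uses offset <= bytes.length - length so that a huge offset + length cannot
     * overflow int; length >= 0 is checked first, so the subtraction is safe.
     */
    static boolean isValidSlice(byte[] bytes, int offset, int length) {
        return bytes != null
            && offset >= 0
            && length >= 0
            && offset <= bytes.length - length;
    }

    /** Throwing variant, meant to be called at the top of compare(). */
    static void checkSlice(byte[] bytes, int offset, int length) {
        if (!isValidSlice(bytes, offset, length)) {
            throw new IndexOutOfBoundsException(
                "offset=" + offset + ", length=" + length
                + ", bytes.length=" + (bytes == null ? "null" : String.valueOf(bytes.length)));
        }
    }
}
```

With the naive `offset + length <= bytes.length`, a call such as `(new byte[8], 4, Integer.MAX_VALUE)` would overflow to a negative sum and pass; the rearranged form rejects it.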

*EDIT*
You also have to copy offset, length and the actual byte[] reference to local variables at the beginning, before the bounds checks (otherwise another thread could change the *public* non-final fields in BytesRef between the check and the access, and cause a SIGSEGV). BytesRef is a user-visible class, so it must be 100% safe against all usage violations.
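To illustrate the snapshot rule, here is a sketch built on a hypothetical `MutableSlice` class that mimics BytesRef's public, non-final `bytes`/`offset`/`length` fields (it is not the real BytesRef). Each field is read exactly once into a local, the locals are validated, and only the locals are used afterwards, so a concurrent reassignment of the fields cannot invalidate a check that already passed. For simplicity the comparison loop uses ordinary, bounds-checked array access where an Unsafe-based comparator would do its raw reads.

```java
/** Hypothetical stand-in for a class whose public fields any thread may reassign. */
final class MutableSlice {
    public byte[] bytes;
    public int offset;
    public int length;

    MutableSlice(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }
}

final class SnapshotCompare {
    private SnapshotCompare() {}

    /** Unsigned lexicographic comparison that reads each public field exactly once. */
    static int compare(MutableSlice a, MutableSlice b) {
        // 1. Snapshot every field into a local. Another thread may reassign
        //    a.bytes or a.length after this point; we never read them again.
        final byte[] aBytes = a.bytes;
        final int aOff = a.offset;
        final int aLen = a.length;
        final byte[] bBytes = b.bytes;
        final int bOff = b.offset;
        final int bLen = b.length;

        // 2. Validate the locals, not the fields (overflow-safe form).
        check(aBytes, aOff, aLen);
        check(bBytes, bOff, bLen);

        // 3. Compare using only the validated locals. An Unsafe-based version
        //    would do its raw reads here; this sketch uses normal array access.
        final int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int diff = (aBytes[aOff + i] & 0xff) - (bBytes[bOff + i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return aLen - bLen; // both lengths are >= 0, so this cannot overflow
    }

    private static void check(byte[] bytes, int offset, int length) {
        if (bytes == null || offset < 0 || length < 0 || offset > bytes.length - length) {
            throw new IndexOutOfBoundsException("offset=" + offset + ", length=" + length);
        }
    }
}
```

Reading `a.bytes` again inside the loop instead of `aBytes` would reintroduce the race: the check would validate one array while the loop reads from another.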

Given this additional overhead, the whole comparator only makes sense for terms of around 200 bytes or longer, but in 99% of all cases Lucene terms are shorter.

If you want to use this comparator, just subclass Lucene40Codec and return it as the term comparator; this can live completely outside Lucene. You can even use Guava.
                
> Optimize BytesRef comparator to use Unsafe long based comparison (when possible)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3654
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3654
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>            Reporter: Shay Banon
>         Attachments: LUCENE-3654.patch
>
>
> Inspired by Google Guava's UnsignedBytes lexicographical comparator, which uses Unsafe to compare the bytes as longs (8 at a time) instead of one by one (yielding 2-4x better performance), use similar logic in the BytesRef comparator. The code was adapted to support offset/length.
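A sketch of the word-at-a-time idea the description refers to. It deliberately swaps `sun.misc.Unsafe` for `java.nio.ByteBuffer.getLong`, which reads big-endian by default and is bounds-checked, so it is safe but not necessarily fast; it is not Guava's or the attached patch's implementation, and no performance figure is claimed for it. The key observation is that with big-endian reads, `Long.compareUnsigned` (Java 8+) orders two 8-byte words exactly as an unsigned byte-by-byte comparison of those bytes would, because the first differing byte is the most significant differing byte. `LongWiseCompare` is a hypothetical name.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch: unsigned lexicographic comparison of (bytes, offset, length)
 * slices, 8 bytes at a time. Callers should still validate the slices; ByteBuffer
 * only turns a bad index into an exception instead of a crash.
 */
final class LongWiseCompare {
    private LongWiseCompare() {}

    static int compare(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        final ByteBuffer aBuf = ByteBuffer.wrap(a); // big-endian by default
        final ByteBuffer bBuf = ByteBuffer.wrap(b);
        final int n = Math.min(aLen, bLen);
        int i = 0;

        // Whole 8-byte words. "n - i >= Long.BYTES" avoids the int overflow that
        // "i + Long.BYTES <= n" could hit near the maximum array length.
        for (; n - i >= Long.BYTES; i += Long.BYTES) {
            long x = aBuf.getLong(aOff + i); // absolute read, position unchanged
            long y = bBuf.getLong(bOff + i);
            if (x != y) {
                return Long.compareUnsigned(x, y);
            }
        }

        // Tail: fewer than 8 bytes left, compare one at a time as unsigned values.
        for (; i < n; i++) {
            int diff = (a[aOff + i] & 0xff) - (b[bOff + i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        // Equal common prefix: the shorter slice sorts first.
        return Integer.compare(aLen, bLen);
    }
}
```

A signed `Long.compare` would be wrong here: a word starting with byte `0x80` would be negative and sort before one starting with `0x7f`, while unsigned byte order puts it after.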
