Posted to issues@hbase.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2011/05/27 19:47:47 UTC

[jira] [Created] (HBASE-3928) Some potential performance improvements to Bytes/KeyValue

Some potential performance improvements to Bytes/KeyValue
---------------------------------------------------------

                 Key: HBASE-3928
                 URL: https://issues.apache.org/jira/browse/HBASE-3928
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 0.92.0
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon
            Priority: Minor
             Fix For: 0.92.0


We use Bytes.compareTo() in many places where a more efficient equals() method would do. equals() can be faster than compareTo() because it can short-circuit two common cases:
Case 1) the lengths differ - only one comparison is needed
Case 2) the two arrays have the same length and a common prefix: compare the last byte first, since it's the one most likely to differ (given we are usually comparing adjacent sorted data).
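
To make the two cases concrete, here is a rough sketch of what such an equals() could look like. It is not the code from the attached patch: the class name FastBytesEquals and the method signatures are made up for illustration, and it assumes the caller passes offsets and lengths that lie within the arrays.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; not the code from hbase-3928.txt. */
final class FastBytesEquals {

    private FastBytesEquals() {}

    /**
     * Returns true if the two byte ranges hold identical contents.
     * Assumes both ranges lie within their arrays (no bounds checks here).
     */
    static boolean equals(byte[] left, int leftOffset, int leftLength,
                          byte[] right, int rightOffset, int rightLength) {
        // Case 1: ranges of different length can never be equal,
        // so a single int comparison settles it.
        if (leftLength != rightLength) {
            return false;
        }
        // The same range of the same array is trivially equal.
        if (left == right && leftOffset == rightOffset) {
            return true;
        }
        if (leftLength == 0) {
            return true;
        }
        // Case 2: adjacent sorted keys tend to share a prefix, so the
        // last byte is the one most likely to differ; check it first.
        if (left[leftOffset + leftLength - 1] != right[rightOffset + rightLength - 1]) {
            return false;
        }
        // Compare the remaining bytes front to back.
        for (int i = 0; i < leftLength - 1; i++) {
            if (left[leftOffset + i] != right[rightOffset + i]) {
                return false;
            }
        }
        return true;
    }

    /** Whole-array convenience overload; two nulls are treated as equal. */
    static boolean equals(byte[] left, byte[] right) {
        if (left == right) {
            return true;
        }
        if (left == null || right == null) {
            return false;
        }
        return equals(left, 0, left.length, right, 0, right.length);
    }

    public static void main(String[] args) {
        byte[] a = "row-0001".getBytes(StandardCharsets.UTF_8);
        byte[] b = "row-0002".getBytes(StandardCharsets.UTF_8);
        System.out.println(equals(a, b));         // false: last byte differs
        System.out.println(equals(a, a.clone())); // true: same contents
    }
}
```

Unlike compareTo(), this never needs to work out which side sorts first, so it can bail out as soon as any difference is found.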

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3928) Some potential performance improvements to Bytes/KeyValue

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040376#comment-13040376 ] 

stack commented on HBASE-3928:
------------------------------

+1 on commit to TRUNK.  Can only improve things.  We'll be profiling 0.92 before commit, so if there is a problem, such as a newly introduced hotspot, we'll see it then (I doubt this optimization would show up in YCSB unless it made things really bad, or really good, which I don't think is going to happen here).

Odd that we don't cache Bytes.toBytes("MAX_SEQ_ID_KEY") and the other TIMERANGE constant that follows it.
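
For illustration, caching could look something like the sketch below: encode the string once into a static final field and reuse the array. The class CachedKeys and the helper isMaxSeqIdKey are hypothetical names, not taken from the patch, and plain UTF-8 encoding stands in for Bytes.toBytes(String) so that the example has no HBase dependency.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative sketch only; class and method names are hypothetical. */
final class CachedKeys {

    // Encoded once when the class is initialized, then reused on every call
    // instead of re-encoding the string each time it is needed.
    static final byte[] MAX_SEQ_ID_KEY =
        "MAX_SEQ_ID_KEY".getBytes(StandardCharsets.UTF_8);

    private CachedKeys() {}

    /** Compares a candidate key against the cached constant. */
    static boolean isMaxSeqIdKey(byte[] candidate) {
        return Arrays.equals(MAX_SEQ_ID_KEY, candidate);
    }

    public static void main(String[] args) {
        byte[] fromFile = "MAX_SEQ_ID_KEY".getBytes(StandardCharsets.UTF_8);
        System.out.println(isMaxSeqIdKey(fromFile));       // true
        System.out.println(isMaxSeqIdKey(new byte[] {1})); // false
    }
}
```

One caveat with this pattern is that the array is shared and mutable, so callers have to treat it as read-only.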




[jira] [Commented] (HBASE-3928) Some potential performance improvements to Bytes/KeyValue

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040855#comment-13040855 ] 

Andrew Purtell commented on HBASE-3928:
---------------------------------------

+1

KV comparison is a key hotspot in memstore and upsert especially.

@Stack If you want additional confirmation, I can look into running JProfiler on an all-localhost setup with an upsert-heavy workload. Any % improvement would be worth it, though, in my opinion.


[jira] [Commented] (HBASE-3928) Some potential performance improvements to Bytes/KeyValue

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041405#comment-13041405 ] 

stack commented on HBASE-3928:
------------------------------

@Andrew Todd has two +1s now, but he's the one asking for benchmark runs. (Todd, if you want us to commit, just say.  If you want to wait on proof that it's not breaking perf before commit, just say; it sounds like Andrew will give it a go.)


[jira] [Updated] (HBASE-3928) Some potential performance improvements to Bytes/KeyValue

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-3928:
-------------------------------

    Attachment: hbase-3928.txt

This patch hasn't been thoroughly tested or benchmarked yet, but it might be helpful. Does anyone have a cluster handy to run a YCSB scan/get benchmark with an active set sized to fit in memory?


[jira] [Commented] (HBASE-3928) Some potential performance improvements to Bytes/KeyValue

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048464#comment-13048464 ] 

Todd Lipcon commented on HBASE-3928:
------------------------------------

I ran some single-node benchmarks on my laptop. It doesn't seem to make a significant difference either way. Since logic says it should help, we may as well commit it; it certainly doesn't make things worse.


[jira] [Updated] (HBASE-3928) Some potential performance improvements to Bytes/KeyValue

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3928:
-------------------------

    Priority: Critical  (was: Minor)

Marking critical because it has a patch available and is perf-related.


[jira] [Commented] (HBASE-3928) Some potential performance improvements to Bytes/KeyValue

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040380#comment-13040380 ] 

stack commented on HBASE-3928:
------------------------------

Oh, I looked closely to see if you flipped args or did a compare w/ a presumption that hadn't been tested a statement or two earlier, and it all looks right to me.


[jira] [Commented] (HBASE-3928) Some potential performance improvements to Bytes/KeyValue

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050973#comment-13050973 ] 

Hudson commented on HBASE-3928:
-------------------------------

Integrated in HBase-TRUNK #1976 (See [https://builds.apache.org/job/HBase-TRUNK/1976/])
    


[jira] [Resolved] (HBASE-3928) Some potential performance improvements to Bytes/KeyValue

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-3928.
--------------------------

       Resolution: Fixed
    Fix Version/s: 0.92.0
     Hadoop Flags: [Reviewed]

Thanks for trying it Todd.  I committed it for you to TRUNK.
