Posted to issues@hbase.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2011/05/27 19:47:47 UTC
[jira] [Created] (HBASE-3928) Some potential performance
improvements to Bytes/KeyValue
Some potential performance improvements to Bytes/KeyValue
---------------------------------------------------------
Key: HBASE-3928
URL: https://issues.apache.org/jira/browse/HBASE-3928
Project: HBase
Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
Fix For: 0.92.0
We use Bytes.compareTo() in a lot of places where a more efficient equals() method would do. The trick that makes equals() faster than compareTo() is that it can short-circuit two common cases:
Case 1) the lengths differ: a single comparison of the lengths is enough to return false.
Case 2) the two arrays have the same length and a common prefix: compare the last byte first, since it's the one most likely to differ (given we are usually comparing adjacent sorted data).
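The two short-circuits described above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patch: the class name ByteRangeEquals and the method signatures are hypothetical and were chosen only to make the example self-contained.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not the actual HBase Bytes API.
public final class ByteRangeEquals {
    private ByteRangeEquals() {}

    /** Tests two byte ranges for equality, taking cheap exits before the full scan. */
    public static boolean equals(byte[] left, int leftOffset, int leftLength,
                                 byte[] right, int rightOffset, int rightLength) {
        // Case 1: ranges of different length can never be equal (one int comparison).
        if (leftLength != rightLength) {
            return false;
        }
        // Two empty ranges are equal; this also guards the last-byte index below.
        if (leftLength == 0) {
            return true;
        }
        // Case 2: adjacent sorted keys tend to share a prefix, so test the last byte first.
        if (left[leftOffset + leftLength - 1] != right[rightOffset + rightLength - 1]) {
            return false;
        }
        // Fall back to comparing the remaining bytes front to back.
        for (int i = 0; i < leftLength - 1; i++) {
            if (left[leftOffset + i] != right[rightOffset + i]) {
                return false;
            }
        }
        return true;
    }

    /** Convenience overload for whole arrays. */
    public static boolean equals(byte[] left, byte[] right) {
        if (left == right) {
            return true;
        }
        if (left == null || right == null) {
            return false;
        }
        return equals(left, 0, left.length, right, 0, right.length);
    }

    public static void main(String[] args) {
        byte[] a = "row-0001".getBytes(StandardCharsets.UTF_8);
        byte[] b = "row-0002".getBytes(StandardCharsets.UTF_8);
        System.out.println(equals(a, b));          // false: rejected by the last-byte check
        System.out.println(equals(a, a.clone()));  // true: full scan finds no difference
    }
}
```

Unlike compareTo(), this only answers whether the ranges are equal and says nothing about ordering, so it is only a drop-in where the caller checks the result against zero.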
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3928) Some potential performance
improvements to Bytes/KeyValue
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040376#comment-13040376 ]
stack commented on HBASE-3928:
------------------------------
+1 on commit to TRUNK. It can only improve things. We'll be profiling 0.92 before commit, so if there is a problem, such as a newly introduced hotspot, we'll see it then. (I doubt this optimization would show up in YCSB unless it made things really bad or really good, which I don't think is going to happen here.)
Odd that we don't cache Bytes.toBytes("MAX_SEQ_ID_KEY") and the other TIMERANGE constant that follows.
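As a rough sketch of what caching such a key could look like: the class name CachedKeys and the helper isMaxSeqIdKey are hypothetical, and the sketch assumes the key bytes are the UTF-8 encoding of the string. It is not taken from the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical holder class for illustration only.
public final class CachedKeys {
    // Encoded once at class-initialization time instead of on every call.
    // Assumption: the key is the UTF-8 encoding of the string.
    public static final byte[] MAX_SEQ_ID_KEY =
        "MAX_SEQ_ID_KEY".getBytes(StandardCharsets.UTF_8);

    private CachedKeys() {}

    /** Compares a candidate key against the cached constant without re-encoding it. */
    public static boolean isMaxSeqIdKey(byte[] candidate) {
        return Arrays.equals(MAX_SEQ_ID_KEY, candidate);
    }

    public static void main(String[] args) {
        byte[] fromFile = "MAX_SEQ_ID_KEY".getBytes(StandardCharsets.UTF_8);
        System.out.println(isMaxSeqIdKey(fromFile)); // true
    }
}
```

One caveat of this design is that a shared byte[] is mutable, so callers must treat the cached array as read-only.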
[jira] [Commented] (HBASE-3928) Some potential performance
improvements to Bytes/KeyValue
Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040855#comment-13040855 ]
Andrew Purtell commented on HBASE-3928:
---------------------------------------
+1
KV comparison is a key hotspot in memstore and upsert especially.
@Stack If you want additional confirmation, I can see about running JProfiler in an all-localhost setup with an upsert-heavy workload. Any % improvement would be worth it, though, in my opinion.
[jira] [Commented] (HBASE-3928) Some potential performance
improvements to Bytes/KeyValue
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041405#comment-13041405 ]
stack commented on HBASE-3928:
------------------------------
@Andrew Todd has two +1s now, but he's the one asking for load testing. (Todd, if you want us to commit, just say so. If you want to wait for proof that it's not breaking perf before commit, just say so; it sounds like Andrew will give it a go.)
[jira] [Updated] (HBASE-3928) Some potential performance
improvements to Bytes/KeyValue
Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated HBASE-3928:
-------------------------------
Attachment: hbase-3928.txt
This patch hasn't been thoroughly tested or benchmarked yet, but it might be helpful. Does someone have a cluster handy for a YCSB scan/get benchmark with an in-memory active set size?
[jira] [Commented] (HBASE-3928) Some potential performance
improvements to Bytes/KeyValue
Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048464#comment-13048464 ]
Todd Lipcon commented on HBASE-3928:
------------------------------------
I ran some single-node benchmarks on my laptop. The patch doesn't seem to make a significant difference either way. Since logic says it should help, we may as well commit it; it certainly doesn't make things worse.
[jira] [Updated] (HBASE-3928) Some potential performance
improvements to Bytes/KeyValue
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3928:
-------------------------
Priority: Critical (was: Minor)
Marking critical because it has a patch available and is perf-related.
[jira] [Commented] (HBASE-3928) Some potential performance
improvements to Bytes/KeyValue
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040380#comment-13040380 ]
stack commented on HBASE-3928:
------------------------------
Oh, I looked closely to see whether you flipped args or did a compare with a presumption that hadn't been tested a statement or two earlier, and it all looks right to me.
[jira] [Commented] (HBASE-3928) Some potential performance
improvements to Bytes/KeyValue
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050973#comment-13050973 ]
Hudson commented on HBASE-3928:
-------------------------------
Integrated in HBase-TRUNK #1976 (See [https://builds.apache.org/job/HBase-TRUNK/1976/])
[jira] [Resolved] (HBASE-3928) Some potential performance
improvements to Bytes/KeyValue
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack resolved HBASE-3928.
--------------------------
Resolution: Fixed
Fix Version/s: 0.92.0
Hadoop Flags: [Reviewed]
Thanks for trying it Todd. I committed it for you to TRUNK.