Posted to commits@cassandra.apache.org by "David Allsopp (JIRA)" <ji...@apache.org> on 2011/07/03 18:23:21 UTC

[jira] [Created] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Converting bytes to hex string is unnecessarily slow
----------------------------------------------------

                 Key: CASSANDRA-2850
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.8.1
            Reporter: David Allsopp
            Priority: Minor
             Fix For: 0.8.2


ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.

(OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)

Will attach patch shortly that speeds it up by about x3, plus benchmarking test.
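
As a rough illustration of the two costs named above (an un-sized StringBuilder and several method calls per byte), here is a minimal sketch. It is not the attached patch: the class and method names are hypothetical, and the "naive" variant is only an assumption about the general shape of a slow path, not a copy of the Cassandra source.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; names are illustrative and not taken from the patch.
final class HexPresizeSketch {
    // 16-entry table: one hex digit per 4-bit nibble.
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Assumed slow shape: default-capacity builder plus several calls per byte.
    static String bytesToHexNaive(ByteBuffer bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = bytes.position(); i < bytes.limit(); i++) {
            int b = bytes.get(i) & 0xff;
            if (b <= 0x0f) {
                sb.append('0'); // pad single-digit values to two chars
            }
            sb.append(Integer.toHexString(b));
        }
        return sb.toString();
    }

    // Pre-sized builder: exactly two chars per byte, so no internal re-sizing.
    static String bytesToHexPresized(ByteBuffer bytes) {
        StringBuilder sb = new StringBuilder(bytes.remaining() * 2);
        for (int i = bytes.position(); i < bytes.limit(); i++) {
            int b = bytes.get(i) & 0xff;
            sb.append(HEX[b >>> 4]).append(HEX[b & 0x0f]);
        }
        return sb.toString();
    }
}
```

Both variants should return the same string; only the allocation pattern and the per-byte work differ.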

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059268#comment-13059268 ] 

David Allsopp commented on CASSANDRA-2850:
------------------------------------------

Updated diff (2850a) includes the same optimisation in both the FBUtilities version and the ByteBufferUtil version, with a similar x3 speedup in each case. It now uses a literal lookup table rather than generating the table in a static block.

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: BytesToHexBenchmark.java, BytesToHexBenchmark2.java, cassandra-2850a.diff

        

[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Allsopp updated CASSANDRA-2850:
-------------------------------------

    Attachment: BytesToHexBenchmark3.java

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-v2.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff

        

[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059589#comment-13059589 ] 

David Allsopp commented on CASSANDRA-2850:
------------------------------------------

I think you mean (bytes.remaining() * 2) not (bytes.remaining() / 2) - we need twice as many chars as bytes.

Also, shouldn't byteToChar[] have length 16, not 256?

Not sure what string creation you are referring to?

I attach 2 further versions of bytesToHex (as another benchmark class 3). Results are below (I've had to increase the number of repeats so the stats are significant!).

v3 uses 'normal' code and is another 20% faster for large values, and _another_ factor of 2 faster than v2, i.e. 7-10 times faster than the original.

v4 uses nasty reflection to avoid doing an arraycopy on the byte array - this avoids allocating a large chunk of memory (all the previous solutions end up doing an arraycopy somewhere). This is now 11-13 times faster than the original.

20M old: 1482
20M new: 360
20M  v2: 249
20M  v3: 203
20M  v4: 125
----
old: 2137
new: 859
 v2: 718
 v3: 203
 v4: 156
----
old: 2138
new: 843
 v2: 733
 v3: 188
 v4: 156
----
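
The sizing and lookup-table points in this comment can be sketched as follows. This is an assumption about the general approach (an output array of remaining() * 2 chars and a 16-entry digit table), not the code in BytesToHexBenchmark3.java, which is not reproduced in this thread; the class name is hypothetical. The reflection trick described for v4 is deliberately not shown, so this sketch still pays for one copy inside the String constructor.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; not the attached v3 or v4 code.
final class HexCharArraySketch {
    // Only 16 entries are needed: one per nibble value.
    private static final char[] HEX_DIGITS = {
        '0', '1', '2', '3', '4', '5', '6', '7',
        '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
    };

    static String bytesToHex(ByteBuffer bytes) {
        int offset = bytes.position();
        int size = bytes.remaining();
        char[] out = new char[size * 2]; // twice as many chars as bytes
        for (int i = 0; i < size; i++) {
            int b = bytes.get(offset + i); // absolute get, position is untouched
            out[i * 2] = HEX_DIGITS[(b & 0xf0) >> 4];
            out[i * 2 + 1] = HEX_DIGITS[b & 0x0f];
        }
        return new String(out); // the constructor copies the array once
    }
}
```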



> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-v2.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, cassandra-2850a.diff

        

[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2850:
--------------------------------------

             Reviewer: slebresne
    Affects Version/s:     (was: 0.7.6)
                           (was: 0.8.1)

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: David Allsopp
>            Assignee: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-v2.patch, 2850-v4.patch, 2850-v4a.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff

        

[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066780#comment-13066780 ] 

Jonathan Ellis commented on CASSANDRA-2850:
-------------------------------------------

This doesn't apply to either 0.7 or 0.8 branches for me...

{noformat}
form:svn-0.7 jonathan$ patch -p1 < 2850-v4.patch 
patching file src/java/org/apache/cassandra/utils/ByteBufferUtil.java
Hunk #1 FAILED at 20.
Hunk #2 succeeded at 446 (offset -33 lines).
1 out of 2 hunks FAILED -- saving rejects to file src/java/org/apache/cassandra/utils/ByteBufferUtil.java.rej
patching file src/java/org/apache/cassandra/utils/FBUtilities.java
Hunk #1 FAILED at 19.
Hunk #2 FAILED at 44.
Hunk #3 FAILED at 61.
Hunk #4 FAILED at 350.
Hunk #5 FAILED at 651.
5 out of 5 hunks FAILED -- saving rejects to file src/java/org/apache/cassandra/utils/FBUtilities.java.rej
{noformat}

Can you rebase on top of 0.8 head?  It's pretty much a non-issue for 0.7 anyway since it is never used on a client op path.

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Assignee: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-v2.patch, 2850-v4.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff

        

[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Allsopp updated CASSANDRA-2850:
-------------------------------------

    Attachment:     (was: BytesToHexBenchmark3.java)


        

[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072895#comment-13072895 ] 

Hudson commented on CASSANDRA-2850:
-----------------------------------

Integrated in Cassandra-0.8 #244 (See [https://builds.apache.org/job/Cassandra-0.8/244/])
    Speedup bytes to hex conversions dramatically
patch by dallsopp; reviewed by slebresne for CASSANDRA-2850

slebresne : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1152265
Files : 
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/utils/ByteBufferUtil.java
* /cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/utils/ByteBufferUtilTest.java
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/utils/FBUtilities.java


> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: David Allsopp
>            Assignee: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.3
>
>         Attachments: 2850-rebased.txt, 2850-v2.patch, 2850-v4.patch, 2850-v4a.patch, 2850-v5.patch, 2850-v6_08.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff

        

[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Allsopp updated CASSANDRA-2850:
-------------------------------------

    Attachment: BytesToHexBenchmark2.java
                BytesToHexBenchmark.java


        

[jira] [Issue Comment Edited] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070069#comment-13070069 ] 

David Allsopp edited comment on CASSANDRA-2850 at 7/23/11 10:32 PM:
--------------------------------------------------------------------

OK - found bug, which only manifests when the ByteBuffer is at a non-zero position - the unit tests that directly test ByteBufferUtil don't test this circumstance, so will add something there, and recheck performance with a fixed version...
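
The thread does not show the faulty code, so the following is only a sketch of the class of bug described here: a conversion that indexes from 0 instead of from the buffer's position. Class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a position-related bug; not the code from the patch.
final class HexPositionSketch {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Deliberately wrong: starts reading at index 0 and ignores position().
    static String bytesToHexIgnoringPosition(ByteBuffer bytes) {
        int size = bytes.remaining();
        char[] out = new char[size * 2];
        for (int i = 0; i < size; i++) {
            int b = bytes.get(i) & 0xff;
            out[i * 2] = HEX[b >>> 4];
            out[i * 2 + 1] = HEX[b & 0x0f];
        }
        return new String(out);
    }

    // Corrected: absolute reads are offset by the current position.
    static String bytesToHex(ByteBuffer bytes) {
        int offset = bytes.position();
        int size = bytes.remaining();
        char[] out = new char[size * 2];
        for (int i = 0; i < size; i++) {
            int b = bytes.get(offset + i) & 0xff;
            out[i * 2] = HEX[b >>> 4];
            out[i * 2 + 1] = HEX[b & 0x0f];
        }
        return new String(out);
    }
}
```

A unit test for this case would wrap a small array, advance the position past the first bytes, and check that only the remaining bytes appear in the output.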

Oddly, testCleanupWithIndexes (and others) pass fine if I run them from Eclipse individually, but they fail if run via the Ant test task.

An aside - I initially couldn't get your patch to apply either - perhaps a newline character issue since I've been doing this work on Windows (never again!) - but, I did discover that it applied fine by setting the patch 'fuzz factor' to 2.

      was (Author: dallsopp):
    OK - found bug, which only manifests when the ByteBuffer is at a non-zero position - the unit tests that directly test ByteBufferUtil don't test this circumstance, so will add something there, and recheck performance with a fixed version...

Oddly, testCleanupWithIndexes (and others) pass fine if I run them from Eclipse individually, but they fail if run via the Ant test task.

An aside - I initially couldn't get your patch to apply either - perhaps a newline character issue since I've been doing this work on Windows (never again!) - but, I did discover that it applied fine using the patch function in Eclipse, by setting the 'fuzz factor' to 2.
  
> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: David Allsopp
>            Assignee: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.3
>
>         Attachments: 2850-rebased.txt, 2850-v2.patch, 2850-v4.patch, 2850-v4a.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff

        

[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064310#comment-13064310 ] 

Jonathan Ellis commented on CASSANDRA-2850:
-------------------------------------------

David, bytesToHex4 looks good to me.  Can you post a version of v2, with your improvements incorporated?


        

[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059622#comment-13059622 ] 

David Allsopp commented on CASSANDRA-2850:
------------------------------------------

Update - the benchmark version 3 was running v3 twice, not v3 then v4. Have re-attached. New results are:

20M old: 1435
20M new: 376
20M  v2: 405
20M  v3: 141
20M  v4: 93
20M old: 1265
20M new: 360
20M  v2: 234
20M  v3: 187
20M  v4: 78
20M old: 1233
20M new: 376
20M  v2: 452
20M  v3: 125
20M  v4: 63
----
old: 2184
new: 906
 v2: 577
 v3: 188
 v4: 172
----
old: 2215
new: 937
 v2: 593
 v3: 188
 v4: 156
----


        

[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067556#comment-13067556 ] 

David Allsopp commented on CASSANDRA-2850:
------------------------------------------

Something funny going on with the versions? - the revision numbers in your patch seem way higher than the ones I can see in SVN for trunk or 0.8 or 0.8.1 branches. I don't see the unit test failures here.

However, I _think_ one bug may be using bytes.get(i) rather than bytes.get(i) & 0xff as in the older code, to treat values as unsigned. Will take another look tonight.
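
Whether or not it turns out to be the cause here, the masking point can be sketched on its own: Java bytes are signed, so a value such as 0xab is sign-extended to a negative int unless it is masked with 0xff before being used as a table index. The class and method names below are hypothetical.

```java
// Hypothetical sketch of signed versus masked byte handling.
final class UnsignedByteSketch {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Masking with 0xff maps the byte onto 0..255 before splitting into nibbles.
    static String toHex(byte value) {
        int unsigned = value & 0xff;
        return new String(new char[] {HEX[unsigned >>> 4], HEX[unsigned & 0x0f]});
    }

    // Without the mask the shift works on a sign-extended int, so bytes
    // with the top bit set yield a negative (unusable) table index.
    static int highNibbleWithoutMask(byte value) {
        return value >> 4;
    }
}
```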


> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: David Allsopp
>            Assignee: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-rebased.txt, 2850-v2.patch, 2850-v4.patch, 2850-v4a.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff

        

[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Allsopp updated CASSANDRA-2850:
-------------------------------------

    Affects Version/s: 0.7.6

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: BytesToHexBenchmark.java, cassandra-2850.diff

        

[jira] [Issue Comment Edited] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067556#comment-13067556 ] 

David Allsopp edited comment on CASSANDRA-2850 at 7/23/11 10:15 PM:
--------------------------------------------------------------------

Something funny going on with the versions? - the revision numbers in your patch seem way higher than the ones I can see in SVN for trunk or 0.8 or 0.8.1 branches. I don't see the unit test failures here.

-However, I _think_ one bug may be using bytes.get(i) rather than bytes.get(i) & 0xff as in the older code, to treat values as unsigned. Will take another look tonight.-


      was (Author: dallsopp):
    Something funny going on with the versions? - the revision numbers in your patch seem way higher than the ones I can see in SVN for trunk or 0.8 or 0.8.1 branches. I don't see the unit test failures here.

However, I _think_ one bug may be using bytes.get(i) rather than bytes.get(i) & 0xff as in the older code, to treat values as unsigned. Will take another look tonight.

  

        

[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2850:
----------------------------------------

    Attachment: 2850-v2.patch

Attaching a so-called v2 version that avoids the string object creation of
each byte by encoding each char separately. This version shows a >30% speedup
on the 10MB array conversion (and ~15% speedup on the 1K array conversion)
compared to the version of the previous patch. It will also generate less
garbage.

I've also broadened the scope of this ticket because hexToBytes also needs some
love (actually even more so) and the v2 patch ships with an improved version of
hexToBytes. As it turns out hexToBytes was really naive and was using
substring() on every 2 characters, generating a lot of String objects. On a
micro-benchmark converting strings of 1000 characters, the attached version
shows a ~13x (!) speedup. It also generates much less garbage.

To add to what David said, let's note that those methods used to not matter
too much (they were used in non-performance-sensitive places, like debug/error
messages, or SSTable2json (though performance in those tools doesn't hurt)), but
are now used by CQL for BytesType.
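
The substring() point can be sketched as below: read the two characters of each pair with charAt() and map them through a small table, so no intermediate String objects are created. This is not the code from 2850-v2.patch; the class name, the table layout, and the choice to reject odd-length input are assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch of a substring-free hex decoder; not the v2 patch.
final class HexToBytesSketch {
    // Maps an ASCII char to its nibble value; -1 marks non-hex characters.
    private static final byte[] CHAR_TO_NIBBLE = new byte[128];

    static {
        Arrays.fill(CHAR_TO_NIBBLE, (byte) -1);
        for (char c = '0'; c <= '9'; c++) CHAR_TO_NIBBLE[c] = (byte) (c - '0');
        for (char c = 'a'; c <= 'f'; c++) CHAR_TO_NIBBLE[c] = (byte) (c - 'a' + 10);
        for (char c = 'A'; c <= 'F'; c++) CHAR_TO_NIBBLE[c] = (byte) (c - 'A' + 10);
    }

    static byte[] hexToBytes(String hex) {
        // Assumption: odd-length input is treated as an error in this sketch.
        if (hex.length() % 2 != 0) {
            throw new NumberFormatException("Odd number of hex characters");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = nibble(hex.charAt(i * 2));
            int lo = nibble(hex.charAt(i * 2 + 1));
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    private static int nibble(char c) {
        int value = c < 128 ? CHAR_TO_NIBBLE[c] : -1;
        if (value < 0) {
            throw new NumberFormatException("Non-hex character: " + c);
        }
        return value;
    }
}
```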

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-v2.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059607#comment-13059607 ] 

David Allsopp commented on CASSANDRA-2850:
------------------------------------------

I can't improve any further on Sylvain's hexToBytes - nice work!

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-v2.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Allsopp updated CASSANDRA-2850:
-------------------------------------

    Attachment: BytesToHexBenchmark3.java

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-v2.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059311#comment-13059311 ] 

Jonathan Ellis commented on CASSANDRA-2850:
-------------------------------------------

Would prefer code to generate the table, the performance hit is negligible and that way it's obvious where the data comes from.
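
As an illustration of the suggestion above, a lookup table can be generated in a static initializer rather than written out as a literal. This is only a sketch: the class name and the String[256] table shape are assumptions, not taken from the attached diff.

```java
final class HexTableSketch {
    // One two-character hex string per possible byte value (0..255).
    static final String[] BYTE_TO_HEX = new String[256];

    static {
        for (int i = 0; i < 256; i++) {
            // Integer.toHexString drops the leading zero, so pad values below 16.
            BYTE_TO_HEX[i] = (i < 16 ? "0" : "") + Integer.toHexString(i);
        }
    }
}
```

The loop runs once at class initialization, so its cost does not show up in per-call conversion time.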

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: BytesToHexBenchmark.java, BytesToHexBenchmark2.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059241#comment-13059241 ] 

David Allsopp commented on CASSANDRA-2850:
------------------------------------------

Just noticed that there's a bytesToHex(byte... bytes) method in FBUtilities - whatever implementation is chosen, these methods probably ought to be together?

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: BytesToHexBenchmark.java, cassandra-2850.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Allsopp updated CASSANDRA-2850:
-------------------------------------

    Attachment: BytesToHexBenchmark3.java

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-v2.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059964#comment-13059964 ] 

Jonathan Ellis commented on CASSANDRA-2850:
-------------------------------------------

bq. An issue with using hex at all is that we can't represent the maximum 2GB column value

I'm not worried about this.  In practice you need to start chunking large objects long before 2GB.  Ideally, single-digit MB or at most double-digit.

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-v2.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059595#comment-13059595 ] 

David Allsopp commented on CASSANDRA-2850:
------------------------------------------

An issue with using hex at all is that we can't represent the maximum 2GB column value. If we have Integer.MAX_VALUE bytes, then we need twice as many chars - and arrays in Java are limited to Integer.MAX_VALUE.
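
The arithmetic behind this limit can be sketched as follows. The class and method names are hypothetical, and the check uses the nominal Integer.MAX_VALUE array bound (real JVMs may cap array sizes slightly lower).

```java
final class HexSizeLimitSketch {
    // Each byte becomes two hex chars, so use long arithmetic to avoid overflow.
    static long hexCharsNeeded(long byteCount) {
        return 2L * byteCount;
    }

    // True if the hex form of byteCount bytes fits in a single Java char[].
    static boolean fitsInCharArray(long byteCount) {
        return hexCharsNeeded(byteCount) <= Integer.MAX_VALUE;
    }

    // Largest byte count whose hex form still fits: just under 1GB.
    static int maxEncodableBytes() {
        return Integer.MAX_VALUE / 2;
    }
}
```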



> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-v2.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2850:
----------------------------------------

    Attachment: 2850-v6_08.patch

+1, committed, thanks.

I've slightly updated the patch so that we fall back to a copying String constructor in case using the package-protected one fails. Since that constructor is not part of the public API, this avoids a potential bug with JDKs that don't have it.
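
A sketch of what such a fallback could look like, for illustration only. The class name is hypothetical, and the (int, int, char[]) constructor signature is an assumption about older JDKs; on JDKs without it (or where reflective access is refused) the code simply uses the public copying constructor.

```java
import java.lang.reflect.Constructor;

final class SharedStringSketch {
    // Non-public String constructor that shares the char[] instead of copying it,
    // or null if this JDK does not have it (assumed signature, see lead-in).
    private static final Constructor<String> SHARING_CTOR = lookup();

    private static Constructor<String> lookup() {
        try {
            Constructor<String> c =
                String.class.getDeclaredConstructor(int.class, int.class, char[].class);
            c.setAccessible(true);
            return c;
        } catch (Exception e) {
            return null; // constructor missing or access denied: use the fallback
        }
    }

    // Wraps chars in a String, avoiding the copy when the JDK allows it.
    // Callers must not modify chars afterwards, since the array may be shared.
    static String wrap(char[] chars) {
        if (SHARING_CTOR != null) {
            try {
                return SHARING_CTOR.newInstance(0, chars.length, chars);
            } catch (Exception e) {
                // fall through to the copying constructor
            }
        }
        return new String(chars);
    }
}
```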

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: David Allsopp
>            Assignee: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.3
>
>         Attachments: 2850-rebased.txt, 2850-v2.patch, 2850-v4.patch, 2850-v4a.patch, 2850-v5.patch, 2850-v6_08.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Allsopp updated CASSANDRA-2850:
-------------------------------------

    Attachment: BytesToHexBenchmark.java

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: BytesToHexBenchmark.java, cassandra-2850.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Allsopp updated CASSANDRA-2850:
-------------------------------------

    Attachment:     (was: cassandra-2850.diff)

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: BytesToHexBenchmark.java, BytesToHexBenchmark2.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070069#comment-13070069 ] 

David Allsopp commented on CASSANDRA-2850:
------------------------------------------

OK - found bug, which only manifests when the ByteBuffer is at a non-zero position - the unit tests that directly test ByteBufferUtil don't test this circumstance, so will add something there, and recheck performance with a fixed version...
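
To illustrate the kind of bug described above (not the actual fix in the patch), here is a sketch of a ByteBuffer conversion that honours a non-zero position. The class name and HEX_DIGITS table are assumptions.

```java
import java.nio.ByteBuffer;

final class ByteBufferHexSketch {
    private static final char[] HEX_DIGITS = "0123456789abcdef".toCharArray();

    // Converts only the bytes between position() and limit(),
    // using absolute gets so the buffer's position is left unchanged.
    static String bytesToHex(ByteBuffer buffer) {
        int start = buffer.position();
        int length = buffer.remaining();
        char[] out = new char[length * 2];
        for (int i = 0; i < length; i++) {
            int b = buffer.get(start + i) & 0xff; // offset by position, not from 0
            out[2 * i] = HEX_DIGITS[b >>> 4];
            out[2 * i + 1] = HEX_DIGITS[b & 0x0f];
        }
        return new String(out);
    }
}
```

A unit test for this case would wrap a few bytes, advance the position past the first one, and check that the first byte is absent from the result.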

Oddly, testCleanupWithIndexes (and others) pass fine if I run them from Eclipse individually, but they fail if run via the Ant test task.

An aside - I initially couldn't get your patch to apply either - perhaps a newline character issue since I've been doing this work on Windows (never again!) - but, I did discover that it applied fine using the patch function in Eclipse, by setting the 'fuzz factor' to 2.

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: David Allsopp
>            Assignee: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.3
>
>         Attachments: 2850-rebased.txt, 2850-v2.patch, 2850-v4.patch, 2850-v4a.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Allsopp updated CASSANDRA-2850:
-------------------------------------

    Attachment:     (was: BytesToHexBenchmark3.java)

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-v2.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Allsopp updated CASSANDRA-2850:
-------------------------------------

    Attachment:     (was: BytesToHexBenchmark.java)

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: BytesToHexBenchmark.java, BytesToHexBenchmark2.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059633#comment-13059633 ] 

David Allsopp commented on CASSANDRA-2850:
------------------------------------------

Although the bytesToHex reflection hack is a bit horrible, it makes a huge difference with really big values - I've just been trying different input sizes (with -Xmx4g -Xms4g on a 6GB machine) and the JVM falls over with OOM at about 300MB for all the other versions, but copes with 675MB for v4. 

With the other versions, for a byte array of size N, we also need at least 2N for the StringBuilder or char[], then another 2N for the String (because the normal String constructors and methods always do an arraycopy of their input array), giving 5N in total.

I wonder where else in the code this sort of thing occurs...?

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-v2.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Allsopp updated CASSANDRA-2850:
-------------------------------------

    Attachment: BytesToHexBenchmark3.java

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-v2.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059239#comment-13059239 ] 

David Allsopp commented on CASSANDRA-2850:
------------------------------------------

Just a note as to why this might matter (in the absence of hard profiling data) - hexToBytes() is called in a lot of places in the code, and is sometimes called on the entire column value.

A column value of 10MB takes over 750ms to convert on my machine (180ms with the patch), so it's a significant load - and column values go up to 2GB (extrapolating linearly, 2GB could take over 2 minutes to convert, assuming you've got the spare RAM!).
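
The linear extrapolation above can be checked with a few lines. The helper below is hypothetical, and it takes 10MB and 2GB as binary sizes, which is an assumption.

```java
final class ExtrapolationSketch {
    // Scales a measured conversion time linearly to a different input size.
    static double extrapolateMillis(long measuredBytes, double measuredMillis, long targetBytes) {
        return measuredMillis * targetBytes / measuredBytes;
    }

    public static void main(String[] args) {
        long tenMb = 10L * 1024 * 1024;
        long twoGb = 2L * 1024 * 1024 * 1024;
        // 750ms for 10MB scales to roughly 153.6 seconds for 2GB.
        System.out.println(extrapolateMillis(tenMb, 750.0, twoGb) / 1000.0 + " s");
    }
}
```

That is about two and a half minutes, consistent with the "over 2 minutes" estimate.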



> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: BytesToHexBenchmark.java, cassandra-2850.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070071#comment-13070071 ] 

David Allsopp commented on CASSANDRA-2850:
------------------------------------------

I think the different test behaviour in different contexts is because the tests use Java "assert" as well as JUnit assertEquals() etc, so are presumably expected to be run with assertions enabled (with -ea).
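
One way to confirm whether a given runner has assertions switched on is the usual side-effect idiom, sketched below with a hypothetical class name. It does not show how Eclipse or the Ant test task are actually configured.

```java
final class AssertionCheckSketch {
    // Returns true only when the JVM runs this class with assertions enabled (-ea).
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true; // the assignment only executes under -ea
        return enabled;
    }
}
```

Logging this value at the start of a test run would show whether plain Java "assert" statements in the tests can fail at all in that context.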

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: David Allsopp
>            Assignee: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.3
>
>         Attachments: 2850-rebased.txt, 2850-v2.patch, 2850-v4.patch, 2850-v4a.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Allsopp updated CASSANDRA-2850:
-------------------------------------

    Attachment: cassandra-2850a.diff

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: BytesToHexBenchmark.java, BytesToHexBenchmark2.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


[jira] [Issue Comment Edited] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072867#comment-13072867 ] 

Sylvain Lebresne edited comment on CASSANDRA-2850 at 7/29/11 3:37 PM:
----------------------------------------------------------------------

+1, committed, thanks.

I've slightly updated the patch so that we fall back to a copying String constructor in case using the package-protected one fails. Since that constructor is not part of the public API, this avoids a potential bug with JDKs that don't have it (I'm attaching the committed patch for the record).

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: David Allsopp
>            Assignee: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.3
>
>         Attachments: 2850-rebased.txt, 2850-v2.patch, 2850-v4.patch, 2850-v4a.patch, 2850-v5.patch, 2850-v6_08.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


        

[jira] [Issue Comment Edited] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059622#comment-13059622 ] 

David Allsopp edited comment on CASSANDRA-2850 at 7/4/11 9:48 PM:
------------------------------------------------------------------

Update - benchmark version 3 was running v3 twice, not v3 then v4. I have re-attached it. With the corrected benchmark, v4 is 15-19x faster than the original for 20MB values and 13-14x faster for 1KB values.

20M old: 1435
20M new: 376
20M  v2: 405
20M  v3: 141
20M  v4: 93
20M old: 1265
20M new: 360
20M  v2: 234
20M  v3: 187
20M  v4: 78
20M old: 1233
20M new: 376
20M  v2: 452
20M  v3: 125
20M  v4: 63
----
old: 2184
new: 906
 v2: 577
 v3: 188
 v4: 172
----
old: 2215
new: 937
 v2: 593
 v3: 188
 v4: 156
----
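
The quoted speedups can be checked against the timings above. The sketch below simply divides the "old" time by the "v4" time for each run; the class and method names are hypothetical, the unlabelled runs are presumed to be the 1KB case, and the timing units are not stated in the results, so only the ratios are meaningful.

```java
// Hypothetical helper for checking the ratios; the numbers are copied from the results above.
final class SpeedupSketch {
    // Ratio of the original time to the improved time.
    static double speedup(double oldTime, double newTime) {
        return oldTime / newTime;
    }

    public static void main(String[] args) {
        double[][] runs20M = { {1435, 93}, {1265, 78}, {1233, 63} };  // {old, v4}
        double[][] runsSmall = { {2184, 172}, {2215, 156} };          // {old, v4}
        for (double[] r : runs20M) {
            System.out.printf("20M   old/v4 = %.1fx%n", speedup(r[0], r[1]));
        }
        for (double[] r : runsSmall) {
            System.out.printf("small old/v4 = %.1fx%n", speedup(r[0], r[1]));
        }
    }
}
```

The ratios come out at roughly 15.4 to 19.6 for the 20M runs and roughly 12.7 to 14.2 for the smaller runs, which is consistent with the quoted ranges after rounding.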

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-v2.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


        

[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Allsopp updated CASSANDRA-2850:
-------------------------------------

    Attachment: 2850-v4a.patch

2850-v4a.patch is now against trunk rather than 0.8.1 - is that right?


> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Assignee: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-v2.patch, 2850-v4.patch, 2850-v4a.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


        

[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Allsopp updated CASSANDRA-2850:
-------------------------------------

    Attachment: 2850-v4.patch

OK - the attached patch v4 includes both sets of improvements.

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Assignee: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-v2.patch, 2850-v4.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


        

[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Allsopp updated CASSANDRA-2850:
-------------------------------------

    Attachment: cassandra-2850.diff

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: BytesToHexBenchmark.java, cassandra-2850.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


        

[jira] [Issue Comment Edited] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059589#comment-13059589 ] 

David Allsopp edited comment on CASSANDRA-2850 at 7/4/11 7:46 PM:
------------------------------------------------------------------

I think you mean (bytes.remaining() * 2) not (bytes.remaining() / 2) - we need twice as many chars as bytes.

Also, shouldn't byteToChar[] have length 16, not 256?

Not sure what string creation you are referring to?

I attach 2 further versions of bytesToHex (in a third benchmark class). Results are below (I've had to increase the number of repeats so the stats are significant!).

v3 uses 'normal' code; it is another 20% faster than v2 for large values and _another_ factor of 2 or more faster than v2 for the smaller values, i.e. 7-10 times faster than the original.

v4 uses nasty reflection to avoid doing an arraycopy on the char array - this avoids allocating a large chunk of memory (all the previous solutions end up doing an arraycopy somewhere). This is now 11-13 times faster than the original.

20M old: 1482
20M new: 360
20M  v2: 249
20M  v3: 203
20M  v4: 125
----
old: 2137
new: 859
 v2: 718
 v3: 203
 v4: 156
----
old: 2138
new: 843
 v2: 733
 v3: 188
 v4: 156
----
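
For reference, the sizing point above (twice as many chars as bytes) and the lookup-table idea can be sketched as follows. This is an illustration under assumptions and is not any of the attached patch versions: the class name HexSketch, the 16-entry per-nibble table, and the lowercase output are choices made for the example.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, for illustration only (not one of the attached patches).
final class HexSketch {
    // One entry per 4-bit nibble, so 16 entries are enough for this approach.
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Converts the remaining bytes of the buffer to hex without moving its position.
    static String bytesToHex(ByteBuffer bytes) {
        int start = bytes.position();
        int n = bytes.remaining();
        char[] out = new char[n * 2];            // two hex chars per byte
        for (int i = 0; i < n; i++) {
            int b = bytes.get(start + i) & 0xFF; // absolute get; mask to 0..255
            out[2 * i] = HEX[b >>> 4];           // high nibble
            out[2 * i + 1] = HEX[b & 0x0F];      // low nibble
        }
        return new String(out);                  // public constructor copies the array
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {0x00, 0x0f, (byte) 0xff});
        System.out.println(bytesToHex(buf));
    }
}
```

The output index is kept relative to the start of the remaining bytes while the input index is absolute, so a buffer with a non-zero position still writes within the bounds of the output array.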



> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-v2.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


        

[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Allsopp updated CASSANDRA-2850:
-------------------------------------

    Attachment: 2850-v5.patch

OK, v5 attached with bugfix; performance seems at least as good as the previous (buggy) version.

I _hope_ the patch problems have also gone away - it applies to trunk for me without complaint (using Eclipse) and without needing to adjust the fuzz factor!

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: David Allsopp
>            Assignee: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.3
>
>         Attachments: 2850-rebased.txt, 2850-v2.patch, 2850-v4.patch, 2850-v4a.patch, 2850-v5.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


        

[jira] [Assigned] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "David Allsopp (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Allsopp reassigned CASSANDRA-2850:
----------------------------------------

    Assignee: David Allsopp

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6, 0.8.1
>            Reporter: David Allsopp
>            Assignee: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-v2.patch, 2850-v4.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.


        

[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2850:
--------------------------------------

    Attachment: 2850-rebased.txt

Still getting the same errors on FBUtilities, but I merged it in manually.  Result is attached.

Getting test failures like this:

{noformat}
    [junit] Testcase: testCleanupWithIndexes(org.apache.cassandra.db.CleanupTest):	Caused an ERROR
    [junit] 312
    [junit] java.lang.ArrayIndexOutOfBoundsException: 312
    [junit] 	at org.apache.cassandra.utils.ByteBufferUtil.bytesToHex(ByteBufferUtil.java:489)
    [junit] 	at org.apache.cassandra.db.marshal.LocalByPartionerType.getString(LocalByPartionerType.java:53)
    [junit] 	at org.apache.cassandra.db.Column.getString(Column.java:229)
    [junit] 	at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:123)
    [junit] 	at org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:142)
    [junit] 	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1312)
    [junit] 	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1189)
    [junit] 	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1146)
    [junit] 	at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1501)
    [junit] 	at org.apache.cassandra.db.CleanupTest.testCleanupWithIndexes(CleanupTest.java:112)
    [junit] 
    [junit] 
    [junit] Test org.apache.cassandra.db.CleanupTest FAILED
{noformat}

> Converting bytes to hex string is unnecessarily slow
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: David Allsopp
>            Assignee: David Allsopp
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: 2850-rebased.txt, 2850-v2.patch, 2850-v4.patch, 2850-v4a.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff
>
>
> ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte.
> (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change)
> Will attach patch shortly that speeds it up by about x3, plus benchmarking test.
