Posted to dev@avro.apache.org by "Doug Cutting (Created) (JIRA)" <ji...@apache.org> on 2011/10/20 20:16:10 UTC

[jira] [Created] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available
------------------------------------------------------------------------------

                 Key: AVRO-939
                 URL: https://issues.apache.org/jira/browse/AVRO-939
             Project: Avro
          Issue Type: New Feature
          Components: java
            Reporter: Doug Cutting


Google's Guava libraries include an optimized implementation of lexicographic byte comparison based on sun.misc.Unsafe that's ~4x faster than the normal Java implementation.

http://hiroshiyamauchi.blogspot.com/2010/08/fast-unsigned-byte-lexicographical.html
http://www.google.com/codesearch#UKMs0lhE9bg/trunk/src/com/google/common/primitives/UnsignedBytes.java&l=276

We might similarly optimize BinaryData#compareBytes().
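For reference, here is a minimal sketch of the kind of byte-at-a-time comparison this issue proposes to speed up. It is not copied from BinaryData; the class name ByteCompareSketch is hypothetical, and the (array, start, length) argument shape is an assumption made for illustration.

```java
final class ByteCompareSketch {
  /**
   * Compares two byte ranges lexicographically, treating each byte as unsigned.
   * Only the sign of the result is meaningful.
   */
  static int compareBytes(byte[] b1, int s1, int l1,
                          byte[] b2, int s2, int l2) {
    int end1 = s1 + l1;
    int end2 = s2 + l2;
    for (int i = s1, j = s2; i < end1 && j < end2; i++, j++) {
      int a = b1[i] & 0xff;  // mask to get the unsigned value 0..255
      int b = b2[j] & 0xff;
      if (a != b) {
        return a - b;
      }
    }
    return l1 - l2;  // common prefix is equal: the shorter range sorts first
  }
}
```

The loop touches one byte per iteration; the Unsafe-based approach in the links above instead reads eight bytes at a time as a long, which is where the claimed speedup comes from.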

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420765#comment-13420765 ] 

Eli Reisman commented on AVRO-939:
----------------------------------

--- 1000 bytes, trunk, 64bit jvm ---
Executing tests: 
[ByteCompare]
 readTests:true
 writeTests:true
 cycles=800
                    test name     time    M entries/sec   M bytes/sec  bytes/cycle
           ByteCompareCompute:  17129 ms       2.919         0.000             0

--- 100 bytes, trunk, 64bit jvm ---
Executing tests: 
[ByteCompare]
 readTests:true
 writeTests:true
 cycles=800
                    test name     time    M entries/sec   M bytes/sec  bytes/cycle
           ByteCompareCompute:   2286 ms      21.864         0.000             0

--- 10 bytes, trunk, 64bit jvm ---
Executing tests: 
[ByteCompare]
 readTests:true
 writeTests:true
 cycles=800
                    test name     time    M entries/sec   M bytes/sec  bytes/cycle
           ByteCompareCompute:    999 ms      50.024         0.000             0

                
> Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available
> ------------------------------------------------------------------------------
>
>                 Key: AVRO-939
>                 URL: https://issues.apache.org/jira/browse/AVRO-939
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.7.1
>            Reporter: Doug Cutting
>         Attachments: AVRO-939-1.patch, AVRO-939-2.patch, AVRO-939-3.patch, AVRO-939-4.patch, AVRO-939.patch
>
>
> Google's Guava libraries include an optimized implementation of lexicographic byte comparison based on sun.misc.Unsafe that's ~4x faster than the normal Java implementation.
> http://hiroshiyamauchi.blogspot.com/2010/08/fast-unsigned-byte-lexicographical.html
> http://www.google.com/codesearch#UKMs0lhE9bg/trunk/src/com/google/common/primitives/UnsignedBytes.java&l=276
> We might similarly optimize BinaryData#compareBytes().


        

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420794#comment-13420794 ] 

Scott Carey commented on AVRO-939:
----------------------------------

That is what it looks like -- a modest gain for larger sizes and a loss for smaller sizes.  I have not had time to look into more details.  I suspect there are some ways to improve the 10 byte time.  It would be useful to see a few other data points around that -- 5 bytes, 25 bytes, 50 bytes.    The slowdown for smaller sizes is an issue, and the modest improvement for larger ones is a surprise.  There may be something getting in the way of better improvements there as well -- or the JVM is doing a better job than expected optimizing this.
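To put numbers on "modest gain" and "loss", here is a small sketch that divides the trunk time by the patched time for each array size, using the 64-bit JVM timings Eli Reisman posted in this thread. The class and method names are hypothetical; only the millisecond figures come from the posted runs.

```java
final class SpeedupSketch {
  /** Ratio of trunk time to patched time; above 1.0 means the patch is faster. */
  static double speedup(long trunkMillis, long patchedMillis) {
    return (double) trunkMillis / patchedMillis;
  }

  /** Formats one line per array size from the timings posted in the thread. */
  static String report() {
    // {array length in bytes, trunk ms, patched ms}
    long[][] runs = { {1000, 17129, 13696}, {100, 2286, 1570}, {10, 999, 1100} };
    StringBuilder out = new StringBuilder();
    for (long[] run : runs) {
      out.append(String.format("%4d bytes: %.2fx%n", run[0], speedup(run[1], run[2])));
    }
    return out.toString();
  }
}
```

With those figures the ratios come out at roughly 1.25x for 1000 bytes, 1.46x for 100 bytes and 0.91x for 10 bytes, that is, about a 9% slowdown at the smallest size.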
                

        

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418676#comment-13418676 ] 

Eli Reisman commented on AVRO-939:
----------------------------------

I'm having a little trouble patching this back in: mvn is not finding Perf.java in org.apache.io, though the patch applied just fine. Also, in the Perf code, the usage string indicates a new "-nocompare" option, but the details.append() call names it "-nocompute". Will this affect anything important?

I ran patch -p0 < AVRO-939.patch from the lang/java/avro/ dir, since the paths in the patch start at the "src/" dir.

Thanks!

                

        

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419540#comment-13419540 ] 

Scott Carey commented on AVRO-939:
----------------------------------

{quote}Wait a minute, if you meant a 32-bit kernel running 64-bit HotSpot, that's still no good for a real 64-bit benchmark, is it?{quote}

It is fine.  If the JVM says it is 64 bit, then it is getting the benefit of the 64 bit registers and operations in user-space, which is what we need to measure here.

{quote}Given the results I saw, unless we can make the new version much faster on shorter byte arrays we'd need to change things to only use the optimized version for longer arrays (> 100 bytes) so that performance is never slower. Even that's probably only worthwhile if performance on 64-bit systems is significantly better than the current implementation.{quote}

It looks like there are some opportunities to make this code faster.  The method seems large; breaking it up may help HotSpot, since it profiles at the method level and has more optimizations enabled for smaller methods.  Using an abstract class instead of an interface might help.  We may be able to reduce the number of conditionals or present them to the JIT in a way that leads to better profiling and optimization.
                

        

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417896#comment-13417896 ] 

Scott Carey commented on AVRO-939:
----------------------------------

Approach looks good.  The patch adds Guava as a dependency; we don't need that now, do we?

A benchmark would be useful.  Is there something we can add to the test tree to measure this before and after?
                

        

[jira] [Updated] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-939:
------------------------------

    Attachment: AVRO-939.patch

Here's the same patch but with a benchmark added to Perf.java.  This can be run with:

mvn exec:java -Dexec.classpathScope="test" -Dexec.mainClass="org.apache.avro.io.Perf" -Dexec.args="-bc"

On a 32-bit machine, for shorter byte arrays (~10 bytes) the older implementation is considerably faster.  For medium byte arrays (~100 bytes) they're about even, and for longer byte arrays (~1000 bytes) the new implementation is about 20% faster.

Can someone try this with a 64-bit JVM?
                

        

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417917#comment-13417917 ] 

Eli Reisman commented on AVRO-939:
----------------------------------

I noticed that the FastByteComparisons file contains several references to the original Guava code! I realize that what FBC does is expose the indexed comparisons that Guava does not, but I suspect Guava is already a part of Hadoop and not yet of Avro.

Any reason not to replace the UnsignedBytes/Longs calls in FastByteComparisons with something from code already in Sun Java or in Avro already to avoid bringing the whole Guava library into Avro just for one file that already duplicates 90% of the Guava code we needed in the first place? Or is it time for Avro to import Guava anyway for future use?

                

        

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418687#comment-13418687 ] 

Doug Cutting commented on AVRO-939:
-----------------------------------

Oops.  I generated the patch from the wrong directory, but (cd lang/java/avro; patch -p 0 < AVRO-939.patch) should work, since all changes are under there.

You need to first run 'mvn test-compile' since Perf.java is in the test tree.  Sorry I forgot to mention that.
                

        

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420751#comment-13420751 ] 

Eli Reisman commented on AVRO-939:
----------------------------------

So here are my results WITH the patch. I'm compiling and testing without it now, and I'll post those in a moment...


--- 1000 bytes, with patch, 64bit jvm ---
Executing tests: 
[ByteCompare]
 readTests:true
 writeTests:true
 cycles=800
                    test name     time    M entries/sec   M bytes/sec  bytes/cycle
           ByteCompareCompute:  13696 ms       3.650         0.000             0

--- 100 bytes, with patch, 64bit jvm ---
[ByteCompare]
 readTests:true
 writeTests:true
 cycles=800
                    test name     time    M entries/sec   M bytes/sec  bytes/cycle
           ByteCompareCompute:   1570 ms      31.838         0.000             0

--- 10 bytes, with patch, 64bit jvm ---
Executing tests: 
[ByteCompare]
 readTests:true
 writeTests:true
 cycles=800
                    test name     time    M entries/sec   M bytes/sec  bytes/cycle
           ByteCompareCompute:   1100 ms      45.446         0.000             0

                

        

[jira] [Updated] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated AVRO-939:
-----------------------------

    Affects Version/s: 1.7.1
               Status: Patch Available  (was: Open)

First version, I am fairly certain this uses the Guava Lexicographical byte comparator without any undue byte[] copying, but in order to support indexed comparisons in BinaryData#compareBytes() I do have to wrap the inputs in ByteBuffers temporarily.

This does not make me happy, but it did seem to be the only way to do this. As long as we aren't copying our byte[] arguments, using the static ByteBuffer methods to wrap and slice the byte[] inputs down to indexed sizes is probably not too inefficient compared to the gains we get from the Guava (sun.misc.Unsafe-based) comparator. Any thoughts? Is there a better way?
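To illustrate the no-copy point, here is a rough sketch of wrapping an indexed range in a ByteBuffer and comparing it as unsigned bytes. This is not the code in AVRO-939-1.patch, which is not shown in this thread; the class and method names are hypothetical, and the comparison loop is a plain stand-in for the Guava comparator.

```java
import java.nio.ByteBuffer;

final class WrapSketch {
  /** Wraps a sub-range of an array; the buffer shares the array, so nothing is copied. */
  static ByteBuffer view(byte[] bytes, int start, int length) {
    return ByteBuffer.wrap(bytes, start, length);
  }

  /** Unsigned lexicographic comparison of the remaining bytes of two buffers. */
  static int compareUnsigned(ByteBuffer x, ByteBuffer y) {
    int n = Math.min(x.remaining(), y.remaining());
    for (int k = 0; k < n; k++) {
      int a = x.get(x.position() + k) & 0xff;  // absolute get: position is not advanced
      int b = y.get(y.position() + k) & 0xff;
      if (a != b) {
        return a - b;
      }
    }
    return x.remaining() - y.remaining();
  }
}
```

One caveat worth keeping in mind: ByteBuffer's own compareTo() compares bytes as signed values, so it cannot be used directly where unsigned ordering is required. Each wrap() call also allocates a small buffer object even though the bytes themselves are not copied.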

                

        

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417792#comment-13417792 ] 

Eli Reisman commented on AVRO-939:
----------------------------------

I'll do it now, thanks!

                

        

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419549#comment-13419549 ] 

Eli Reisman commented on AVRO-939:
----------------------------------

There could be some useful notes in the Hadoop thread where they developed this class; I just copied it over from there. Odd that they didn't break up the code as you mentioned; that sounds reasonable. Good catch on the tests, of course that makes a difference (duh). I will attempt a trunk build and retest ASAP, since the word is that my 32-bit MacBook running a 64-bit JVM is OK for benchmarking.

                

        

[jira] [Updated] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated AVRO-939:
-----------------------------

    Attachment: AVRO-939-3.patch

Sorry, missed something. Passes mvn verify etc. now.
                

        

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419511#comment-13419511 ] 

Doug Cutting commented on AVRO-939:
-----------------------------------

Folks seem to only be running the benchmark after the change.  You need to 'svn revert src/main/java/org/apache/avro/io/BinaryData.java' and run it again to see what the performance is like without the change.  You might also try changing the LENGTH constant in Perf.java to 10 and to 1000 to see how that changes things.

Given the results I saw, unless we can make the new version much faster on shorter byte arrays we'd need to change things to only use the optimized version for longer arrays (> 100 bytes) so that performance is never slower.  Even that's probably only worthwhile if performance on 64-bit systems is significantly better than the current implementation.
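The length-based fallback described above could be sketched roughly as follows. Everything here is an assumption for illustration: the class name, the cutoff constant (taken from the "> 100 bytes" figure above), and the word-at-a-time path, which reads big-endian longs through a ByteBuffer as a safe stand-in for the sun.misc.Unsafe reads in the actual patch. It needs Java 8 or later for Long.compareUnsigned.

```java
import java.nio.ByteBuffer;

final class ThresholdCompareSketch {
  // Hypothetical cutoff, taken from the "> 100 bytes" figure in the comment above.
  static final int WORD_COMPARE_MIN_LENGTH = 100;

  /** Dispatches on length; only the sign of the result is meaningful. */
  static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    if (Math.min(l1, l2) > WORD_COMPARE_MIN_LENGTH) {
      return compareByWords(b1, s1, l1, b2, s2, l2);
    }
    return compareByBytes(b1, s1, l1, b2, s2, l2);
  }

  /** Plain loop, one unsigned byte at a time. */
  static int compareByBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    int n = Math.min(l1, l2);
    for (int i = 0; i < n; i++) {
      int a = b1[s1 + i] & 0xff;
      int b = b2[s2 + i] & 0xff;
      if (a != b) {
        return a - b;
      }
    }
    return l1 - l2;
  }

  /** Eight bytes at a time as big-endian longs, then a byte loop for the tail. */
  static int compareByWords(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    ByteBuffer x = ByteBuffer.wrap(b1);  // ByteBuffer is big-endian by default
    ByteBuffer y = ByteBuffer.wrap(b2);
    int n = Math.min(l1, l2);
    int i = 0;
    for (; i + 8 <= n; i += 8) {
      long a = x.getLong(s1 + i);
      long b = y.getLong(s2 + i);
      if (a != b) {
        // Unsigned order of big-endian words matches unsigned byte order.
        return Long.compareUnsigned(a, b);
      }
    }
    for (; i < n; i++) {
      int a = b1[s1 + i] & 0xff;
      int b = b2[s2 + i] & 0xff;
      if (a != b) {
        return a - b;
      }
    }
    return l1 - l2;
  }
}
```

Whether a cutoff like this pays off would have to be settled by running the Perf benchmark at several lengths on both 32-bit and 64-bit JVMs; the extra branch is itself a cost on the short-array path.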
                

        

[jira] [Updated] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated AVRO-939:
-----------------------------

    Attachment: AVRO-939-4.patch

No longer requires Guava imports; everything is done within FastByteComparisons.java, so it also avoids Hadoop dependencies.

                

[jira] [Updated] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated AVRO-939:
-----------------------------

    Attachment: AVRO-939-2.patch

Is this a little more what you had in mind?
                

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Joey Echeverria (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196397#comment-13196397 ] 

Joey Echeverria commented on AVRO-939:
--------------------------------------

The Google Code Search link doesn't work because the service has been shut down. Here's a (hopefully) permanent link to the compare() method:

http://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/primitives/UnsignedBytes.java?r=5fe70b8509b2a14787d3823b5f928ef399a7afdc#274
                

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418796#comment-13418796 ] 

Scott Carey commented on AVRO-939:
----------------------------------

Macs are different than Linux or Windows --  32 bit kernels can run 64 bit apps.  What does java -version say?
                

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418757#comment-13418757 ] 

Eli Reisman commented on AVRO-939:
----------------------------------

GOOD NEWS: tests ran through this time. BAD NEWS: I'm on a 32-bit kernel (sorry, this is my work machine and I'm new to Macs; I was sure this was 64-bit until I ran uname -a):

[ByteCompare]
 readTests:true
 writeTests:true
 cycles=800
                    test name     time    M entries/sec   M bytes/sec  bytes/cycle
           ByteCompareCompute:   1560 ms      32.041         0.000             0
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5.989s
[INFO] Finished at: Thu Jul 19 15:52:59 PDT 2012
[INFO] Final Memory: 7M/81M
[INFO] ------------------------------------------------------------------------

I can try to get set up on a 64-bit rig and see what happens. Has anyone else gotten results for 64-bit yet?

                

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419382#comment-13419382 ] 

Eli Reisman commented on AVRO-939:
----------------------------------

Nice! I did check java -version and it's 64-bit HotSpot, so we're good. I'm used to the Linux uname output, where 64-bit shows up as "x86_64"; this one said I386, and I didn't know what that meant in the Mac universe. Thanks again!

                

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Thiruvalluvan M. G. (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420861#comment-13420861 ] 

Thiruvalluvan M. G. commented on AVRO-939:
------------------------------------------

We seem to violate principles of good programming by using something that is not part of the published API and by accessing private members through reflection. The designers of Unsafe presumably thought it through when they kept it out of the published API and declared the function private.

This sets a bad precedent. The code will not compile where sun.misc.Unsafe is not available. There is no guarantee that it will be present in any given Java distribution, nor that it will remain in Sun's (Oracle's) distribution in later releases.

Traditionally, Java has not done a great job of defining module boundaries. (Package-private is of some use, but it is not really a great facility.) The Hadoop community, for example, has taken pains to annotate classes and interfaces to indicate their accessibility outside the module. The Java language is trying to fix this in Java 8 with Jigsaw. Once that is in, this patch probably won't work.

I'm not sure a marginal performance improvement in a micro-benchmark is a good enough reason. Do we have evidence that this patch would significantly improve performance in a real application?

This debate has probably already happened and been settled in the Hadoop community, or they considered it an internal matter for Guava. Since we have decided to use "Unsafe" directly, we should have an opinion. I'm raising it here because even if we accept this patch, I think we should acknowledge that we are doing so with full knowledge of its consequences.
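
For reference, one way the availability concern could be softened is to look the class up by name at run time and fall back to the plain loop when the lookup fails. The sketch below is an illustration under assumptions, not what the attached patches do: the class UnsafeProbe is hypothetical, and the field name "theUnsafe" is the one HotSpot-derived JDKs use, which other runtimes are free to change or omit.

```java
import java.lang.reflect.Field;

final class UnsafeProbe {
  /** Holds the sun.misc.Unsafe instance, or null when it could not be obtained. */
  static final Object UNSAFE = tryLoadUnsafe();

  /**
   * Looks the class up by name, so this file still compiles on a JDK
   * that does not ship sun.misc.Unsafe at all.
   */
  static Object tryLoadUnsafe() {
    try {
      Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
      // "theUnsafe" is the private static singleton field on HotSpot-derived JDKs (assumption).
      Field field = unsafeClass.getDeclaredField("theUnsafe");
      field.setAccessible(true);
      return field.get(null);
    } catch (Exception | LinkageError e) {
      // Missing class or field, security manager refusal, or blocked reflective access.
      return null;
    }
  }

  /** True when an Unsafe-based comparer could be used; callers fall back otherwise. */
  static boolean fastPathAvailable() {
    return UNSAFE != null;
  }
}
```

This keeps the dependency out of the compile step, but it does not answer the broader objection: the code still relies on an unpublished class whenever the probe succeeds.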

                

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418894#comment-13418894 ] 

Harsh J commented on AVRO-939:
------------------------------

Yeah, I think Eli did run it on an x64 JVM, so it should be fine. Besides, I thought only Snow Leopard did that 32 vs 64 thing, and it's gone away in Lion now? Am I wrong?

Here's mine, from a *nix box with env info added, for Eli's benefit:

{code}
[ByteCompare]
 readTests:true
 writeTests:true
 cycles=800
                    test name     time    M entries/sec   M bytes/sec  bytes/cycle
           ByteCompareCompute:   2271 ms      22.015         0.000             0
{code}

bq. Linux <host> 2.6.32-220.13.1.el6.x86_64 #1 SMP Tue Apr 17 23:56:34 BST 2012 x86_64 x86_64 x86_64 GNU/Linux
bq. Intel(R) Xeon(R) CPU L5630 @ 2.13GHz
{quote}
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
{quote}
                

[jira] [Updated] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated AVRO-939:
-----------------------------

    Attachment: AVRO-939-1.patch
    

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418588#comment-13418588 ] 

Eli Reisman commented on AVRO-939:
----------------------------------

I'll try it out, for one. Thanks for the help! Guava docs indicate it is on 64bit machines that this code really shines.

                

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419384#comment-13419384 ] 

Eli Reisman commented on AVRO-939:
----------------------------------

Wait a minute, if you meant a 32-bit kernel running 64-bit HotSpot, that's still no good for a real 64-bit benchmark, is it?
                

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423319#comment-13423319 ] 

Eli Reisman commented on AVRO-939:
----------------------------------

FYI, this is copied directly from the Hadoop code (same filename etc.), so they must have had this debate somewhere; they did not use the verbatim Guava version. I wonder what their numbers were like? I agree that for a small win this might not be worth the trouble.

                

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420772#comment-13420772 ] 

Todd Lipcon commented on AVRO-939:
----------------------------------

So, if I'm reading this correctly, the patch gives 20-30% speedup for 1000-byte and 100-byte, and ~10% slowdown for 10 byte?
                

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417637#comment-13417637 ] 

Eli Reisman commented on AVRO-939:
----------------------------------

I'm going to try to upload a patch for this today.

                

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417986#comment-13417986 ] 

Scott Carey commented on AVRO-939:
----------------------------------

I see, it is using a few utility methods from Guava. Most of these look easily replaceable with small private methods. I think we'll want a bigger reason to pull in Guava; Avro already runs into enough downstream dependency conflicts with other libraries.
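
As a rough illustration of how small such replacements can be, here is a sketch of the kind of helpers involved. Exactly which Guava utilities the patch calls is not shown in this thread, so the three below (a byte-width constant, an unsigned byte comparison and an unsigned long "less than") are assumptions, and the class and method names are hypothetical.

```java
final class SmallHelpers {
  /** Number of bytes in a long; stands in for a library constant. */
  static final int LONG_BYTES = 8;

  /** Compares two bytes as unsigned values in the range 0..255. */
  static int compareUnsigned(byte a, byte b) {
    return (a & 0xff) - (b & 0xff);
  }

  /** Unsigned "less than" for longs: flipping the sign bit maps unsigned order onto signed order. */
  static boolean lessThanUnsigned(long x1, long x2) {
    return (x1 + Long.MIN_VALUE) < (x2 + Long.MIN_VALUE);
  }
}
```

Helpers of this size can live as private static methods next to the comparer, which avoids adding a library dependency for a few lines of arithmetic.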
                

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417767#comment-13417767 ] 

Doug Cutting commented on AVRO-939:
-----------------------------------

Look at the version in HADOOP-7761.  It handles this without creating wrapper ByteBuffers.

Ideally we could just depend on hadoop-common and use the FastByteComparisons class there, but I don't think we should add a dependency on Hadoop 2.0.  So I think we need to instead copy FastByteComparisons.java into Avro.
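
To show what working directly on the arrays and offsets looks like, here is a pure-Java sketch of the word-at-a-time idea that needs no wrapper ByteBuffers. It is not the HADOOP-7761 code: that version reads each word through sun.misc.Unsafe, whereas this hypothetical WordCompare class assembles each 8-byte word with shifts, so it illustrates the comparison logic only and is not expected to be faster than the plain loop.

```java
final class WordCompare {
  /** Assembles 8 bytes starting at off into a big-endian long. */
  static long readLongBigEndian(byte[] b, int off) {
    long v = 0;
    for (int k = 0; k < 8; k++) {
      v = (v << 8) | (b[off + k] & 0xffL);
    }
    return v;
  }

  /** Unsigned lexicographic comparison of b1[s1, s1+l1) and b2[s2, s2+l2). */
  static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    int min = Math.min(l1, l2);
    int i = 0;
    // Compare whole 8-byte words first; big-endian order makes the
    // unsigned comparison of words agree with byte-wise order.
    for (; i + 8 <= min; i += 8) {
      long x = readLongBigEndian(b1, s1 + i);
      long y = readLongBigEndian(b2, s2 + i);
      if (x != y) {
        return (x + Long.MIN_VALUE) < (y + Long.MIN_VALUE) ? -1 : 1;
      }
    }
    // Compare the remaining tail (fewer than 8 bytes) one byte at a time.
    for (; i < min; i++) {
      int a = b1[s1 + i] & 0xff;
      int b = b2[s2 + i] & 0xff;
      if (a != b) {
        return a - b;
      }
    }
    return l1 - l2;
  }
}
```

Only the sign of the result is meaningful here, as with the byte-wise loop.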


                

[jira] [Commented] (AVRO-939) Java: optimize BinaryData#compareBytes() to use sun.misc.Unsafe when available

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418706#comment-13418706 ] 

Eli Reisman commented on AVRO-939:
----------------------------------

got it. will try it now. thanks again!

                