You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2012/10/13 11:03:04 UTC

[jira] [Created] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Gopal V created HADOOP-8926:
-------------------------------

             Summary: hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
                 Key: HADOOP-8926
                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
             Project: Hadoop Common
          Issue Type: Improvement
          Components: util
    Affects Versions: 2.0.3-alpha
         Environment: Ubuntu 10.10 i386 
            Reporter: Gopal V
            Priority: Trivial


While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 

The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.

milli-seconds for 1Gig (16400 loop over a 64kb chunk) 

|| platform || original || cache-aware || improvement ||
| x86 | 3894 | 2304 | 40.83 |
| x86_64 | 2131 | 1826 | 14 | 

The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.

A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment

{code}
  0x40f1e345: mov    $0x184,%ecx
  0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
                                        ; - PureJavaCrc32::update@95 (line 61)
                                        ;   {oop('PureJavaCrc32')}
  0x40f1e350: mov    %ecx,0x2c(%esp)
{code}

Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Gopal V (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HADOOP-8926:
----------------------------

    Status: Open  (was: Patch Available)

Reworking for readability of code
                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Trivial
>              Labels: optimization
>         Attachments: crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated HADOOP-8926:
----------------------------------------

       Resolution: Fixed
    Fix Version/s: 2.0.3-alpha
                   3.0.0
           Status: Resolved  (was: Patch Available)

Thanks Gopal.  I put this into trunk and branch-2.
                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Trivial
>              Labels: optimization
>             Fix For: 3.0.0, 2.0.3-alpha
>
>         Attachments: crc32-faster+readable.patch, crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476945#comment-13476945 ] 

Hadoop QA commented on HADOOP-8926:
-----------------------------------

{color:green}+1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12549285/crc32-faster%2Breadable.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in hadoop-common-project/hadoop-common.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1632//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1632//console

This message is automatically generated.
                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Trivial
>              Labels: optimization
>         Attachments: crc32-faster+readable.patch, crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Gopal V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476916#comment-13476916 ] 

Gopal V commented on HADOOP-8926:
---------------------------------

On x86_64 on an ec2 m1.xl (after changes)

Performance Table (The unit is MB/sec)
|| Num Bytes ||    CRC32 || PureJavaCrc32 ||
|          1 |     9.799 |         72.921 |
|          2 |    18.850 |        177.113 |
|          4 |    42.687 |        214.704 |
|          8 |    70.552 |        318.484 |
|         16 |   111.875 |        416.191 |
|         32 |   153.779 |        496.209 |
|         64 |   190.493 |        544.428 |
|        128 |   215.851 |        564.414 |
|        256 |   232.110 |        590.515 |
|        512 |   240.359 |        581.974 |
|       1024 |   244.682 |        597.676 |
|       2048 |   246.642 |        599.621 |
|       4096 |   249.438 |        604.247 |
|       8192 |   249.247 |        605.547 |
|      16384 |   249.524 |        606.494 |
|      32768 |   249.508 |        602.449 |
|      65536 |   250.977 |        604.064 |
|     131072 |   249.678 |        597.944 |
|     262144 |   249.505 |        603.270 |
|     524288 |   250.805 |        602.656 |
|    1048576 |   250.900 |        602.949 |
|    2097152 |   250.137 |        601.563 |
|    4194304 |   249.406 |        602.058 |
|    8388608 |   249.937 |        598.310 |
|   16777216 |   249.892 |        592.417 |

                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Trivial
>              Labels: optimization
>         Attachments: crc32-faster+readable.patch, crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Gopal V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478692#comment-13478692 ] 

Gopal V commented on HADOOP-8926:
---------------------------------

Thanks Nicholas, this probably has future for improvement if we had ByteBuffers (direct) instead of byte[] arrays.

I'm tracing the code-path from the network I/O to see if a nio fix can possibly get me bytebuffers direct.

Meanwhile, If this is significant enough for a back-port, I can open a new ticket & port this patch to branch-1.1 (or is branch-1 the "live" one?).
                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: optimization
>             Fix For: 3.0.0, 2.0.3-alpha
>
>         Attachments: crc32-faster+readable.patch, crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484078#comment-13484078 ] 

Hudson commented on HADOOP-8926:
--------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #415 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/415/])
    svn merge -c 1399005 FIXES: HADOOP-8926. hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data (Gopal V via bobby) (Revision 1401803)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401803
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/PureJavaCrc32.java
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/PureJavaCrc32C.java
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestPureJavaCrc32.java

                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: optimization
>             Fix For: 3.0.0, 2.0.3-alpha, 0.23.5
>
>         Attachments: crc32-faster+readable.patch, crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Gopal V (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HADOOP-8926:
----------------------------

    Status: Patch Available  (was: Open)

Updated for readability and fewer instructions in the inner loop.
                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Trivial
>              Labels: optimization
>         Attachments: crc32-faster+readable.patch, crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Gopal V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477318#comment-13477318 ] 

Gopal V commented on HADOOP-8926:
---------------------------------

The patches are on Hadoop 2.x (trunk) for now - will move it to branch-1.1 once it is baked in.
                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Trivial
>              Labels: optimization
>         Attachments: crc32-faster+readable.patch, crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475581#comment-13475581 ] 

Hadoop QA commented on HADOOP-8926:
-----------------------------------

{color:green}+1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12549014/crc32-faster%2Btest.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in hadoop-common-project/hadoop-common.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1623//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1623//console

This message is automatically generated.
                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Priority: Trivial
>              Labels: optimization
>         Attachments: crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Gopal V (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HADOOP-8926:
----------------------------

    Attachment: pure-crc32-cache-hit.patch

main/ patch
                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Priority: Trivial
>         Attachments: pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Gopal V (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HADOOP-8926:
----------------------------

    Attachment: crc32-faster+readable.patch

Rewrite the core loop for readability & turn all loop locals into final variables.

Modify the small loop into a switch statement (java tableswitch instruction)
                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Trivial
>              Labels: optimization
>         Attachments: crc32-faster+readable.patch, crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-8926:
-------------------------------------------

    Priority: Major  (was: Trivial)
    
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: optimization
>             Fix For: 3.0.0, 2.0.3-alpha
>
>         Attachments: crc32-faster+readable.patch, crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477863#comment-13477863 ] 

Hudson commented on HADOOP-8926:
--------------------------------

Integrated in Hadoop-Mapreduce-trunk #1228 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1228/])
    HADOOP-8926. hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data (Gopal V via bobby) (Revision 1399005)

     Result = FAILURE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1399005
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/PureJavaCrc32.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/PureJavaCrc32C.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestPureJavaCrc32.java

                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Trivial
>              Labels: optimization
>             Fix For: 3.0.0, 2.0.3-alpha
>
>         Attachments: crc32-faster+readable.patch, crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477835#comment-13477835 ] 

Hudson commented on HADOOP-8926:
--------------------------------

Integrated in Hadoop-Hdfs-trunk #1198 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1198/])
    HADOOP-8926. hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data (Gopal V via bobby) (Revision 1399005)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1399005
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/PureJavaCrc32.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/PureJavaCrc32C.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestPureJavaCrc32.java

                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Trivial
>              Labels: optimization
>             Fix For: 3.0.0, 2.0.3-alpha
>
>         Attachments: crc32-faster+readable.patch, crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Gopal V (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HADOOP-8926:
----------------------------

    Attachment: crc32-faster+test.patch

patch for the polynomial table gen, crc32 and crc32c
                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Priority: Trivial
>         Attachments: crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477393#comment-13477393 ] 

Hudson commented on HADOOP-8926:
--------------------------------

Integrated in Hadoop-trunk-Commit #2875 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/2875/])
    HADOOP-8926. hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data (Gopal V via bobby) (Revision 1399005)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1399005
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/PureJavaCrc32.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/PureJavaCrc32C.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestPureJavaCrc32.java

                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Trivial
>              Labels: optimization
>             Fix For: 3.0.0, 2.0.3-alpha
>
>         Attachments: crc32-faster+readable.patch, crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Gopal V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476795#comment-13476795 ] 

Gopal V commented on HADOOP-8926:
---------------------------------

Sure, I will fix the readability of the patch by adding T8_[0-7]_start and re-run it through the JIT to make sure the class variables are getting inlined in the math.

If that still results in a register splill, I will move them out of the loop as local final variables & re-submit a patch today.
                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Trivial
>              Labels: optimization
>         Attachments: crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Gopal V (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HADOOP-8926:
----------------------------

          Labels: optimization  (was: )
    Release Note: Speed up Crc32 by improving the cache hit-ratio of hadoop.util.PureJavaCrc32   (was: Improve cache hit-ratio of hadoop.util.PureJavaCrc32 )
          Status: Patch Available  (was: Open)

The peak throughput (on my machine) went from 283 MB/s to 323 MB/s (according to TestPureJavaCrc32$PerformanceTest)
                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Priority: Trivial
>              Labels: optimization
>         Attachments: crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478329#comment-13478329 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-8926:
------------------------------------------------

Gopal, Amazing works!  There are 10% or more performance improvement from your patch.

java.version = 1.6.0_35
java.runtime.name = Java(TM) SE Runtime Environment
java.runtime.version = 1.6.0_35-b10-428-11M3811
java.vm.version = 20.10-b01-428
java.vm.vendor = Apple Inc.
java.vm.name = Java HotSpot(TM) 64-Bit Server VM
java.vm.specification.version = 1.0
java.specification.version = 1.6
os.arch = x86_64
os.name = Mac OS X
os.version = 10.7.4

Performance Table (The unit is MB/sec)
|| Num Bytes ||    CRC32 || PureJavaCrc32_8926 || PureJavaCrc32 ||
|          1 |    11.896 |             129.832 |        164.845 |
|          2 |    24.097 |             192.742 |        210.266 |
|          4 |    46.274 |             222.059 |        233.902 |
|          8 |    82.332 |             488.716 |        438.514 |
|         16 |   131.682 |             587.312 |        602.784 |
|         32 |   187.265 |             796.510 |        760.628 |
|         64 |   237.088 |             938.650 |        891.017 |
|        128 |   264.795 |            1049.774 |        913.666 |
|        256 |   291.785 |            1095.084 |        987.380 |
|        512 |   298.590 |            1126.067 |       1002.899 |
|       1024 |   305.349 |            1152.375 |       1040.211 |
|       2048 |   309.342 |            1119.713 |       1033.258 |
|       4096 |   309.162 |            1170.767 |       1047.746 |
|       8192 |   321.775 |            1189.724 |       1053.065 |
|      16384 |   320.457 |            1181.128 |       1060.138 |
|      32768 |   324.524 |            1169.965 |       1050.610 |
|      65536 |   322.380 |            1160.471 |       1053.854 |
|     131072 |   315.983 |            1138.223 |       1009.193 |
|     262144 |   324.293 |            1190.476 |       1020.782 |
|     524288 |   316.003 |            1136.979 |       1015.389 |
|    1048576 |   321.715 |            1081.465 |       1033.750 |
|    2097152 |   318.330 |            1189.680 |       1072.054 |
|    4194304 |   316.710 |            1138.496 |       1024.352 |
|    8388608 |   315.701 |            1124.909 |       1030.505 |
|   16777216 |   325.575 |            1154.724 |       1031.285 |

                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Trivial
>              Labels: optimization
>             Fix For: 3.0.0, 2.0.3-alpha
>
>         Attachments: crc32-faster+readable.patch, crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Gopal V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483346#comment-13483346 ] 

Gopal V commented on HADOOP-8926:
---------------------------------

Opened HADOOP-8971 as a backport of this to branch-1
                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: optimization
>             Fix For: 3.0.0, 2.0.3-alpha
>
>         Attachments: crc32-faster+readable.patch, crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated HADOOP-8926:
----------------------------------------

    Fix Version/s: 0.23.5

I pulled this into branch-0.23 too.
                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: optimization
>             Fix For: 3.0.0, 2.0.3-alpha, 0.23.5
>
>         Attachments: crc32-faster+readable.patch, crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476230#comment-13476230 ] 

Robert Joseph Evans commented on HADOOP-8926:
---------------------------------------------

The change looks like a clean refactoring to me.  The hard coded 256*X in several places in the code is not too hard to understand, but I think it would be cleaner if we could create a few static final values like

{code}
private static final int T8_7_start = 256*7;
...
private static final int T8_0_start = 0; //256*0
...
localCrc = (T[T8_7_start + (c0 & 0xff)] ^ T[T8_6_start + (c1 & 0xff)])
   ^ (T[T8_5_start + (c2 & 0xff)] ^ T[T8_4_start + (c3 & 0xff)]);
...
{code}

My only other comment is that we may now need to update the comment at the top of PureJavaCrc32.java.  ~10x to 1.8x as fast does not seem accurate any more :)



                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Trivial
>              Labels: optimization
>         Attachments: crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479245#comment-13479245 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-8926:
------------------------------------------------

Sure, let's backport this to branch-1.
                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: optimization
>             Fix For: 3.0.0, 2.0.3-alpha
>
>         Attachments: crc32-faster+readable.patch, crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas reassigned HADOOP-8926:
---------------------------------------

    Assignee: Gopal V
    
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Trivial
>              Labels: optimization
>         Attachments: crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477766#comment-13477766 ] 

Hudson commented on HADOOP-8926:
--------------------------------

Integrated in Hadoop-Yarn-trunk #6 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/6/])
    HADOOP-8926. hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data (Gopal V via bobby) (Revision 1399005)

     Result = FAILURE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1399005
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/PureJavaCrc32.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/PureJavaCrc32C.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestPureJavaCrc32.java

                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Trivial
>              Labels: optimization
>             Fix For: 3.0.0, 2.0.3-alpha
>
>         Attachments: crc32-faster+readable.patch, crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8926) hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477039#comment-13477039 ] 

Robert Joseph Evans commented on HADOOP-8926:
---------------------------------------------

The changes look good to me +1.  Gopal, what versions of Hadoop are you targeting with this change?
                
> hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8926
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.0.3-alpha
>         Environment: Ubuntu 10.10 i386 
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Trivial
>              Labels: optimization
>         Attachments: crc32-faster+readable.patch, crc32-faster+test.patch, pure-crc32-cache-hit.patch
>
>
> While running microbenchmarks for HDFS write codepath, a significant part of the CPU fraction was consumed by the DataChecksum.update(). 
> The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
> milli-seconds for 1Gig (16400 loop over a 64kb chunk) 
> || platform || original || cache-aware || improvement ||
> | x86 | 3894 | 2304 | 40.83 |
> | x86_64 | 2131 | 1826 | 14 | 
> The performance improvement on x86 is rather larger than the 64bit case, due to the extra register/stack pressure caused by the static arrays.
> A closer analysis of the PureJavaCrc32 JIT code shows the following assembly fragment
> {code}
>   0x40f1e345: mov    $0x184,%ecx
>   0x40f1e34a: mov    0x4415b560(%ecx),%ecx  ;*getstatic T8_5
>                                         ; - PureJavaCrc32::update@95 (line 61)
>                                         ;   {oop('PureJavaCrc32')}
>   0x40f1e350: mov    %ecx,0x2c(%esp)
> {code}
> Basically, the static variables T8_0 through to T8_7 are being spilled to the stack because of register pressure. The x86_64 case has a lower likelihood of such pessimistic JIT code due to the increased number of registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira