Posted to common-issues@hadoop.apache.org by "gene bradley (JIRA)" <ji...@apache.org> on 2016/08/05 20:12:20 UTC

[jira] [Commented] (HADOOP-12941) abort in Unsafe_GetLong when running IA64 HPUX 64bit mode

    [ https://issues.apache.org/jira/browse/HADOOP-12941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409994#comment-15409994 ] 

gene bradley commented on HADOOP-12941:
---------------------------------------

Hi Colin,

How can I get this fixed? Again, it's a really simple fix.


Gene




> abort in Unsafe_GetLong when running IA64 HPUX 64bit mode 
> ----------------------------------------------------------
>
>                 Key: HADOOP-12941
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12941
>             Project: Hadoop Common
>          Issue Type: Bug
>         Environment: hpux IA64  running 64bit mode 
>            Reporter: gene bradley
>
> Now that we have a core to look at, we can sort of see what is going on:
>
> #14 0x9fffffffaf000dd0 in Java native_call_stub frame
> #15 0x9fffffffaf014470 in JNI frame: sun.misc.Unsafe::getLong (java.lang.Object, long) ->long
> #16 0x9fffffffaf0067a0 in interpreted frame: org.apache.hadoop.hbase.util.Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo (byte[], int, int, byte[], int, int) ->int bci: 74
> #17 0x9fffffffaf0066e0 in interpreted frame: org.apache.hadoop.hbase.util.Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo (java.lang.Object, int, int, java.lang.Object, int, int) ->int bci: 16
> #18 0x9fffffffaf006720 in interpreted frame: org.apache.hadoop.hbase.util.Bytes::compareTo (byte[], int, int, byte[], int, int) ->int bci: 11
> #19 0x9fffffffaf0066e0 in interpreted frame: org.apache.hadoop.hbase.KeyValue$KVComparator::compareRowKey (org.apache.hadoop.hbase.Cell, org.apache.hadoop.hbase.Cell) ->int bci: 36
> #20 0x9fffffffaf0066e0 in interpreted frame: org.apache.hadoop.hbase.KeyValue$KVComparator::compare (org.apache.hadoop.hbase.Cell, org.apache.hadoop.hbase.Cell) ->int bci: 3
> #21 0x9fffffffaf0066e0 in interpreted frame: org.apache.hadoop.hbase.KeyValue$KVComparator::compare (java.lang.Object, java.lang.Object) ->int bci: 9
>
> ;; Line: 400
> 0xc00000003ad84d30:0 <Unsafe_GetLong+0x130>:    (p1)  ld8              r45=[r34]
> 0xc00000003ad84d30:1 <Unsafe_GetLong+0x131>:          adds             r34=16,r32
> 0xc00000003ad84d30:2 <Unsafe_GetLong+0x132>:          adds             ret0=8,r32;;
> 0xc00000003ad84d40:0 <Unsafe_GetLong+0x140>:          add              ret1=r35,r45 <==== r35 is off
> 0xc00000003ad84d40:1 <Unsafe_GetLong+0x141>:          ld8              r35=[r34],24
> 0xc00000003ad84d40:2 <Unsafe_GetLong+0x142>:          nop.i            0x0
> 0xc00000003ad84d50:0 <Unsafe_GetLong+0x150>:          ld8              r41=[ret0];;
> 0xc00000003ad84d50:1 <Unsafe_GetLong+0x151>:          ld8.s            r49=[r34],-24
> 0xc00000003ad84d50:2 <Unsafe_GetLong+0x152>:          nop.i            0x0
> 0xc00000003ad84d60:0 <Unsafe_GetLong+0x160>:          ld8              r39=[ret1];; <=== abort
> 0xc00000003ad84d60:1 <Unsafe_GetLong+0x161>:          ld8              ret0=[r35]
> 0xc00000003ad84d60:2 <Unsafe_GetLong+0x162>:          nop.i            0x0;;
> 0xc00000003ad84d70:0 <Unsafe_GetLong+0x170>:          cmp.ne.unc       p1=r0,ret0;;            M,MI
> 0xc00000003ad84d70:1 <Unsafe_GetLong+0x171>:    (p1)  mov              r48=r41
> 0xc00000003ad84d70:2 <Unsafe_GetLong+0x172>:    (p1)  chk.s.i          r49,Unsafe_GetLong+0x290
>
> (gdb) x /10i $pc-48*2
> 0x9fffffffaf000d70:           flushrs                                 MMI
> 0x9fffffffaf000d71:           mov              r44=r32
> 0x9fffffffaf000d72:           mov              r45=r33
> 0x9fffffffaf000d80:           mov              r46=r34                MMI
> 0x9fffffffaf000d81:           mov              r47=r35
> 0x9fffffffaf000d82:           mov              r48=r36
> 0x9fffffffaf000d90:           mov              r49=r37                MMI
> 0x9fffffffaf000d91:           mov              r50=r38
> 0x9fffffffaf000d92:           mov              r51=r39
> 0x9fffffffaf000da0:           adds             r14=0x270,r4           MMI
>
> (gdb) p /x $r35
> $9 = 0x22
> (gdb) x /x $ret1
> 0x9ffffffe1d0d2bda:     0x677a68676c78743a
> (gdb) x /x $r45+0x22
> 0x9ffffffe1d0d2bda:     0x677a68676c78743a
>
> So here is the problem. This is a 64-bit JVM:
>
> 0 : /opt/java8/bin/IA64W/java
> 1 : -Djava.util.logging.config.file=/test28/gzh/tomcat/conf/logging.properties
> 2 : -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> 3 : -Dorg.apache.catalina.security.SecurityListener.UMASK=022
> 4 : -server
> 5 : -XX:PermSize=128m
> 6 : -XX:MaxPermSize=256m
> 7 : -Djava.endorsed.dirs=/test28/gzh/tomcat/endorsed
> 8 : -classpath
> 9 : /test28/gzh/tomcat/bin/bootstrap.jar:/test28/gzh/tomcat/bin/tomcat-juli.jar
> 10 : -Dcatalina.base=/test28/gzh/tomcat
> 11 : -Dcatalina.home=/test28/gzh/tomcat
> 12 : -Djava.io.tmpdir=/test28/gzh/tomcat/temp
> 13 : org.apache.catalina.startup.Bootstrap
> 14 : start
>
> Since they are not passing any -Xmx values, we are taking the defaults, which are based on the system resources.
> So what is happening here is that a 32-bit word aligned address is being used to index into a byte array:
>
> (gdb) jo 0x9ffffffe1d0d2bb8
> _mark = 0x0000000000000001, _klass = 0x9fffffffa8c00768, instance of type [B
> length of the array: 118
> 0 0 0 102 0 0 0 8 0 70 103 122 104 103 108 120 116 58 70 83 78 95 50 48 49 53 49 48 50 50 44 65 44 49 52 52 53 52 55 57 57 51 51 57 53 56 46 52 56 54 55 50 48 51 49 99 57 97 101 52 57 101 97 101 49 100 56 49 51 53 51 99 99 97 97 54 98 56 100 46 4 105 110 102 111 115 101 113 110 117 109 68 117 114 105 110 103 79 112 101 110 0 0 1 80 -6 96 -95 -48 4 0 0 0 0 0 0 0 4
>
> This is the whole string:
>
> (gdb) x /2s 0x9ffffffe1d0d2bd8
> 0x9ffffffe1d0d2bd8:      ""
> 0x9ffffffe1d0d2bd9:      "Fgzhglxt:FSN_20151022,A,1445479933958.48672031c9ae49eae1d81353ccaa6b8d.\004infoseqnumDuringOpen"
>
> To me this is a bug in the caller, potentially in org.apache.hadoop.hbase.util.Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo. Why are they calling Unsafe_GetLong on a byte array? There is no checking of alignment, and I really think this is a bug on their part. As far as I know, GetLong expects 64-bit alignment.
>
> I did find some other 64-bit users who saw this with the same stack trace as this customer:
> https://issues.apache.org/jira/browse/PHOENIX-1438
> http://permalink.gmane.org/gmane.comp.java.hadoop.hbase.devel/39017
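
The address arithmetic in the gdb output above can be replayed outside the debugger. The sketch below only redoes that arithmetic with the two values taken from the core (the array object address from the jo command and the 0x22 offset held in r35); the class and method names are hypothetical and are not part of the JVM or Hadoop.

```java
class AlignmentSketch {
    /** True when address is a multiple of size; size is assumed to be a power of two. */
    static boolean isAligned(long address, int size) {
        return (address & (size - 1)) == 0;
    }

    public static void main(String[] args) {
        long arrayObject = 0x9ffffffe1d0d2bb8L; // address passed to "jo" in the core
        long offset = 0x22L;                    // value of r35 in the core
        long target = arrayObject + offset;     // address the faulting ld8 reads from

        System.out.println("target     = 0x" + Long.toHexString(target));
        System.out.println("8-byte ok? = " + isAligned(target, 8));
        System.out.println("2-byte ok? = " + isAligned(target, 2));
    }
}
```

The target address ends in 0xda, so its low three bits are not zero, which is consistent with an 8-byte load (ld8) faulting on a processor that enforces alignment.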
> The fix would go here, by adding a test for ia64. Looking at the code from a related bug, they are checking whether the box is sparc:
>
>      static Comparer<byte[]> getBestComparer() {
> +      if (System.getProperty("os.arch").equals("sparc")) {  <====
> +        if (LOG.isTraceEnabled()) {
> +          LOG.trace("Lexicographical comparer selected for "
> +              + "byte aligned system architecture");
> +        }
> +        return lexicographicalComparerJavaImpl();
> +      }
>        try {
>          Class<?> theClass = Class.forName(UNSAFE_COMPARER_NAME);
>
> So this is 'fixable' from a Java class perspective. Hari said he will talk with his open source contact.
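
For reference, a comparer that reads one byte at a time never issues a wide load, so the alignment question does not come up. The following is a minimal sketch of such a lexicographic comparison. It is not the Hadoop or HBase code behind lexicographicalComparerJavaImpl(), and the class name PureJavaComparerSketch is made up for illustration.

```java
class PureJavaComparerSketch {
    /**
     * Compares two byte ranges lexicographically, treating each byte as
     * unsigned. Returns a negative, zero, or positive value.
     */
    static int compareTo(byte[] b1, int off1, int len1,
                         byte[] b2, int off2, int len2) {
        int common = Math.min(len1, len2);
        for (int i = 0; i < common; i++) {
            int a = b1[off1 + i] & 0xff; // mask to get the unsigned value
            int b = b2[off2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // All shared bytes are equal, so the shorter range sorts first.
        return len1 - len2;
    }

    public static void main(String[] args) {
        byte[] x = {1, 2, 3};
        byte[] y = {1, 2, 4};
        System.out.println(compareTo(x, 0, x.length, y, 0, y.length));
    }
}
```

The trade-off is throughput: this reads one byte per iteration where the Unsafe-based comparer reads eight, but it behaves the same on every architecture.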
> This Hadoop bug report points to the same problem in the same code:
> https://issues.apache.org/jira/browse/HADOOP-11466
> In that case the symptom of the unaligned accesses was bad performance instead of a crash. This shows diffs for that fix:
> http://mail-archives.apache.org/mod_mbox/hadoop-common-commits/201501.mbox/%3Cb19d5f83ca7148b782e5b432817b6448@git.apache.org%3E
> Those diffs show that the fix only avoids the bad code when running on "sparc". It really should have avoided that code on every architecture other than x86. They should not be assuming that the FastByteComparisons enhancement will work on other processors and actually improve performance. On processors that do allow unaligned accesses, but at a high cost, they are just creating bad performance that will be hard for anyone to ever find.
> For all IA64 customers this will be an issue when running 64 bit. The IA processor enforces alignment on instruction types
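
The suggestion above (enable the fast path only where unaligned access is known to be fine, instead of excluding "sparc" alone) could be sketched as an allowlist on os.arch. This is an illustration only, not the committed Hadoop fix. The class name, the method names, and the exact set of architecture strings are assumptions, and the returned labels stand in for the real comparer objects.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ComparerSelectionSketch {
    // Assumed os.arch values for x86-family JVMs; adjust for the JVMs in use.
    private static final Set<String> UNALIGNED_OK =
            new HashSet<>(Arrays.asList("x86", "i386", "amd64", "x86_64"));

    /** True only for architectures on the allowlist. */
    static boolean unalignedAccessAssumedSafe(String osArch) {
        return osArch != null && UNALIGNED_OK.contains(osArch);
    }

    /** Returns a label for the comparer that would be selected. */
    static String chooseComparer(String osArch) {
        return unalignedAccessAssumedSafe(osArch) ? "unsafe" : "pure-java";
    }

    public static void main(String[] args) {
        String arch = System.getProperty("os.arch");
        System.out.println(arch + " -> " + chooseComparer(arch));
    }
}
```

With an allowlist, an architecture nobody thought about (ia64 in this report) falls back to the pure Java comparer by default, so the worst case is a slower comparison and not an abort.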


