You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@harmony.apache.org by Regis <xu...@gmail.com> on 2009/12/11 04:36:26 UTC

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

On 2009-12-11 4:30, Boris (JIRA) wrote:
> possible data-reordering in some hashCode-methods (e.g. String or URL)
> ----------------------------------------------------------------------
>
>                   Key: HARMONY-6404
>                   URL: https://issues.apache.org/jira/browse/HARMONY-6404
>               Project: Harmony
>            Issue Type: Bug
>              Reporter: Boris
>
>
> First I have to say that I don't know if the Java Memory Model is relevant for Harmony, so please reject the bug if this is not the case.
>
> The current implementation of several of Harmony's classes that try to cache hashCode in an optimized way are affected by a reordering-bug that can occur because of the way the hashCode is cached and retrieved.
> The problem is that the method contains no memory barriers, so that the VM may reorder the data-reads at its own will. With an unlucky reordering the method could return 0 although the actual hashCode is NOT 0.
>
> Or to speak in code: This actual code:
>          if (hashCode == 0) {
>              // calculate hash
>              hashCode = hash;
>          }
>          return hashCode;
> could be reordered to something like
>          return hashCode;
>          if (hashCode == 0) {
>              // calculate hash
>              hashCode = hash;
>          }
> (which is of course no valid Java-code, but that is what the VM might do internally).
>
> One common solution is to assign the field but then return the temp-value (in this case the variable "hash") and NOT the field itself. So that the read can not be reordered. (This way might be even faster because it may be one memory-read less)
> Another solution would be to make hashCode volatile or to not cache the hashCode, but these have a performance cost.
>
> I'll attach a patch I wrote. I could not get harmony to compile, so these are untested.
> BTW: This fix also fixes a more likely bug in BigInteger and BigDecimal: Callers of hashCode might have seen half-constructed hashCodes there.
>
> (This bug was found via the entry LANG-481 see there for some more details)
>

It's interesting topic. In my understand, re-order would be a issue only if 
hashCode is object reference. Because when "retrun hasCode" the value of 
"hasCode" pushed to stack, if hasCode is primitive type I don't think vm would 
to such re-order - the return value would be always 0, that's incorrect.

-- 
Best Regards,
Regis.

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Vijay Menon <vs...@google.com>.
BTW, as Jeremy says in the post Egor linked, the best fix is to read
this.hashCode only once:

  int hash = this.hashCode;
  if (hash == 0) {
    // calculate hash without accessing this.hashCode
    hashCode = hash;
  }
  return hash;

In this, there is exactly one read from hashCode.  The compiler is not
allowed to inject another one to, e.g., reread hash.

- Vijay

On Fri, Dec 11, 2009 at 7:50 AM, Vijay Menon <vs...@google.com> wrote:

> Egor,
>
> Thanks for the link!  I think I see the original poster's issue & I retract
> my statement.  The original example is basically identical to:
>
> tmp1 = this.hashCode
> if (tmp1 == 0) {
>   // calculate hash
>   this.hashCode = hash
> }
> tmp2 = this.hashCode
> return tmp2
>
>
> (You can think of tmp1 and tmp2 as internal registers introduced by the
> compiler.)
>
> The compiler is allowed to reorder the above to:
>
> tmp2 = this.hashCode
> tmp1 = this.hashCode
> if (tmp1 == 0) {
>   // calculate hash without accessing this.hashCode
>   this.hashCode = hash
>   tmp2 = this.hashCode
> }
> return tmp2
>
>
> This is just fine in a single-threaded setting.  If tmp1 is 0, tmp2 is
> always assigned the new hash.  If tmp1 is non-zero, then - in the
> single-threaded code - the earlier read of tmp2 would have been non-zero as
> well.
>
> In a multithreaded setting, you can get the following:
>
> tmp2 = this.hashCode  // reads zero
> // <--- another thread writes 42 to this.hashCode
> tmp1 = this.hashCode // reads 42, if clause not executed
> if (tmp1 == 0) {
>   // calculate hash without accessing this.hashCode
>   this.hashCode = hash
>   tmp2 = this.hashCode
> }
> return tmp2 // returns zero!
>
>
> Now, you get zero.  Basically, a compiler is allowed to reorder reads if it
> sees no (potentially aliasing) intervening writes.
>
> Egor: are you sure that DRLVM doesn't reorder reads to the heap?  StarJIT
> did do CSE of reads to the heap (do_memopt) - which is effectively the same
> thing.  Did DRLVM remove that from Jitrino?
>
> Vijay
>
> On Fri, Dec 11, 2009 at 6:32 AM, Egor Pasko <eg...@gmail.com> wrote:
>
>> On the 0x684 day of Apache Harmony Tim Ellison wrote:
>> > On 11/Dec/2009 04:09, Vijay Menon wrote:
>> >> Perhaps I'm missing some context, but I don't see any problem.  I don't
>> >> believe that this:
>> >>
>> >>         if (hashCode == 0) {
>> >>             // calculate hash
>> >>             hashCode = hash;
>> >>         }
>> >>         return hashCode;
>> >>
>> >> can ever return 0 (assuming hash is non-zero), regardless of memory
>> fences.
>> >>  The JMM won't allow visible reordering in a single threaded program.
>> >
>> > I agree.  In the multi-threaded case there can be a data race on the
>> > hashCode, with the effect that the same hashCode value is unnecessarily,
>> > but harmlessly, recalculated.
>>
>> Vijay, Tim, you are not 100% correct here.
>>
>> 1. there should be an extra restriction that the part "calculate hash"
>>   does not touch the hashCode field. Without that restriction more
>>   trivial races can happen as discussed in LANG-481.
>>
>> So we should assume this code:
>>
>> if (this.hashCode == 0) {
>>  int hash;
>>  if (this.hashCode == 0) {
>>    // Calculate using 'hash' only, not this.hashCode.
>>    this.hashCode = hash;
>>  }
>>  return this.hashCode;
>> }
>>
>> where 'this.*' is always a memory reference while 'hash' is a thread
>> private var.
>>
>> 2. DRLVM JIT indeed does not privatize field access to threads. It
>>   always loads fields from memory in original order. Hence this
>>   potential bug does not affect DRLVM .. now. But potentially it can
>>   start optimizing things this way because current behavior prevents
>>   a bunch of high level optimizations.
>>
>> 3. Jeremy Manson, being an expert in Java Memory Model claims [1] that
>>   such reordering is allowed theoretically. I.e. like this:
>>
>> int hash = this.hashCode;
>> if (this.hashCode == 0) {
>>  hash = this.hashCode = // calculate hash
>> }
>> return hash;
>>
>> This is a correct single-threaded code. What happened here is a
>> lengthy thread privatization of this.hashCode (again, does not happen
>> in DRLVM). That is incorrect in multithreaded environment and needs
>> extra synchronization/volatile/etc.
>>
>> 4. I do not know why a JIT would want to do that, i.e. two sequential
>>   reads from the same memory location. Sounds like a bit synthetic
>> example.
>>
>>
>> [1]
>> http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
>>
>> --
>> Egor Pasko
>>
>>
>

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Egor Pasko <eg...@gmail.com>.
On the 0x684 day of Apache Harmony Vijay Menon wrote:
> Egor,
>
> Thanks for the link!  I think I see the original poster's issue & I retract
> my statement.  The original example is basically identical to:
>
> tmp1 = this.hashCode
> if (tmp1 == 0) {
>   // calculate hash
>   this.hashCode = hash
> }
> tmp2 = this.hashCode
> return tmp2
>
>
> (You can think of tmp1 and tmp2 as internal registers introduced by the
> compiler.)
>
> The compiler is allowed to reorder the above to:
>
> tmp2 = this.hashCode
> tmp1 = this.hashCode
> if (tmp1 == 0) {
>   // calculate hash without accessing this.hashCode
>   this.hashCode = hash
>   tmp2 = this.hashCode
> }
> return tmp2
>
>
> This is just fine in a single-threaded setting.  If tmp1 is 0, tmp2 is
> always assigned the new hash.  If tmp1 is non-zero, then - in the
> single-threaded code - the earlier read of tmp2 would have been non-zero as
> well.
>
> In a multithreaded setting, you can get the following:
>
> tmp2 = this.hashCode  // reads zero
> // <--- another thread writes 42 to this.hashCode
> tmp1 = this.hashCode // reads 42, if clause not executed
> if (tmp1 == 0) {
>   // calculate hash without accessing this.hashCode
>   this.hashCode = hash
>   tmp2 = this.hashCode
> }
> return tmp2 // returns zero!
>
>
> Now, you get zero.  Basically, a compiler is allowed to reorder reads if it
> sees no (potentially aliasing) intervening writes.
>
> Egor: are you sure that DRLVM doesn't reorder reads to the heap?  StarJIT
> did do CSE of reads to the heap (do_memopt) - which is effectively the same
> thing.  Did DRLVM remove that from Jitrino?

I was not 100% correct ;) .. and should recheck this. The memopt
limitation comes in presence of virtual calls that we don't have
here. The String.hashCode() is officially dangerous again.

-- 
Egor Pasko


Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Vijay Menon <vs...@google.com>.
Egor,

Thanks for the link!  I think I see the original poster's issue & I retract
my statement.  The original example is basically identical to:

tmp1 = this.hashCode
if (tmp1 == 0) {
  // calculate hash
  this.hashCode = hash
}
tmp2 = this.hashCode
return tmp2


(You can think of tmp1 and tmp2 as internal registers introduced by the
compiler.)

The compiler is allowed to reorder the above to:

tmp2 = this.hashCode
tmp1 = this.hashCode
if (tmp1 == 0) {
  // calculate hash without accessing this.hashCode
  this.hashCode = hash
  tmp2 = this.hashCode
}
return tmp2


This is just fine in a single-threaded setting.  If tmp1 is 0, tmp2 is
always assigned the new hash.  If tmp1 is non-zero, then - in the
single-threaded code - the earlier read of tmp2 would have been non-zero as
well.

In a multithreaded setting, you can get the following:

tmp2 = this.hashCode  // reads zero
// <--- another thread writes 42 to this.hashCode
tmp1 = this.hashCode // reads 42, if clause not executed
if (tmp1 == 0) {
  // calculate hash without accessing this.hashCode
  this.hashCode = hash
  tmp2 = this.hashCode
}
return tmp2 // returns zero!


Now, you get zero.  Basically, a compiler is allowed to reorder reads if it
sees no (potentially aliasing) intervening writes.

Egor: are you sure that DRLVM doesn't reorder reads to the heap?  StarJIT
did do CSE of reads to the heap (do_memopt) - which is effectively the same
thing.  Did DRLVM remove that from Jitrino?

Vijay

On Fri, Dec 11, 2009 at 6:32 AM, Egor Pasko <eg...@gmail.com> wrote:

> On the 0x684 day of Apache Harmony Tim Ellison wrote:
> > On 11/Dec/2009 04:09, Vijay Menon wrote:
> >> Perhaps I'm missing some context, but I don't see any problem.  I don't
> >> believe that this:
> >>
> >>         if (hashCode == 0) {
> >>             // calculate hash
> >>             hashCode = hash;
> >>         }
> >>         return hashCode;
> >>
> >> can ever return 0 (assuming hash is non-zero), regardless of memory
> fences.
> >>  The JMM won't allow visible reordering in a single threaded program.
> >
> > I agree.  In the multi-threaded case there can be a data race on the
> > hashCode, with the effect that the same hashCode value is unnecessarily,
> > but harmlessly, recalculated.
>
> Vijay, Tim, you are not 100% correct here.
>
> 1. there should be an extra restriction that the part "calculate hash"
>   does not touch the hashCode field. Without that restriction more
>   trivial races can happen as discussed in LANG-481.
>
> So we should assume this code:
>
> if (this.hashCode == 0) {
>  int hash;
>  if (this.hashCode == 0) {
>    // Calculate using 'hash' only, not this.hashCode.
>    this.hashCode = hash;
>  }
>  return this.hashCode;
> }
>
> where 'this.*' is always a memory reference while 'hash' is a thread
> private var.
>
> 2. DRLVM JIT indeed does not privatize field access to threads. It
>   always loads fields from memory in original order. Hence this
>   potential bug does not affect DRLVM .. now. But potentially it can
>   start optimizing things this way because current behavior prevents
>   a bunch of high level optimizations.
>
> 3. Jeremy Manson, being an expert in Java Memory Model claims [1] that
>   such reordering is allowed theoretically. I.e. like this:
>
> int hash = this.hashCode;
> if (this.hashCode == 0) {
>  hash = this.hashCode = // calculate hash
> }
> return hash;
>
> This is a correct single-threaded code. What happened here is a
> lengthy thread privatization of this.hashCode (again, does not happen
> in DRLVM). That is incorrect in multithreaded environment and needs
> extra synchronization/volatile/etc.
>
> 4. I do not know why a JIT would want to do that, i.e. two sequential
>   reads from the same memory location. Sounds like a bit synthetic example.
>
>
> [1]
> http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
>
> --
> Egor Pasko
>
>

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Egor Pasko <eg...@gmail.com>.
On the 0x686 day of Apache Harmony Nathan Beyer wrote:
> On Sat, Dec 12, 2009 at 11:04 AM, sebb <se...@gmail.com> wrote:
>> On 12/12/2009, Vijay Menon <vs...@google.com> wrote:
>>> On Sat, Dec 12, 2009 at 7:34 AM, sebb <se...@gmail.com> wrote:
>>>
>>>  > On 12/12/2009, Nathan Beyer <nd...@apache.org> wrote:
>>>  > > On Fri, Dec 11, 2009 at 10:04 AM, Tim Ellison <t....@gmail.com>
>>>  > wrote:
>>>  > >  > On 11/Dec/2009 14:32, Egor Pasko wrote:
>>>  > >  >> On the 0x684 day of Apache Harmony Tim Ellison wrote:
>>>  > >  >>> On 11/Dec/2009 04:09, Vijay Menon wrote:
>>>  > >  >>>> Perhaps I'm missing some context, but I don't see any problem.  I
>>>  > don't
>>>  > >  >>>> believe that this:
>>>  > >  >>>>
>>>  > >  >>>>         if (hashCode == 0) {
>>>  > >  >>>>             // calculate hash
>>>  > >  >>>>             hashCode = hash;
>>>  > >  >>>>         }
>>>  > >  >>>>         return hashCode;
>>>  > >  >>>>
>>>  > >  >>>> can ever return 0 (assuming hash is non-zero), regardless of memory
>>>  > fences.
>>>  > >  >>>>  The JMM won't allow visible reordering in a single threaded
>>>  > program.
>>>  > >  >>> I agree.  In the multi-threaded case there can be a data race on the
>>>  > >  >>> hashCode, with the effect that the same hashCode value is
>>>  > unnecessarily,
>>>  > >  >>> but harmlessly, recalculated.
>>>  > >  >>
>>>  > >  >> Vijay, Tim, you are not 100% correct here.
>>>  > >  >>
>>>  > >  >> 1. there should be an extra restriction that the part "calculate
>>>  > hash"
>>>  > >  >>    does not touch the hashCode field. Without that restriction more
>>>  > >  >>    trivial races can happen as discussed in LANG-481.
>>>  > >  >>
>>>  > >  >> So we should assume this code:
>>>  > >  >>
>>>  > >  >> if (this.hashCode == 0) {
>>>  > >  >>   int hash;
>>>  > >  >>   if (this.hashCode == 0) {
>>>  > >  >>     // Calculate using 'hash' only, not this.hashCode.
>>>  > >  >>     this.hashCode = hash;
>>>  > >  >>   }
>>>  > >  >>   return this.hashCode;
>>>  > >  >> }
>>>  > >  >
>>>  > >  > Yes, I guess I figured that was assumed :-)
>>>  > >  >
>>>  > >  > Of course, there are lots of things you could put into the
>>>  > >  > "// Calculate..." section that would be unsafe.  We should stick with
>>>  > >  > showing the non-abbreviated implementation to avoid ambiguity:
>>>  > >  >
>>>  > >  >    public int hashCode() {
>>>  > >  >        if (hashCode == 0) {
>>>  > >  >            if (count == 0) {
>>>  > >  >                return 0;
>>>  > >  >            }
>>>  > >  >            int hash = 0;
>>>  > >  >            for (int i = offset; i < count + offset; i++) {
>>>  > >  >                hash = value[i] + ((hash << 5) - hash);
>>>  > >  >            }
>>>  > >  >            hashCode = hash;
>>>  > >  >        }
>>>  > >  >        return hashCode;
>>>  > >  >    }
>>>  > >  >
>>>  > >
>>>  > >
>>>  > > I think I understand the concern, after some additional reading. The
>>>  > >  issue seems to be that 'hashCode' is read twice and the field is not
>>>  > >  protected by any memory barriers (synchronized, volatile, etc). As
>>>  > >  such, it would be okay for the second read to be done using a cached
>>>  > >  value, which means that both reads could return 0 in the same thread
>>>  > >  of execution. Another way to look at it is that the write to
>>>  > >  'hashCode' may or may not affect subsequent reads of 'hashCode'. This
>>>  > >  is how I understand it.
>>>  > >
>>>  > >  Will that happen in practice? I have no idea. It does seem possible.
>>>  >
>>>  > The Java MM guarantees that a single thread behaves as if the code is
>>>  > processed sequentially.
>>>  > So if the thread writes non-zero to this.hashCode it cannot then
>>>  > return zero for the value of this.hashCode if no other threads
>>>  > intervene. The thread cannot ignore updates to values it has itself
>>>  > cached!
>>>  >
>>>  > If another thread writes to this.hashCode concurrently, then this
>>>  > thread may or may not see the value stored by that thread. In this
>>>  > case, it's not a problem, as another thread can only write a fixed
>>>  > value. So the worst that can happen is that this.hashCode is written
>>>  > more than once, and the current thread may fetch the value written by
>>>  > another thread. But this is the same value it wrote anyway.
>>>  >
>>>
>>>
>>> In a multithreaded setting, this code *can* break and return 0 if hashCode
>>>  is read twice.  This is not just a performance optimization - it is a
>>>  correctness issue.  The compiler / runtime / hardware is allowed to reorder
>>>  read operations.  The following racy execution is allowable under the JMM:
>>>
>>>  1. Thread 1 reads 0 from hashCode and stores 0 into a local (t1).
>>>  2. Thread 2 write 42 into hashCode.
>>>  3. Thread 1 reads 42 from hashCode and stores 42 into a local (t2).
>>>  4. Thread 1 compares t2 (42) with 0 and does not execute the if clause.
>>>  5. Thread 1 returns t1 (0).
>>>
>>
>> But why would Thread 1 read hashCode twice before the comparison?
>>
>> Seems to me that would break the "as if serial" guarantee for a single thread.
>> In the code sequence, the comparison is before the return, and
>> therefore "happens-before" the return. I.e. step 3 "happens-before"
>> step 1+5.
>>
>> I'm not saying Harmony should keep the current code - the suggested
>> temp variable version seems better anyway - just trying to understand
>> what (if anything) is currently broken.
>
> I'm still open to counter arguments as it still seems weird. I keep
> focusing on the bit that 'this.hashCode' is referenced twice as a
> 'read' - "if (hashCode == 0)" and "return hashCode". Since 'hashCode'
> isn't final, volatile or in a syncrhonized region, the read into the
> stack could be cached. As I understand it, it's not the reorder of the
> Java code, it's the reorder of the generated code that just reads the
> value from the heap to the stack.

Nathan,

you are right, the value can be cached and if id does, correctness
does not suffer. In fact the whole case of two consequent reads is
rather synthetic: it is just too easy for JIT to eliminate this double
memory load of the same memory location. So, most likely JIT would not
let this reordering happen.

Anyway, it is legal for JIT to perform such optimization
(i.e. privatize some fields to the stack, not privatize others,
reorder loads, add stores in a correct single-threaded way) in case
there are no synchronization, volatile memory locations, virtual
calls, etc.

> I'm basing this off of the 'racy single-check' idiom that's mentioned
> in Effective Java. I'd like to get a complete answer to this though.
>
> -Nathan

-- 
Egor Pasko


Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Nathan Beyer <nd...@apache.org>.
On Sat, Dec 12, 2009 at 11:04 AM, sebb <se...@gmail.com> wrote:
> On 12/12/2009, Vijay Menon <vs...@google.com> wrote:
>> On Sat, Dec 12, 2009 at 7:34 AM, sebb <se...@gmail.com> wrote:
>>
>>  > On 12/12/2009, Nathan Beyer <nd...@apache.org> wrote:
>>  > > On Fri, Dec 11, 2009 at 10:04 AM, Tim Ellison <t....@gmail.com>
>>  > wrote:
>>  > >  > On 11/Dec/2009 14:32, Egor Pasko wrote:
>>  > >  >> On the 0x684 day of Apache Harmony Tim Ellison wrote:
>>  > >  >>> On 11/Dec/2009 04:09, Vijay Menon wrote:
>>  > >  >>>> Perhaps I'm missing some context, but I don't see any problem.  I
>>  > don't
>>  > >  >>>> believe that this:
>>  > >  >>>>
>>  > >  >>>>         if (hashCode == 0) {
>>  > >  >>>>             // calculate hash
>>  > >  >>>>             hashCode = hash;
>>  > >  >>>>         }
>>  > >  >>>>         return hashCode;
>>  > >  >>>>
>>  > >  >>>> can ever return 0 (assuming hash is non-zero), regardless of memory
>>  > fences.
>>  > >  >>>>  The JMM won't allow visible reordering in a single threaded
>>  > program.
>>  > >  >>> I agree.  In the multi-threaded case there can be a data race on the
>>  > >  >>> hashCode, with the effect that the same hashCode value is
>>  > unnecessarily,
>>  > >  >>> but harmlessly, recalculated.
>>  > >  >>
>>  > >  >> Vijay, Tim, you are not 100% correct here.
>>  > >  >>
>>  > >  >> 1. there should be an extra restriction that the part "calculate
>>  > hash"
>>  > >  >>    does not touch the hashCode field. Without that restriction more
>>  > >  >>    trivial races can happen as discussed in LANG-481.
>>  > >  >>
>>  > >  >> So we should assume this code:
>>  > >  >>
>>  > >  >> if (this.hashCode == 0) {
>>  > >  >>   int hash;
>>  > >  >>   if (this.hashCode == 0) {
>>  > >  >>     // Calculate using 'hash' only, not this.hashCode.
>>  > >  >>     this.hashCode = hash;
>>  > >  >>   }
>>  > >  >>   return this.hashCode;
>>  > >  >> }
>>  > >  >
>>  > >  > Yes, I guess I figured that was assumed :-)
>>  > >  >
>>  > >  > Of course, there are lots of things you could put into the
>>  > >  > "// Calculate..." section that would be unsafe.  We should stick with
>>  > >  > showing the non-abbreviated implementation to avoid ambiguity:
>>  > >  >
>>  > >  >    public int hashCode() {
>>  > >  >        if (hashCode == 0) {
>>  > >  >            if (count == 0) {
>>  > >  >                return 0;
>>  > >  >            }
>>  > >  >            int hash = 0;
>>  > >  >            for (int i = offset; i < count + offset; i++) {
>>  > >  >                hash = value[i] + ((hash << 5) - hash);
>>  > >  >            }
>>  > >  >            hashCode = hash;
>>  > >  >        }
>>  > >  >        return hashCode;
>>  > >  >    }
>>  > >  >
>>  > >
>>  > >
>>  > > I think I understand the concern, after some additional reading. The
>>  > >  issue seems to be that 'hashCode' is read twice and the field is not
>>  > >  protected by any memory barriers (synchronized, volatile, etc). As
>>  > >  such, it would be okay for the second read to be done using a cached
>>  > >  value, which means that both reads could return 0 in the same thread
>>  > >  of execution. Another way to look at it is that the write to
>>  > >  'hashCode' may or may not affect subsequent reads of 'hashCode'. This
>>  > >  is how I understand it.
>>  > >
>>  > >  Will that happen in practice? I have no idea. It does seem possible.
>>  >
>>  > The Java MM guarantees that a single thread behaves as if the code is
>>  > processed sequentially.
>>  > So if the thread writes non-zero to this.hashCode it cannot then
>>  > return zero for the value of this.hashCode if no other threads
>>  > intervene. The thread cannot ignore updates to values it has itself
>>  > cached!
>>  >
>>  > If another thread writes to this.hashCode concurrently, then this
>>  > thread may or may not see the value stored by that thread. In this
>>  > case, it's not a problem, as another thread can only write a fixed
>>  > value. So the worst that can happen is that this.hashCode is written
>>  > more than once, and the current thread may fetch the value written by
>>  > another thread. But this is the same value it wrote anyway.
>>  >
>>
>>
>> In a multithreaded setting, this code *can* break and return 0 if hashCode
>>  is read twice.  This is not just a performance optimization - it is a
>>  correctness issue.  The compiler / runtime / hardware is allowed to reorder
>>  read operations.  The following racy execution is allowable under the JMM:
>>
>>  1. Thread 1 reads 0 from hashCode and stores 0 into a local (t1).
>>  2. Thread 2 write 42 into hashCode.
>>  3. Thread 1 reads 42 from hashCode and stores 42 into a local (t2).
>>  4. Thread 1 compares t2 (42) with 0 and does not execute the if clause.
>>  5. Thread 1 returns t1 (0).
>>
>
> But why would Thread 1 read hashCode twice before the comparison?
>
> Seems to me that would break the "as if serial" guarantee for a single thread.
> In the code sequence, the comparison is before the return, and
> therefore "happens-before" the return. I.e. step 3 "happens-before"
> step 1+5.
>
> I'm not saying Harmony should keep the current code - the suggested
> temp variable version seems better anyway - just trying to understand
> what (if anything) is currently broken.

I'm still open to counter arguments as it still seems weird. I keep
focusing on the bit that 'this.hashCode' is referenced twice as a
'read' - "if (hashCode == 0)" and "return hashCode". Since 'hashCode'
isn't final, volatile or in a syncrhonized region, the read into the
stack could be cached. As I understand it, it's not the reorder of the
Java code, it's the reorder of the generated code that just reads the
value from the heap to the stack.

I'm basing this off of the 'racy single-check' idiom that's mentioned
in Effective Java. I'd like to get a complete answer to this though.

-Nathan

>
>>  - Vijay
>>
>>
>>
>>  >
>>  > >  In any case, it does seem a pinch more efficient to only do one read
>>  > >  of hashCode ... switch up the code to be something like this.
>>  >
>>  > Agreed.
>>  >
>>  > >  public int hashCode() {
>>  > >     final int hash = hashCode;
>>  > >     if (hash == 0) {
>>  > >
>>  > >         if (count == 0) {
>>  > >             return 0;
>>  > >         }
>>  > >
>>  > >         for (int i = offset; i < count + offset; i++) {
>>  > >             hash = value[i] + ((hash << 5) - hash);
>>  > >         }
>>  > >         hashCode = hash;
>>  > >     }
>>  > >
>>  > >     return hash;
>>  > >  }
>>  > >
>>  > >  Thoughts?
>>  > >
>>  > >
>>  > >  >> where 'this.*' is always a memory reference while 'hash' is a thread
>>  > private var.
>>  > >  >>
>>  > >  >> 2. DRLVM JIT indeed does not privatize field access to threads. It
>>  > >  >>    always loads fields from memory in original order. Hence this
>>  > >  >>    potential bug does not affect DRLVM .. now. But potentially it can
>>  > >  >>    start optimizing things this way because current behavior prevents
>>  > >  >>    a bunch of high level optimizations.
>>  > >  >>
>>  > >  >> 3. Jeremy Manson, being an expert in Java Memory Model claims [1]
>>  > that
>>  > >  >>    such reordering is allowed theoretically. I.e. like this:
>>  > >  >>
>>  > >  >> int hash = this.hashCode;
>>  > >  >> if (this.hashCode == 0) {
>>  > >  >>   hash = this.hashCode = // calculate hash
>>  > >  >> }
>>  > >  >> return hash;
>>  > >  >>
>>  > >  >> This is a correct single-threaded code. What happened here is a
>>  > >  >> lengthy thread privatization of this.hashCode (again, does not happen
>>  > >  >> in DRLVM). That is incorrect in multithreaded environment and needs
>>  > >  >> extra synchronization/volatile/etc.
>>  > >  >>
>>  > >  >> 4. I do not know why a JIT would want to do that, i.e. two sequential
>>  > >  >>    reads from the same memory location. Sounds like a bit synthetic
>>  > example.
>>  > >  >
>>  > >  > ...at which point a bunch of code probably would go wrong!  So
>>  > hopefully
>>  > >  > it remains only a theoretical possibility.
>>  > >  >
>>  > >  > Regards,
>>  > >  > Tim
>>  > >  >
>>  > >  >> [1]
>>  > http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
>>  > >  >>
>>  > >  >
>>  > >
>>  >
>>
>

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Vijay Menon <vs...@google.com>.
On Sat, Dec 12, 2009 at 9:04 AM, sebb <se...@gmail.com> wrote:

> On 12/12/2009, Vijay Menon <vs...@google.com> wrote:
> > On Sat, Dec 12, 2009 at 7:34 AM, sebb <se...@gmail.com> wrote:
> >
> >  > On 12/12/2009, Nathan Beyer <nd...@apache.org> wrote:
> >  > > On Fri, Dec 11, 2009 at 10:04 AM, Tim Ellison <
> t.p.ellison@gmail.com>
> >  > wrote:
> >  > >  > On 11/Dec/2009 14:32, Egor Pasko wrote:
> >  > >  >> On the 0x684 day of Apache Harmony Tim Ellison wrote:
> >  > >  >>> On 11/Dec/2009 04:09, Vijay Menon wrote:
> >  > >  >>>> Perhaps I'm missing some context, but I don't see any problem.
>  I
> >  > don't
> >  > >  >>>> believe that this:
> >  > >  >>>>
> >  > >  >>>>         if (hashCode == 0) {
> >  > >  >>>>             // calculate hash
> >  > >  >>>>             hashCode = hash;
> >  > >  >>>>         }
> >  > >  >>>>         return hashCode;
> >  > >  >>>>
> >  > >  >>>> can ever return 0 (assuming hash is non-zero), regardless of
> memory
> >  > fences.
> >  > >  >>>>  The JMM won't allow visible reordering in a single threaded
> >  > program.
> >  > >  >>> I agree.  In the multi-threaded case there can be a data race
> on the
> >  > >  >>> hashCode, with the effect that the same hashCode value is
> >  > unnecessarily,
> >  > >  >>> but harmlessly, recalculated.
> >  > >  >>
> >  > >  >> Vijay, Tim, you are not 100% correct here.
> >  > >  >>
> >  > >  >> 1. there should be an extra restriction that the part "calculate
> >  > hash"
> >  > >  >>    does not touch the hashCode field. Without that restriction
> more
> >  > >  >>    trivial races can happen as discussed in LANG-481.
> >  > >  >>
> >  > >  >> So we should assume this code:
> >  > >  >>
> >  > >  >> if (this.hashCode == 0) {
> >  > >  >>   int hash;
> >  > >  >>   if (this.hashCode == 0) {
> >  > >  >>     // Calculate using 'hash' only, not this.hashCode.
> >  > >  >>     this.hashCode = hash;
> >  > >  >>   }
> >  > >  >>   return this.hashCode;
> >  > >  >> }
> >  > >  >
> >  > >  > Yes, I guess I figured that was assumed :-)
> >  > >  >
> >  > >  > Of course, there are lots of things you could put into the
> >  > >  > "// Calculate..." section that would be unsafe.  We should stick
> with
> >  > >  > showing the non-abbreviated implementation to avoid ambiguity:
> >  > >  >
> >  > >  >    public int hashCode() {
> >  > >  >        if (hashCode == 0) {
> >  > >  >            if (count == 0) {
> >  > >  >                return 0;
> >  > >  >            }
> >  > >  >            int hash = 0;
> >  > >  >            for (int i = offset; i < count + offset; i++) {
> >  > >  >                hash = value[i] + ((hash << 5) - hash);
> >  > >  >            }
> >  > >  >            hashCode = hash;
> >  > >  >        }
> >  > >  >        return hashCode;
> >  > >  >    }
> >  > >  >
> >  > >
> >  > >
> >  > > I think I understand the concern, after some additional reading. The
> >  > >  issue seems to be that 'hashCode' is read twice and the field is
> not
> >  > >  protected by any memory barriers (synchronized, volatile, etc). As
> >  > >  such, it would be okay for the second read to be done using a
> cached
> >  > >  value, which means that both reads could return 0 in the same
> thread
> >  > >  of execution. Another way to look at it is that the write to
> >  > >  'hashCode' may or may not affect subsequent reads of 'hashCode'.
> This
> >  > >  is how I understand it.
> >  > >
> >  > >  Will that happen in practice? I have no idea. It does seem
> possible.
> >  >
> >  > The Java MM guarantees that a single thread behaves as if the code is
> >  > processed sequentially.
> >  > So if the thread writes non-zero to this.hashCode it cannot then
> >  > return zero for the value of this.hashCode if no other threads
> >  > intervene. The thread cannot ignore updates to values it has itself
> >  > cached!
> >  >
> >  > If another thread writes to this.hashCode concurrently, then this
> >  > thread may or may not see the value stored by that thread. In this
> >  > case, it's not a problem, as another thread can only write a fixed
> >  > value. So the worst that can happen is that this.hashCode is written
> >  > more than once, and the current thread may fetch the value written by
> >  > another thread. But this is the same value it wrote anyway.
> >  >
> >
> >
> > In a multithreaded setting, this code *can* break and return 0 if
> hashCode
> >  is read twice.  This is not just a performance optimization - it is a
> >  correctness issue.  The compiler / runtime / hardware is allowed to
> reorder
> >  read operations.  The following racy execution is allowable under the
> JMM:
> >
> >  1. Thread 1 reads 0 from hashCode and stores 0 into a local (t1).
> >  2. Thread 2 write 42 into hashCode.
> >  3. Thread 1 reads 42 from hashCode and stores 42 into a local (t2).
> >  4. Thread 1 compares t2 (42) with 0 and does not execute the if clause.
> >  5. Thread 1 returns t1 (0).
> >
>
> But why would Thread 1 read hashCode twice before the comparison?
>

Yeah, this confused me.  The blog Egor linked has more detail.  The key is
that the original code reads hashCode twice.  It's legal to hoist the second
read above the comparison and above the first read - as long as you add a
compensating read in the if-clause to keep the single threaded case correct.
 The compiler can legally convert the original code to this:

  t1 = this.hashCode;
  t2 = this.hashCode;
  if (t2 == 0) {
    // compute hash
    this.hashCode = hash;
    t1 = this.hashCode;
  }
  return t1;


> Seems to me that would break the "as if serial" guarantee for a single
> thread.
> In the code sequence, the comparison is before the return, and
> therefore "happens-before" the return. I.e. step 3 "happens-before"
> step 1+5.
>

The above code will always be correct in the single thread case - so this
guarantee is basically kept.  The problem is that the write from the other
thread has no happens-before relation with either read.  Essentially, the
JMM allows each read to (independently) decide whether they see the result
of the other thread's write.

- Vijay


>
> I'm not saying Harmony should keep the current code - the suggested
> temp variable version seems better anyway - just trying to understand
> what (if anything) is currently broken.
>
> >  - Vijay
> >
> >
> >
> >  >
> >  > >  In any case, it does seem a pinch more efficient to only do one
> read
> >  > >  of hashCode ... switch up the code to be something like this.
> >  >
> >  > Agreed.
> >  >
> >  > >  public int hashCode() {
> >  > >     final int hash = hashCode;
> >  > >     if (hash == 0) {
> >  > >
> >  > >         if (count == 0) {
> >  > >             return 0;
> >  > >         }
> >  > >
> >  > >         for (int i = offset; i < count + offset; i++) {
> >  > >             hash = value[i] + ((hash << 5) - hash);
> >  > >         }
> >  > >         hashCode = hash;
> >  > >     }
> >  > >
> >  > >     return hash;
> >  > >  }
> >  > >
> >  > >  Thoughts?
> >  > >
> >  > >
> >  > >  >> where 'this.*' is always a memory reference while 'hash' is a
> thread
> >  > private var.
> >  > >  >>
> >  > >  >> 2. DRLVM JIT indeed does not privatize field access to threads.
> It
> >  > >  >>    always loads fields from memory in original order. Hence this
> >  > >  >>    potential bug does not affect DRLVM .. now. But potentially
> it can
> >  > >  >>    start optimizing things this way because current behavior
> prevents
> >  > >  >>    a bunch of high level optimizations.
> >  > >  >>
> >  > >  >> 3. Jeremy Manson, being an expert in Java Memory Model claims
> [1]
> >  > that
> >  > >  >>    such reordering is allowed theoretically. I.e. like this:
> >  > >  >>
> >  > >  >> int hash = this.hashCode;
> >  > >  >> if (this.hashCode == 0) {
> >  > >  >>   hash = this.hashCode = // calculate hash
> >  > >  >> }
> >  > >  >> return hash;
> >  > >  >>
> >  > >  >> This is a correct single-threaded code. What happened here is a
> >  > >  >> lengthy thread privatization of this.hashCode (again, does not
> happen
> >  > >  >> in DRLVM). That is incorrect in multithreaded environment and
> needs
> >  > >  >> extra synchronization/volatile/etc.
> >  > >  >>
> >  > >  >> 4. I do not know why a JIT would want to do that, i.e. two
> sequential
> >  > >  >>    reads from the same memory location. Sounds like a bit
> synthetic
> >  > example.
> >  > >  >
> >  > >  > ...at which point a bunch of code probably would go wrong!  So
> >  > hopefully
> >  > >  > it remains only a theoretical possibility.
> >  > >  >
> >  > >  > Regards,
> >  > >  > Tim
> >  > >  >
> >  > >  >> [1]
> >  >
> http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
> >  > >  >>
> >  > >  >
> >  > >
> >  >
> >
>

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by sebb <se...@gmail.com>.
On 12/12/2009, Vijay Menon <vs...@google.com> wrote:
> On Sat, Dec 12, 2009 at 7:34 AM, sebb <se...@gmail.com> wrote:
>
>  > On 12/12/2009, Nathan Beyer <nd...@apache.org> wrote:
>  > > On Fri, Dec 11, 2009 at 10:04 AM, Tim Ellison <t....@gmail.com>
>  > wrote:
>  > >  > On 11/Dec/2009 14:32, Egor Pasko wrote:
>  > >  >> On the 0x684 day of Apache Harmony Tim Ellison wrote:
>  > >  >>> On 11/Dec/2009 04:09, Vijay Menon wrote:
>  > >  >>>> Perhaps I'm missing some context, but I don't see any problem.  I
>  > don't
>  > >  >>>> believe that this:
>  > >  >>>>
>  > >  >>>>         if (hashCode == 0) {
>  > >  >>>>             // calculate hash
>  > >  >>>>             hashCode = hash;
>  > >  >>>>         }
>  > >  >>>>         return hashCode;
>  > >  >>>>
>  > >  >>>> can ever return 0 (assuming hash is non-zero), regardless of memory
>  > fences.
>  > >  >>>>  The JMM won't allow visible reordering in a single threaded
>  > program.
>  > >  >>> I agree.  In the multi-threaded case there can be a data race on the
>  > >  >>> hashCode, with the effect that the same hashCode value is
>  > unnecessarily,
>  > >  >>> but harmlessly, recalculated.
>  > >  >>
>  > >  >> Vijay, Tim, you are not 100% correct here.
>  > >  >>
>  > >  >> 1. there should be an extra restriction that the part "calculate
>  > hash"
>  > >  >>    does not touch the hashCode field. Without that restriction more
>  > >  >>    trivial races can happen as discussed in LANG-481.
>  > >  >>
>  > >  >> So we should assume this code:
>  > >  >>
>  > >  >> if (this.hashCode == 0) {
>  > >  >>   int hash;
>  > >  >>   if (this.hashCode == 0) {
>  > >  >>     // Calculate using 'hash' only, not this.hashCode.
>  > >  >>     this.hashCode = hash;
>  > >  >>   }
>  > >  >>   return this.hashCode;
>  > >  >> }
>  > >  >
>  > >  > Yes, I guess I figured that was assumed :-)
>  > >  >
>  > >  > Of course, there are lots of things you could put into the
>  > >  > "// Calculate..." section that would be unsafe.  We should stick with
>  > >  > showing the non-abbreviated implementation to avoid ambiguity:
>  > >  >
>  > >  >    public int hashCode() {
>  > >  >        if (hashCode == 0) {
>  > >  >            if (count == 0) {
>  > >  >                return 0;
>  > >  >            }
>  > >  >            int hash = 0;
>  > >  >            for (int i = offset; i < count + offset; i++) {
>  > >  >                hash = value[i] + ((hash << 5) - hash);
>  > >  >            }
>  > >  >            hashCode = hash;
>  > >  >        }
>  > >  >        return hashCode;
>  > >  >    }
>  > >  >
>  > >
>  > >
>  > > I think I understand the concern, after some additional reading. The
>  > >  issue seems to be that 'hashCode' is read twice and the field is not
>  > >  protected by any memory barriers (synchronized, volatile, etc). As
>  > >  such, it would be okay for the second read to be done using a cached
>  > >  value, which means that both reads could return 0 in the same thread
>  > >  of execution. Another way to look at it is that the write to
>  > >  'hashCode' may or may not affect subsequent reads of 'hashCode'. This
>  > >  is how I understand it.
>  > >
>  > >  Will that happen in practice? I have no idea. It does seem possible.
>  >
>  > The Java MM guarantees that a single thread behaves as if the code is
>  > processed sequentially.
>  > So if the thread writes non-zero to this.hashCode it cannot then
>  > return zero for the value of this.hashCode if no other threads
>  > intervene. The thread cannot ignore updates to values it has itself
>  > cached!
>  >
>  > If another thread writes to this.hashCode concurrently, then this
>  > thread may or may not see the value stored by that thread. In this
>  > case, it's not a problem, as another thread can only write a fixed
>  > value. So the worst that can happen is that this.hashCode is written
>  > more than once, and the current thread may fetch the value written by
>  > another thread. But this is the same value it wrote anyway.
>  >
>
>
> In a multithreaded setting, this code *can* break and return 0 if hashCode
>  is read twice.  This is not just a performance optimization - it is a
>  correctness issue.  The compiler / runtime / hardware is allowed to reorder
>  read operations.  The following racy execution is allowable under the JMM:
>
>  1. Thread 1 reads 0 from hashCode and stores 0 into a local (t1).
>  2. Thread 2 write 42 into hashCode.
>  3. Thread 1 reads 42 from hashCode and stores 42 into a local (t2).
>  4. Thread 1 compares t2 (42) with 0 and does not execute the if clause.
>  5. Thread 1 returns t1 (0).
>

But why would Thread 1 read hashCode twice before the comparison?

Seems to me that would break the "as if serial" guarantee for a single thread.
In the code sequence, the comparison is before the return, and
therefore "happens-before" the return. I.e. step 3 "happens-before"
step 1+5.

I'm not saying Harmony should keep the current code - the suggested
temp variable version seems better anyway - just trying to understand
what (if anything) is currently broken.

>  - Vijay
>
>
>
>  >
>  > >  In any case, it does seem a pinch more efficient to only do one read
>  > >  of hashCode ... switch up the code to be something like this.
>  >
>  > Agreed.
>  >
>  > >  public int hashCode() {
>  > >     final int hash = hashCode;
>  > >     if (hash == 0) {
>  > >
>  > >         if (count == 0) {
>  > >             return 0;
>  > >         }
>  > >
>  > >         for (int i = offset; i < count + offset; i++) {
>  > >             hash = value[i] + ((hash << 5) - hash);
>  > >         }
>  > >         hashCode = hash;
>  > >     }
>  > >
>  > >     return hash;
>  > >  }
>  > >
>  > >  Thoughts?
>  > >
>  > >
>  > >  >> where 'this.*' is always a memory reference while 'hash' is a thread
>  > private var.
>  > >  >>
>  > >  >> 2. DRLVM JIT indeed does not privatize field access to threads. It
>  > >  >>    always loads fields from memory in original order. Hence this
>  > >  >>    potential bug does not affect DRLVM .. now. But potentially it can
>  > >  >>    start optimizing things this way because current behavior prevents
>  > >  >>    a bunch of high level optimizations.
>  > >  >>
>  > >  >> 3. Jeremy Manson, being an expert in Java Memory Model claims [1]
>  > that
>  > >  >>    such reordering is allowed theoretically. I.e. like this:
>  > >  >>
>  > >  >> int hash = this.hashCode;
>  > >  >> if (this.hashCode == 0) {
>  > >  >>   hash = this.hashCode = // calculate hash
>  > >  >> }
>  > >  >> return hash;
>  > >  >>
>  > >  >> This is a correct single-threaded code. What happened here is a
>  > >  >> lengthy thread privatization of this.hashCode (again, does not happen
>  > >  >> in DRLVM). That is incorrect in multithreaded environment and needs
>  > >  >> extra synchronization/volatile/etc.
>  > >  >>
>  > >  >> 4. I do not know why a JIT would want to do that, i.e. two sequential
>  > >  >>    reads from the same memory location. Sounds like a bit synthetic
>  > example.
>  > >  >
>  > >  > ...at which point a bunch of code probably would go wrong!  So
>  > hopefully
>  > >  > it remains only a theoretical possibility.
>  > >  >
>  > >  > Regards,
>  > >  > Tim
>  > >  >
>  > >  >> [1]
>  > http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
>  > >  >>
>  > >  >
>  > >
>  >
>

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Vijay Menon <vs...@google.com>.
On Sat, Dec 12, 2009 at 7:34 AM, sebb <se...@gmail.com> wrote:

> On 12/12/2009, Nathan Beyer <nd...@apache.org> wrote:
> > On Fri, Dec 11, 2009 at 10:04 AM, Tim Ellison <t....@gmail.com>
> wrote:
> >  > On 11/Dec/2009 14:32, Egor Pasko wrote:
> >  >> On the 0x684 day of Apache Harmony Tim Ellison wrote:
> >  >>> On 11/Dec/2009 04:09, Vijay Menon wrote:
> >  >>>> Perhaps I'm missing some context, but I don't see any problem.  I
> don't
> >  >>>> believe that this:
> >  >>>>
> >  >>>>         if (hashCode == 0) {
> >  >>>>             // calculate hash
> >  >>>>             hashCode = hash;
> >  >>>>         }
> >  >>>>         return hashCode;
> >  >>>>
> >  >>>> can ever return 0 (assuming hash is non-zero), regardless of memory
> fences.
> >  >>>>  The JMM won't allow visible reordering in a single threaded
> program.
> >  >>> I agree.  In the multi-threaded case there can be a data race on the
> >  >>> hashCode, with the effect that the same hashCode value is
> unnecessarily,
> >  >>> but harmlessly, recalculated.
> >  >>
> >  >> Vijay, Tim, you are not 100% correct here.
> >  >>
> >  >> 1. there should be an extra restriction that the part "calculate
> hash"
> >  >>    does not touch the hashCode field. Without that restriction more
> >  >>    trivial races can happen as discussed in LANG-481.
> >  >>
> >  >> So we should assume this code:
> >  >>
> >  >> if (this.hashCode == 0) {
> >  >>   int hash;
> >  >>   if (this.hashCode == 0) {
> >  >>     // Calculate using 'hash' only, not this.hashCode.
> >  >>     this.hashCode = hash;
> >  >>   }
> >  >>   return this.hashCode;
> >  >> }
> >  >
> >  > Yes, I guess I figured that was assumed :-)
> >  >
> >  > Of course, there are lots of things you could put into the
> >  > "// Calculate..." section that would be unsafe.  We should stick with
> >  > showing the non-abbreviated implementation to avoid ambiguity:
> >  >
> >  >    public int hashCode() {
> >  >        if (hashCode == 0) {
> >  >            if (count == 0) {
> >  >                return 0;
> >  >            }
> >  >            int hash = 0;
> >  >            for (int i = offset; i < count + offset; i++) {
> >  >                hash = value[i] + ((hash << 5) - hash);
> >  >            }
> >  >            hashCode = hash;
> >  >        }
> >  >        return hashCode;
> >  >    }
> >  >
> >
> >
> > I think I understand the concern, after some additional reading. The
> >  issue seems to be that 'hashCode' is read twice and the field is not
> >  protected by any memory barriers (synchronized, volatile, etc). As
> >  such, it would be okay for the second read to be done using a cached
> >  value, which means that both reads could return 0 in the same thread
> >  of execution. Another way to look at it is that the write to
> >  'hashCode' may or may not affect subsequent reads of 'hashCode'. This
> >  is how I understand it.
> >
> >  Will that happen in practice? I have no idea. It does seem possible.
>
> The Java MM guarantees that a single thread behaves as if the code is
> processed sequentially.
> So if the thread writes non-zero to this.hashCode it cannot then
> return zero for the value of this.hashCode if no other threads
> intervene. The thread cannot ignore updates to values it has itself
> cached!
>
> If another thread writes to this.hashCode concurrently, then this
> thread may or may not see the value stored by that thread. In this
> case, it's not a problem, as another thread can only write a fixed
> value. So the worst that can happen is that this.hashCode is written
> more than once, and the current thread may fetch the value written by
> another thread. But this is the same value it wrote anyway.
>

In a multithreaded setting, this code *can* break and return 0 if hashCode
is read twice.  This is not just a performance optimization - it is a
correctness issue.  The compiler / runtime / hardware is allowed to reorder
read operations.  The following racy execution is allowable under the JMM:

1. Thread 1 reads 0 from hashCode and stores 0 into a local (t1).
2. Thread 2 write 42 into hashCode.
3. Thread 1 reads 42 from hashCode and stores 42 into a local (t2).
4. Thread 1 compares t2 (42) with 0 and does not execute the if clause.
5. Thread 1 returns t1 (0).

- Vijay


>
> >  In any case, it does seem a pinch more efficient to only do one read
> >  of hashCode ... switch up the code to be something like this.
>
> Agreed.
>
> >  public int hashCode() {
> >     final int hash = hashCode;
> >     if (hash == 0) {
> >
> >         if (count == 0) {
> >             return 0;
> >         }
> >
> >         for (int i = offset; i < count + offset; i++) {
> >             hash = value[i] + ((hash << 5) - hash);
> >         }
> >         hashCode = hash;
> >     }
> >
> >     return hash;
> >  }
> >
> >  Thoughts?
> >
> >
> >  >> where 'this.*' is always a memory reference while 'hash' is a thread
> private var.
> >  >>
> >  >> 2. DRLVM JIT indeed does not privatize field access to threads. It
> >  >>    always loads fields from memory in original order. Hence this
> >  >>    potential bug does not affect DRLVM .. now. But potentially it can
> >  >>    start optimizing things this way because current behavior prevents
> >  >>    a bunch of high level optimizations.
> >  >>
> >  >> 3. Jeremy Manson, being an expert in Java Memory Model claims [1]
> that
> >  >>    such reordering is allowed theoretically. I.e. like this:
> >  >>
> >  >> int hash = this.hashCode;
> >  >> if (this.hashCode == 0) {
> >  >>   hash = this.hashCode = // calculate hash
> >  >> }
> >  >> return hash;
> >  >>
> >  >> This is a correct single-threaded code. What happened here is a
> >  >> lengthy thread privatization of this.hashCode (again, does not happen
> >  >> in DRLVM). That is incorrect in multithreaded environment and needs
> >  >> extra synchronization/volatile/etc.
> >  >>
> >  >> 4. I do not know why a JIT would want to do that, i.e. two sequential
> >  >>    reads from the same memory location. Sounds like a bit synthetic
> example.
> >  >
> >  > ...at which point a bunch of code probably would go wrong!  So
> hopefully
> >  > it remains only a theoretical possibility.
> >  >
> >  > Regards,
> >  > Tim
> >  >
> >  >> [1]
> http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
> >  >>
> >  >
> >
>

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by sebb <se...@gmail.com>.
On 12/12/2009, Nathan Beyer <nd...@apache.org> wrote:
> On Fri, Dec 11, 2009 at 10:04 AM, Tim Ellison <t....@gmail.com> wrote:
>  > On 11/Dec/2009 14:32, Egor Pasko wrote:
>  >> On the 0x684 day of Apache Harmony Tim Ellison wrote:
>  >>> On 11/Dec/2009 04:09, Vijay Menon wrote:
>  >>>> Perhaps I'm missing some context, but I don't see any problem.  I don't
>  >>>> believe that this:
>  >>>>
>  >>>>         if (hashCode == 0) {
>  >>>>             // calculate hash
>  >>>>             hashCode = hash;
>  >>>>         }
>  >>>>         return hashCode;
>  >>>>
>  >>>> can ever return 0 (assuming hash is non-zero), regardless of memory fences.
>  >>>>  The JMM won't allow visible reordering in a single threaded program.
>  >>> I agree.  In the multi-threaded case there can be a data race on the
>  >>> hashCode, with the effect that the same hashCode value is unnecessarily,
>  >>> but harmlessly, recalculated.
>  >>
>  >> Vijay, Tim, you are not 100% correct here.
>  >>
>  >> 1. there should be an extra restriction that the part "calculate hash"
>  >>    does not touch the hashCode field. Without that restriction more
>  >>    trivial races can happen as discussed in LANG-481.
>  >>
>  >> So we should assume this code:
>  >>
>  >> if (this.hashCode == 0) {
>  >>   int hash;
>  >>   if (this.hashCode == 0) {
>  >>     // Calculate using 'hash' only, not this.hashCode.
>  >>     this.hashCode = hash;
>  >>   }
>  >>   return this.hashCode;
>  >> }
>  >
>  > Yes, I guess I figured that was assumed :-)
>  >
>  > Of course, there are lots of things you could put into the
>  > "// Calculate..." section that would be unsafe.  We should stick with
>  > showing the non-abbreviated implementation to avoid ambiguity:
>  >
>  >    public int hashCode() {
>  >        if (hashCode == 0) {
>  >            if (count == 0) {
>  >                return 0;
>  >            }
>  >            int hash = 0;
>  >            for (int i = offset; i < count + offset; i++) {
>  >                hash = value[i] + ((hash << 5) - hash);
>  >            }
>  >            hashCode = hash;
>  >        }
>  >        return hashCode;
>  >    }
>  >
>
>
> I think I understand the concern, after some additional reading. The
>  issue seems to be that 'hashCode' is read twice and the field is not
>  protected by any memory barriers (synchronized, volatile, etc). As
>  such, it would be okay for the second read to be done using a cached
>  value, which means that both reads could return 0 in the same thread
>  of execution. Another way to look at it is that the write to
>  'hashCode' may or may not affect subsequent reads of 'hashCode'. This
>  is how I understand it.
>
>  Will that happen in practice? I have no idea. It does seem possible.

The Java MM guarantees that a single thread behaves as if the code is
processed sequentially.
So if the thread writes non-zero to this.hashCode it cannot then
return zero for the value of this.hashCode if no other threads
intervene. The thread cannot ignore updates to values it has itself
cached!

If another thread writes to this.hashCode concurrently, then this
thread may or may not see the value stored by that thread. In this
case, it's not a problem, as another thread can only write a fixed
value. So the worst that can happen is that this.hashCode is written
more than once, and the current thread may fetch the value written by
another thread. But this is the same value it wrote anyway.

>  In any case, it does seem a pinch more efficient to only do one read
>  of hashCode ... switch up the code to be something like this.

Agreed.

>  public int hashCode() {
>     final int hash = hashCode;
>     if (hash == 0) {
>
>         if (count == 0) {
>             return 0;
>         }
>
>         for (int i = offset; i < count + offset; i++) {
>             hash = value[i] + ((hash << 5) - hash);
>         }
>         hashCode = hash;
>     }
>
>     return hash;
>  }
>
>  Thoughts?
>
>
>  >> where 'this.*' is always a memory reference while 'hash' is a thread private var.
>  >>
>  >> 2. DRLVM JIT indeed does not privatize field access to threads. It
>  >>    always loads fields from memory in original order. Hence this
>  >>    potential bug does not affect DRLVM .. now. But potentially it can
>  >>    start optimizing things this way because current behavior prevents
>  >>    a bunch of high level optimizations.
>  >>
>  >> 3. Jeremy Manson, being an expert in Java Memory Model claims [1] that
>  >>    such reordering is allowed theoretically. I.e. like this:
>  >>
>  >> int hash = this.hashCode;
>  >> if (this.hashCode == 0) {
>  >>   hash = this.hashCode = // calculate hash
>  >> }
>  >> return hash;
>  >>
>  >> This is a correct single-threaded code. What happened here is a
>  >> lengthy thread privatization of this.hashCode (again, does not happen
>  >> in DRLVM). That is incorrect in multithreaded environment and needs
>  >> extra synchronization/volatile/etc.
>  >>
>  >> 4. I do not know why a JIT would want to do that, i.e. two sequential
>  >>    reads from the same memory location. Sounds like a bit synthetic example.
>  >
>  > ...at which point a bunch of code probably would go wrong!  So hopefully
>  > it remains only a theoretical possibility.
>  >
>  > Regards,
>  > Tim
>  >
>  >> [1] http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
>  >>
>  >
>

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Vijay Menon <vs...@google.com>.
2010/1/9 Egor Pasko <eg...@gmail.com>

> On the 0x68E day of Apache Harmony Nathan Beyer wrote:
> > On Sun, Dec 20, 2009 at 1:42 PM, Tim Ellison <t....@gmail.com>
> wrote:
> >> On 19/Dec/2009 20:54, Nathan Beyer wrote:
> >>> On Sat, Dec 19, 2009 at 2:26 PM, Tim Ellison <t....@gmail.com>
> wrote:
> >>>> On 19/Dec/2009 18:57, Nathan Beyer wrote:
> >>>>> RE: https://issues.apache.org/jira/browse/HARMONY-6404
> >>>>>
> >>>>> I posted up two proposed patches for String, please comment on which
> >>>>> is preferable. One is the change previously mentioned, the other is a
> >>>>> slightly bigger reorganization.
> >>>> Neither.  There is nothing wrong with the original code.
> >>>>
> >>>> Any JVM or compiler that would actually move a load up above a
> >>>> conditional store to the same variable would be seriously broken all
> round.
> >>>
> >>> Does that really matter?
> >>
> >> Yes, we don't code to allow for bogus VM/compiler implementations.
> >>
> >>> Shouldn't the Java source be written such that it's as correct as
> >>> possible according to the language spec and related specs?
> >>
> >> I'm guessing you have your tongue firmly in your cheek when writing that
> >> :-)  Yes to "correct as possible" but we don't account for the never to
> >> be implemented edge cases that are possible by the spec.
> >>
> >> Put another way, we don't fix things that are not broken.  I challenge
> >> you to show an implementation that would do the read re-ordering as you
> >> suggest.
> >
> > So what's all the fuss, then? According to this guy [1] ... our code
> > is broken and he helped define the memory model. This is the guy
> > suggesting the issue. Our code is almost exactly like the same he
> > describes as broken. The only difference is we have an extra check on
> > 'count'.
> >
> > At this point, I just want to understand.
> >
> > [1]
> http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
>
> Sorry for delayed answer (was on vacation).
>
> After all the fuss I think Tim is right. JMM is a model that is close
> to real implementations except some corner cases like this. JMM allows
> slightly more than any JVM implementation would do. That's a good
> excersise to think about and be warned, but not a lot more.
>
> If our goal is to please all possible JVM implementations, we should
> bother, but that would be far from the main goal of the project. Until
> we have a meaningful implementation to make two memory loads in the
> place where only one is required, we should not take any action on
> hashCode.
>
>
I hate to belabor the point, but the code is legally broken and Nathan's
proposed fix is simple and straightforward.  Why not fix it?

I agree that it's unlikely to break on x86.  But, I think it can break on
practical implementations for other architectures.  From memory, StarJIT's
Itanium backend would do exactly the reordering Jeremy Manson mentions in
his blog post under certain profile information and scheduling constraints.

Vijay


> --
> Egor Pasko
>
>

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Nathan Beyer <nd...@apache.org>.
2010/1/9 Egor Pasko <eg...@gmail.com>:
> On the 0x68E day of Apache Harmony Nathan Beyer wrote:
>> On Sun, Dec 20, 2009 at 1:42 PM, Tim Ellison <t....@gmail.com> wrote:
>>> On 19/Dec/2009 20:54, Nathan Beyer wrote:
>>>> On Sat, Dec 19, 2009 at 2:26 PM, Tim Ellison <t....@gmail.com> wrote:
>>>>> On 19/Dec/2009 18:57, Nathan Beyer wrote:
>>>>>> RE: https://issues.apache.org/jira/browse/HARMONY-6404
>>>>>>
>>>>>> I posted up two proposed patches for String, please comment on which
>>>>>> is preferable. One is the change previously mentioned, the other is a
>>>>>> slightly bigger reorganization.
>>>>> Neither.  There is nothing wrong with the original code.
>>>>>
>>>>> Any JVM or compiler that would actually move a load up above a
>>>>> conditional store to the same variable would be seriously broken all round.
>>>>
>>>> Does that really matter?
>>>
>>> Yes, we don't code to allow for bogus VM/compiler implementations.
>>>
>>>> Shouldn't the Java source be written such that it's as correct as
>>>> possible according to the language spec and related specs?
>>>
>>> I'm guessing you have your tongue firmly in your cheek when writing that
>>> :-)  Yes to "correct as possible" but we don't account for the never to
>>> be implemented edge cases that are possible by the spec.
>>>
>>> Put another way, we don't fix things that are not broken.  I challenge
>>> you to show an implementation that would do the read re-ordering as you
>>> suggest.
>>
>> So what's all the fuss, then? According to this guy [1] ... our code
>> is broken and he helped define the memory model. This is the guy
>> suggesting the issue. Our code is almost exactly like the same he
>> describes as broken. The only difference is we have an extra check on
>> 'count'.
>>
>> At this point, I just want to understand.
>>
>> [1] http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
>
> Sorry for delayed answer (was on vacation).
>
> After all the fuss I think Tim is right. JMM is a model that is close
> to real implementations except some corner cases like this. JMM allows
> slightly more than any JVM implementation would do. That's a good
> excersise to think about and be warned, but not a lot more.
>
> If our goal is to please all possible JVM implementations, we should
> bother, but that would be far from the main goal of the project. Until
> we have a meaningful implementation to make two memory loads in the
> place where only one is required, we should not take any action on
> hashCode.
>
> --
> Egor Pasko

Since this issue doesn't have clear consensus, at least not to me, I'd
like to get an explicit vote for resolving this issue as "Wont Fix".

https://issues.apache.org/jira/browse/HARMONY-6404
+1 to close as "Wont Fix"
-1 to leave open - please accompany with a patch

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Egor Pasko <eg...@gmail.com>.
On the 0x68E day of Apache Harmony Nathan Beyer wrote:
> On Sun, Dec 20, 2009 at 1:42 PM, Tim Ellison <t....@gmail.com> wrote:
>> On 19/Dec/2009 20:54, Nathan Beyer wrote:
>>> On Sat, Dec 19, 2009 at 2:26 PM, Tim Ellison <t....@gmail.com> wrote:
>>>> On 19/Dec/2009 18:57, Nathan Beyer wrote:
>>>>> RE: https://issues.apache.org/jira/browse/HARMONY-6404
>>>>>
>>>>> I posted up two proposed patches for String, please comment on which
>>>>> is preferable. One is the change previously mentioned, the other is a
>>>>> slightly bigger reorganization.
>>>> Neither.  There is nothing wrong with the original code.
>>>>
>>>> Any JVM or compiler that would actually move a load up above a
>>>> conditional store to the same variable would be seriously broken all round.
>>>
>>> Does that really matter?
>>
>> Yes, we don't code to allow for bogus VM/compiler implementations.
>>
>>> Shouldn't the Java source be written such that it's as correct as
>>> possible according to the language spec and related specs?
>>
>> I'm guessing you have your tongue firmly in your cheek when writing that
>> :-)  Yes to "correct as possible" but we don't account for the never to
>> be implemented edge cases that are possible by the spec.
>>
>> Put another way, we don't fix things that are not broken.  I challenge
>> you to show an implementation that would do the read re-ordering as you
>> suggest.
>
> So what's all the fuss, then? According to this guy [1] ... our code
> is broken and he helped define the memory model. This is the guy
> suggesting the issue. Our code is almost exactly like the same he
> describes as broken. The only difference is we have an extra check on
> 'count'.
>
> At this point, I just want to understand.
>
> [1] http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html

Sorry for delayed answer (was on vacation).

After all the fuss I think Tim is right. JMM is a model that is close
to real implementations except some corner cases like this. JMM allows
slightly more than any JVM implementation would do. That's a good
excersise to think about and be warned, but not a lot more.

If our goal is to please all possible JVM implementations, we should
bother, but that would be far from the main goal of the project. Until
we have a meaningful implementation to make two memory loads in the
place where only one is required, we should not take any action on
hashCode.

-- 
Egor Pasko


Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Nathan Beyer <nb...@gmail.com>.
On Sun, Dec 20, 2009 at 1:42 PM, Tim Ellison <t....@gmail.com> wrote:
> On 19/Dec/2009 20:54, Nathan Beyer wrote:
>> On Sat, Dec 19, 2009 at 2:26 PM, Tim Ellison <t....@gmail.com> wrote:
>>> On 19/Dec/2009 18:57, Nathan Beyer wrote:
>>>> RE: https://issues.apache.org/jira/browse/HARMONY-6404
>>>>
>>>> I posted up two proposed patches for String, please comment on which
>>>> is preferable. One is the change previously mentioned, the other is a
>>>> slightly bigger reorganization.
>>> Neither.  There is nothing wrong with the original code.
>>>
>>> Any JVM or compiler that would actually move a load up above a
>>> conditional store to the same variable would be seriously broken all round.
>>
>> Does that really matter?
>
> Yes, we don't code to allow for bogus VM/compiler implementations.
>
>> Shouldn't the Java source be written such that it's as correct as
>> possible according to the language spec and related specs?
>
> I'm guessing you have your tongue firmly in your cheek when writing that
> :-)  Yes to "correct as possible" but we don't account for the never to
> be implemented edge cases that are possible by the spec.
>
> Put another way, we don't fix things that are not broken.  I challenge
> you to show an implementation that would do the read re-ordering as you
> suggest.

So what's all the fuss, then? According to this guy [1] ... our code
is broken and he helped define the memory model. This is the guy
suggesting the issue. Our code is almost exactly like the same he
describes as broken. The only difference is we have an extra check on
'count'.

At this point, I just want to understand.

[1] http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html

>
> In any case, if you want to make the changes, go ahead, I don't think it
> will break/fix anything.
>
>> Regardless, this isn't about ordering, its about 'this.hashCode' being
>> read multiple times -- which is the exact type of change that was just
>> fixed in the equals method [1]. How are these two methods different?
>> Why change one and not the other.
>
> They are different because the multiple loads by the same thread make a
> difference in the equals() method.  If the hashCode changes between
> loads by the same thread then the equals would answer the wrong value.
>
> However, the hashCode method is different because a thread can't return
> the wrong value by reading the hashCode twice.  There may be a data
> race, with two threads both calculating the hash, but they will simply
> compute the same value and store it, so no harm done.
>
> Regards,
> Tim
>
>> [1] https://issues.apache.org/jira/browse/HARMONY-6405
>>
>>> Regards,
>>> Tim
>>>
>>>
>>>> On Sat, Dec 19, 2009 at 4:04 AM, Egor Pasko <eg...@gmail.com> wrote:
>>>>> On the 0x68A day of Apache Harmony Tim Ellison wrote:
>>>>>> On 12/Dec/2009 10:54, Egor Pasko wrote:
>>>>>>> On the 0x685 day of Apache Harmony Nathan Beyer wrote:
>>>>>> <snip>
>>>>>>>> In any case, it does seem a pinch more efficient to only do one read
>>>>>>>> of hashCode ... switch up the code to be something like this.
>>>>>>>>
>>>>>>>> public int hashCode() {
>>>>>>>>     final int hash = hashCode;
>>>>>>>>     if (hash == 0) {
>>>>>>>>         if (count == 0) {
>>>>>>>>             return 0;
>>>>>>>>         }
>>>>>>>>         for (int i = offset; i < count + offset; i++) {
>>>>>>>>             hash = value[i] + ((hash << 5) - hash);
>>>>>>>>         }
>>>>>>>>         hashCode = hash;
>>>>>>> one more 'return hash' here, please :)
>>>>>> Why?
>>>>> argh, what was I thinking about? the original Nathan's code is great.
>>>>>
>>>>> --
>>>>> Egor Pasko
>>>>>
>>>>>
>>
>

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Tim Ellison <t....@gmail.com>.
On 19/Dec/2009 20:54, Nathan Beyer wrote:
> On Sat, Dec 19, 2009 at 2:26 PM, Tim Ellison <t....@gmail.com> wrote:
>> On 19/Dec/2009 18:57, Nathan Beyer wrote:
>>> RE: https://issues.apache.org/jira/browse/HARMONY-6404
>>>
>>> I posted up two proposed patches for String, please comment on which
>>> is preferable. One is the change previously mentioned, the other is a
>>> slightly bigger reorganization.
>> Neither.  There is nothing wrong with the original code.
>>
>> Any JVM or compiler that would actually move a load up above a
>> conditional store to the same variable would be seriously broken all round.
> 
> Does that really matter?

Yes, we don't code to allow for bogus VM/compiler implementations.

> Shouldn't the Java source be written such that it's as correct as
> possible according to the language spec and related specs?

I'm guessing you have your tongue firmly in your cheek when writing that
:-)  Yes to "correct as possible" but we don't account for the never to
be implemented edge cases that are possible by the spec.

Put another way, we don't fix things that are not broken.  I challenge
you to show an implementation that would do the read re-ordering as you
suggest.

In any case, if you want to make the changes, go ahead, I don't think it
will break/fix anything.

> Regardless, this isn't about ordering, its about 'this.hashCode' being
> read multiple times -- which is the exact type of change that was just
> fixed in the equals method [1]. How are these two methods different?
> Why change one and not the other.

They are different because the multiple loads by the same thread make a
difference in the equals() method.  If the hashCode changes between
loads by the same thread then the equals would answer the wrong value.

However, the hashCode method is different because a thread can't return
the wrong value by reading the hashCode twice.  There may be a data
race, with two threads both calculating the hash, but they will simply
compute the same value and store it, so no harm done.

Regards,
Tim

> [1] https://issues.apache.org/jira/browse/HARMONY-6405
> 
>> Regards,
>> Tim
>>
>>
>>> On Sat, Dec 19, 2009 at 4:04 AM, Egor Pasko <eg...@gmail.com> wrote:
>>>> On the 0x68A day of Apache Harmony Tim Ellison wrote:
>>>>> On 12/Dec/2009 10:54, Egor Pasko wrote:
>>>>>> On the 0x685 day of Apache Harmony Nathan Beyer wrote:
>>>>> <snip>
>>>>>>> In any case, it does seem a pinch more efficient to only do one read
>>>>>>> of hashCode ... switch up the code to be something like this.
>>>>>>>
>>>>>>> public int hashCode() {
>>>>>>>     final int hash = hashCode;
>>>>>>>     if (hash == 0) {
>>>>>>>         if (count == 0) {
>>>>>>>             return 0;
>>>>>>>         }
>>>>>>>         for (int i = offset; i < count + offset; i++) {
>>>>>>>             hash = value[i] + ((hash << 5) - hash);
>>>>>>>         }
>>>>>>>         hashCode = hash;
>>>>>> one more 'return hash' here, please :)
>>>>> Why?
>>>> argh, what was I thinking about? the original Nathan's code is great.
>>>>
>>>> --
>>>> Egor Pasko
>>>>
>>>>
> 

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Nathan Beyer <nd...@apache.org>.
On Sat, Dec 19, 2009 at 2:26 PM, Tim Ellison <t....@gmail.com> wrote:
> On 19/Dec/2009 18:57, Nathan Beyer wrote:
>> RE: https://issues.apache.org/jira/browse/HARMONY-6404
>>
>> I posted up two proposed patches for String, please comment on which
>> is preferable. One is the change previously mentioned, the other is a
>> slightly bigger reorganization.
>
> Neither.  There is nothing wrong with the original code.
>
> Any JVM or compiler that would actually move a load up above a
> conditional store to the same variable would be seriously broken all round.

Does that really matter? Shouldn't the Java source be written such
that it's as correct as possible according to the language spec and
related specs?

Regardless, this isn't about ordering, its about 'this.hashCode' being
read multiple times -- which is the exact type of change that was just
fixed in the equals method [1]. How are these two methods different?
Why change one and not the other.

-Nathan

[1] https://issues.apache.org/jira/browse/HARMONY-6405

>
> Regards,
> Tim
>
>
>> On Sat, Dec 19, 2009 at 4:04 AM, Egor Pasko <eg...@gmail.com> wrote:
>>> On the 0x68A day of Apache Harmony Tim Ellison wrote:
>>>> On 12/Dec/2009 10:54, Egor Pasko wrote:
>>>>> On the 0x685 day of Apache Harmony Nathan Beyer wrote:
>>>> <snip>
>>>>>> In any case, it does seem a pinch more efficient to only do one read
>>>>>> of hashCode ... switch up the code to be something like this.
>>>>>>
>>>>>> public int hashCode() {
>>>>>>     final int hash = hashCode;
>>>>>>     if (hash == 0) {
>>>>>>         if (count == 0) {
>>>>>>             return 0;
>>>>>>         }
>>>>>>         for (int i = offset; i < count + offset; i++) {
>>>>>>             hash = value[i] + ((hash << 5) - hash);
>>>>>>         }
>>>>>>         hashCode = hash;
>>>>> one more 'return hash' here, please :)
>>>> Why?
>>> argh, what was I thinking about? the original Nathan's code is great.
>>>
>>> --
>>> Egor Pasko
>>>
>>>
>>
>

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Tim Ellison <t....@gmail.com>.
On 19/Dec/2009 18:57, Nathan Beyer wrote:
> RE: https://issues.apache.org/jira/browse/HARMONY-6404
> 
> I posted up two proposed patches for String, please comment on which
> is preferable. One is the change previously mentioned, the other is a
> slightly bigger reorganization.

Neither.  There is nothing wrong with the original code.

Any JVM or compiler that would actually move a load up above a
conditional store to the same variable would be seriously broken all round.

Regards,
Tim


> On Sat, Dec 19, 2009 at 4:04 AM, Egor Pasko <eg...@gmail.com> wrote:
>> On the 0x68A day of Apache Harmony Tim Ellison wrote:
>>> On 12/Dec/2009 10:54, Egor Pasko wrote:
>>>> On the 0x685 day of Apache Harmony Nathan Beyer wrote:
>>> <snip>
>>>>> In any case, it does seem a pinch more efficient to only do one read
>>>>> of hashCode ... switch up the code to be something like this.
>>>>>
>>>>> public int hashCode() {
>>>>>     final int hash = hashCode;
>>>>>     if (hash == 0) {
>>>>>         if (count == 0) {
>>>>>             return 0;
>>>>>         }
>>>>>         for (int i = offset; i < count + offset; i++) {
>>>>>             hash = value[i] + ((hash << 5) - hash);
>>>>>         }
>>>>>         hashCode = hash;
>>>> one more 'return hash' here, please :)
>>> Why?
>> argh, what was I thinking about? the original Nathan's code is great.
>>
>> --
>> Egor Pasko
>>
>>
> 

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Nathan Beyer <nd...@apache.org>.
RE: https://issues.apache.org/jira/browse/HARMONY-6404

I posted up two proposed patches for String, please comment on which
is preferable. One is the change previously mentioned, the other is a
slightly bigger reorganization.

-Nathan

On Sat, Dec 19, 2009 at 4:04 AM, Egor Pasko <eg...@gmail.com> wrote:
> On the 0x68A day of Apache Harmony Tim Ellison wrote:
>> On 12/Dec/2009 10:54, Egor Pasko wrote:
>>> On the 0x685 day of Apache Harmony Nathan Beyer wrote:
>> <snip>
>>>> In any case, it does seem a pinch more efficient to only do one read
>>>> of hashCode ... switch up the code to be something like this.
>>>>
>>>> public int hashCode() {
>>>>     final int hash = hashCode;
>>>>     if (hash == 0) {
>>>>         if (count == 0) {
>>>>             return 0;
>>>>         }
>>>>         for (int i = offset; i < count + offset; i++) {
>>>>             hash = value[i] + ((hash << 5) - hash);
>>>>         }
>>>>         hashCode = hash;
>>>
>>> one more 'return hash' here, please :)
>>
>> Why?
>
> argh, what was I thinking about? the original Nathan's code is great.
>
> --
> Egor Pasko
>
>

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Egor Pasko <eg...@gmail.com>.
On the 0x68A day of Apache Harmony Tim Ellison wrote:
> On 12/Dec/2009 10:54, Egor Pasko wrote:
>> On the 0x685 day of Apache Harmony Nathan Beyer wrote:
> <snip>
>>> In any case, it does seem a pinch more efficient to only do one read
>>> of hashCode ... switch up the code to be something like this.
>>>
>>> public int hashCode() {
>>>     final int hash = hashCode;
>>>     if (hash == 0) {
>>>         if (count == 0) {
>>>             return 0;
>>>         }
>>>         for (int i = offset; i < count + offset; i++) {
>>>             hash = value[i] + ((hash << 5) - hash);
>>>         }
>>>         hashCode = hash;
>> 
>> one more 'return hash' here, please :)
>
> Why?

argh, what was I thinking about? the original Nathan's code is great.

-- 
Egor Pasko


Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Tim Ellison <t....@gmail.com>.
On 12/Dec/2009 10:54, Egor Pasko wrote:
> On the 0x685 day of Apache Harmony Nathan Beyer wrote:
<snip>
>> In any case, it does seem a pinch more efficient to only do one read
>> of hashCode ... switch up the code to be something like this.
>>
>> public int hashCode() {
>>     final int hash = hashCode;
>>     if (hash == 0) {
>>         if (count == 0) {
>>             return 0;
>>         }
>>         for (int i = offset; i < count + offset; i++) {
>>             hash = value[i] + ((hash << 5) - hash);
>>         }
>>         hashCode = hash;
> 
> one more 'return hash' here, please :)

Why?

>>     }
>>     return hash;
>> }


Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Egor Pasko <eg...@gmail.com>.
On the 0x685 day of Apache Harmony Nathan Beyer wrote:
> On Fri, Dec 11, 2009 at 10:04 AM, Tim Ellison <t....@gmail.com> wrote:
>> On 11/Dec/2009 14:32, Egor Pasko wrote:
>>> On the 0x684 day of Apache Harmony Tim Ellison wrote:
>>>> On 11/Dec/2009 04:09, Vijay Menon wrote:
>>>>> Perhaps I'm missing some context, but I don't see any problem.  I don't
>>>>> believe that this:
>>>>>
>>>>>         if (hashCode == 0) {
>>>>>             // calculate hash
>>>>>             hashCode = hash;
>>>>>         }
>>>>>         return hashCode;
>>>>>
>>>>> can ever return 0 (assuming hash is non-zero), regardless of memory fences.
>>>>>  The JMM won't allow visible reordering in a single threaded program.
>>>> I agree.  In the multi-threaded case there can be a data race on the
>>>> hashCode, with the effect that the same hashCode value is unnecessarily,
>>>> but harmlessly, recalculated.
>>>
>>> Vijay, Tim, you are not 100% correct here.
>>>
>>> 1. there should be an extra restriction that the part "calculate hash"
>>>    does not touch the hashCode field. Without that restriction more
>>>    trivial races can happen as discussed in LANG-481.
>>>
>>> So we should assume this code:
>>>
>>> if (this.hashCode == 0) {
>>>   int hash;
>>>   if (this.hashCode == 0) {
>>>     // Calculate using 'hash' only, not this.hashCode.
>>>     this.hashCode = hash;
>>>   }
>>>   return this.hashCode;
>>> }
>>
>> Yes, I guess I figured that was assumed :-)
>>
>> Of course, there are lots of things you could put into the
>> "// Calculate..." section that would be unsafe.  We should stick with
>> showing the non-abbreviated implementation to avoid ambiguity:
>>
>>    public int hashCode() {
>>        if (hashCode == 0) {
>>            if (count == 0) {
>>                return 0;
>>            }
>>            int hash = 0;
>>            for (int i = offset; i < count + offset; i++) {
>>                hash = value[i] + ((hash << 5) - hash);
>>            }
>>            hashCode = hash;
>>        }
>>        return hashCode;
>>    }
>>
>
> I think I understand the concern, after some additional reading. The
> issue seems to be that 'hashCode' is read twice and the field is not
> protected by any memory barriers (synchronized, volatile, etc). As
> such, it would be okay for the second read to be done using a cached
> value, which means that both reads could return 0 in the same thread
> of execution. Another way to look at it is that the write to
> 'hashCode' may or may not affect subsequent reads of 'hashCode'. This
> is how I understand it.
>
> Will that happen in practice? I have no idea. It does seem possible.
>
> In any case, it does seem a pinch more efficient to only do one read
> of hashCode ... switch up the code to be something like this.
>
> public int hashCode() {
>     final int hash = hashCode;
>     if (hash == 0) {
>         if (count == 0) {
>             return 0;
>         }
>         for (int i = offset; i < count + offset; i++) {
>             hash = value[i] + ((hash << 5) - hash);
>         }
>         hashCode = hash;

one more 'return hash' here, please :)

>     }
>     return hash;
> }
>
> Thoughts?

-- 
Egor Pasko


Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Nathan Beyer <nd...@apache.org>.
On Fri, Dec 11, 2009 at 10:04 AM, Tim Ellison <t....@gmail.com> wrote:
> On 11/Dec/2009 14:32, Egor Pasko wrote:
>> On the 0x684 day of Apache Harmony Tim Ellison wrote:
>>> On 11/Dec/2009 04:09, Vijay Menon wrote:
>>>> Perhaps I'm missing some context, but I don't see any problem.  I don't
>>>> believe that this:
>>>>
>>>>         if (hashCode == 0) {
>>>>             // calculate hash
>>>>             hashCode = hash;
>>>>         }
>>>>         return hashCode;
>>>>
>>>> can ever return 0 (assuming hash is non-zero), regardless of memory fences.
>>>>  The JMM won't allow visible reordering in a single threaded program.
>>> I agree.  In the multi-threaded case there can be a data race on the
>>> hashCode, with the effect that the same hashCode value is unnecessarily,
>>> but harmlessly, recalculated.
>>
>> Vijay, Tim, you are not 100% correct here.
>>
>> 1. there should be an extra restriction that the part "calculate hash"
>>    does not touch the hashCode field. Without that restriction more
>>    trivial races can happen as discussed in LANG-481.
>>
>> So we should assume this code:
>>
>> if (this.hashCode == 0) {
>>   int hash;
>>   if (this.hashCode == 0) {
>>     // Calculate using 'hash' only, not this.hashCode.
>>     this.hashCode = hash;
>>   }
>>   return this.hashCode;
>> }
>
> Yes, I guess I figured that was assumed :-)
>
> Of course, there are lots of things you could put into the
> "// Calculate..." section that would be unsafe.  We should stick with
> showing the non-abbreviated implementation to avoid ambiguity:
>
>    public int hashCode() {
>        if (hashCode == 0) {
>            if (count == 0) {
>                return 0;
>            }
>            int hash = 0;
>            for (int i = offset; i < count + offset; i++) {
>                hash = value[i] + ((hash << 5) - hash);
>            }
>            hashCode = hash;
>        }
>        return hashCode;
>    }
>

I think I understand the concern, after some additional reading. The
issue seems to be that 'hashCode' is read twice and the field is not
protected by any memory barriers (synchronized, volatile, etc). As
such, it would be okay for the second read to be done using a cached
value, which means that both reads could return 0 in the same thread
of execution. Another way to look at it is that the write to
'hashCode' may or may not affect subsequent reads of 'hashCode'. This
is how I understand it.

Will that happen in practice? I have no idea. It does seem possible.

In any case, it does seem a pinch more efficient to only do one read
of hashCode ... switch up the code to be something like this.

public int hashCode() {
    final int hash = hashCode;
    if (hash == 0) {
        if (count == 0) {
            return 0;
        }
        for (int i = offset; i < count + offset; i++) {
            hash = value[i] + ((hash << 5) - hash);
        }
        hashCode = hash;
    }
    return hash;
}

Thoughts?

>> where 'this.*' is always a memory reference while 'hash' is a thread private var.
>>
>> 2. DRLVM JIT indeed does not privatize field access to threads. It
>>    always loads fields from memory in original order. Hence this
>>    potential bug does not affect DRLVM .. now. But potentially it can
>>    start optimizing things this way because current behavior prevents
>>    a bunch of high level optimizations.
>>
>> 3. Jeremy Manson, being an expert in Java Memory Model claims [1] that
>>    such reordering is allowed theoretically. I.e. like this:
>>
>> int hash = this.hashCode;
>> if (this.hashCode == 0) {
>>   hash = this.hashCode = // calculate hash
>> }
>> return hash;
>>
>> This is a correct single-threaded code. What happened here is a
>> lengthy thread privatization of this.hashCode (again, does not happen
>> in DRLVM). That is incorrect in multithreaded environment and needs
>> extra synchronization/volatile/etc.
>>
>> 4. I do not know why a JIT would want to do that, i.e. two sequential
>>    reads from the same memory location. Sounds like a bit synthetic example.
>
> ...at which point a bunch of code probably would go wrong!  So hopefully
> it remains only a theoretical possibility.
>
> Regards,
> Tim
>
>> [1] http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
>>
>

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Tim Ellison <t....@gmail.com>.
On 11/Dec/2009 14:32, Egor Pasko wrote:
> On the 0x684 day of Apache Harmony Tim Ellison wrote:
>> On 11/Dec/2009 04:09, Vijay Menon wrote:
>>> Perhaps I'm missing some context, but I don't see any problem.  I don't
>>> believe that this:
>>>
>>>         if (hashCode == 0) {
>>>             // calculate hash
>>>             hashCode = hash;
>>>         }
>>>         return hashCode;
>>>
>>> can ever return 0 (assuming hash is non-zero), regardless of memory fences.
>>>  The JMM won't allow visible reordering in a single threaded program.
>> I agree.  In the multi-threaded case there can be a data race on the
>> hashCode, with the effect that the same hashCode value is unnecessarily,
>> but harmlessly, recalculated.
> 
> Vijay, Tim, you are not 100% correct here.
> 
> 1. there should be an extra restriction that the part "calculate hash"
>    does not touch the hashCode field. Without that restriction more
>    trivial races can happen as discussed in LANG-481.
> 
> So we should assume this code:
> 
> if (this.hashCode == 0) {
>   int hash;
>   if (this.hashCode == 0) {
>     // Calculate using 'hash' only, not this.hashCode.
>     this.hashCode = hash;
>   }
>   return this.hashCode;
> }

Yes, I guess I figured that was assumed :-)

Of course, there are lots of things you could put into the
"// Calculate..." section that would be unsafe.  We should stick with
showing the non-abbreviated implementation to avoid ambiguity:

    public int hashCode() {
        if (hashCode == 0) {
            if (count == 0) {
                return 0;
            }
            int hash = 0;
            for (int i = offset; i < count + offset; i++) {
                hash = value[i] + ((hash << 5) - hash);
            }
            hashCode = hash;
        }
        return hashCode;
    }

> where 'this.*' is always a memory reference while 'hash' is a thread private var.
> 
> 2. DRLVM JIT indeed does not privatize field access to threads. It
>    always loads fields from memory in original order. Hence this
>    potential bug does not affect DRLVM .. now. But potentially it can
>    start optimizing things this way because current behavior prevents
>    a bunch of high level optimizations.
> 
> 3. Jeremy Manson, being an expert in Java Memory Model claims [1] that
>    such reordering is allowed theoretically. I.e. like this:
> 
> int hash = this.hashCode;
> if (this.hashCode == 0) {
>   hash = this.hashCode = // calculate hash
> }
> return hash;
> 
> This is a correct single-threaded code. What happened here is a
> lengthy thread privatization of this.hashCode (again, does not happen
> in DRLVM). That is incorrect in multithreaded environment and needs
> extra synchronization/volatile/etc.
> 
> 4. I do not know why a JIT would want to do that, i.e. two sequential
>    reads from the same memory location. Sounds like a bit synthetic example.

...at which point a bunch of code probably would go wrong!  So hopefully
it remains only a theoretical possibility.

Regards,
Tim

> [1] http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
> 

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Egor Pasko <eg...@gmail.com>.
On the 0x684 day of Apache Harmony Tim Ellison wrote:
> On 11/Dec/2009 04:09, Vijay Menon wrote:
>> Perhaps I'm missing some context, but I don't see any problem.  I don't
>> believe that this:
>> 
>>         if (hashCode == 0) {
>>             // calculate hash
>>             hashCode = hash;
>>         }
>>         return hashCode;
>> 
>> can ever return 0 (assuming hash is non-zero), regardless of memory fences.
>>  The JMM won't allow visible reordering in a single threaded program.
>
> I agree.  In the multi-threaded case there can be a data race on the
> hashCode, with the effect that the same hashCode value is unnecessarily,
> but harmlessly, recalculated.

Vijay, Tim, you are not 100% correct here.

1. there should be an extra restriction that the part "calculate hash"
   does not touch the hashCode field. Without that restriction more
   trivial races can happen as discussed in LANG-481.

So we should assume this code:

if (this.hashCode == 0) {
  int hash;
  if (this.hashCode == 0) {
    // Calculate using 'hash' only, not this.hashCode.
    this.hashCode = hash;
  }
  return this.hashCode;
}

where 'this.*' is always a memory reference while 'hash' is a thread private var.

2. DRLVM JIT indeed does not privatize field access to threads. It
   always loads fields from memory in original order. Hence this
   potential bug does not affect DRLVM .. now. But potentially it can
   start optimizing things this way because current behavior prevents
   a bunch of high level optimizations.

3. Jeremy Manson, being an expert in Java Memory Model claims [1] that
   such reordering is allowed theoretically. I.e. like this:

int hash = this.hashCode;
if (this.hashCode == 0) {
  hash = this.hashCode = // calculate hash
}
return hash;

This is a correct single-threaded code. What happened here is a
lengthy thread privatization of this.hashCode (again, does not happen
in DRLVM). That is incorrect in multithreaded environment and needs
extra synchronization/volatile/etc.

4. I do not know why a JIT would want to do that, i.e. two sequential
   reads from the same memory location. Sounds like a bit synthetic example.


[1] http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html

-- 
Egor Pasko


Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by sebb <se...@gmail.com>.
On 11/12/2009, Tim Ellison <t....@gmail.com> wrote:
> On 11/Dec/2009 04:09, Vijay Menon wrote:
>  > Perhaps I'm missing some context, but I don't see any problem.  I don't
>  > believe that this:
>  >
>  >         if (hashCode == 0) {
>  >             // calculate hash
>  >             hashCode = hash;
>  >         }
>  >         return hashCode;
>  >
>  > can ever return 0 (assuming hash is non-zero), regardless of memory fences.
>  >  The JMM won't allow visible reordering in a single threaded program.

However hashCode is shared between threads, so without knowing what
other threads do to hashCode (which is not shown above), it's possible
the hashCode returned by the code might be different from the hash
calculated in this thread. That does not involve re-ordering.

For example, if the calculation stored intermediate results in
hashCode, the return value could be different from the calculated
value.

As far as I can tell, this isn't the case here.

>
> I agree.  In the multi-threaded case there can be a data race on the
>  hashCode, with the effect that the same hashCode value is unnecessarily,
>  but harmlessly, recalculated.

However, it would be safer if the code fetched the hashCode and
returned its local copy - and it might be slightly quicker, as there
would be fewer accesses to the instance variable.

>  Regards,
>
> Tim
>

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Tim Ellison <t....@gmail.com>.
On 11/Dec/2009 04:09, Vijay Menon wrote:
> Perhaps I'm missing some context, but I don't see any problem.  I don't
> believe that this:
> 
>         if (hashCode == 0) {
>             // calculate hash
>             hashCode = hash;
>         }
>         return hashCode;
> 
> can ever return 0 (assuming hash is non-zero), regardless of memory fences.
>  The JMM won't allow visible reordering in a single threaded program.

I agree.  In the multi-threaded case there can be a data race on the
hashCode, with the effect that the same hashCode value is unnecessarily,
but harmlessly, recalculated.

Regards,
Tim

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Posted by Vijay Menon <vs...@google.com>.
Perhaps I'm missing some context, but I don't see any problem.  I don't
believe that this:

        if (hashCode == 0) {
            // calculate hash
            hashCode = hash;
        }
        return hashCode;

can ever return 0 (assuming hash is non-zero), regardless of memory fences.
 The JMM won't allow visible reordering in a single threaded program.

- Vijay

On Thu, Dec 10, 2009 at 7:36 PM, Regis <xu...@gmail.com> wrote:

> On 2009-12-11 4:30, Boris (JIRA) wrote:
>
>> possible data-reordering in some hashCode-methods (e.g. String or URL)
>> ----------------------------------------------------------------------
>>
>>                  Key: HARMONY-6404
>>                  URL: https://issues.apache.org/jira/browse/HARMONY-6404
>>              Project: Harmony
>>           Issue Type: Bug
>>             Reporter: Boris
>>
>>
>> First I have to say that I don't know if the Java Memory Model is relevant
>> for Harmony, so please reject the bug if this is not the case.
>>
>> The current implementation of several of Harmony's classes that try to
>> cache hashCode in an optimized way are affected by a reordering-bug that can
>> occur because of the way the hashCode is cached and retrieved.
>> The problem is that the method contains no memory barriers, so that the VM
>> may reorder the data-reads at its own will. With an unlucky reordering the
>> method could return 0 although the actual hashCode is NOT 0.
>>
>> Or to speak in code: This actual code:
>>         if (hashCode == 0) {
>>             // calculate hash
>>             hashCode = hash;
>>         }
>>         return hashCode;
>> could be reordered to something like
>>         return hashCode;
>>         if (hashCode == 0) {
>>             // calculate hash
>>             hashCode = hash;
>>         }
>> (which is of course no valid Java-code, but that is what the VM might do
>> internally).
>>
>> One common solution is to assign the field but then return the temp-value
>> (in this case the variable "hash") and NOT the field itself. So that the
>> read can not be reordered. (This way might be even faster because it may be
>> one memory-read less)
>> Another solution would be to make hashCode volatile or to not cache the
>> hashCode, but these have a performance cost.
>>
>> I'll attach a patch I wrote. I could not get harmony to compile, so these
>> are untested.
>> BTW: This fix also fixes a more likely bug in BigInteger and BigDecimal:
>> Callers of hashCode might have seen half-constructed hashCodes there.
>>
>> (This bug was found via the entry LANG-481 see there for some more
>> details)
>>
>>
> It's interesting topic. In my understand, re-order would be a issue only if
> hashCode is object reference. Because when "retrun hasCode" the value of
> "hasCode" pushed to stack, if hasCode is primitive type I don't think vm
> would to such re-order - the return value would be always 0, that's
> incorrect.
>
> --
> Best Regards,
> Regis.
>