Posted to dev@river.apache.org by Patricia Shanahan <pa...@acm.org> on 2010/11/28 20:09:38 UTC

Possible multi-threading bug

I've found something I think is a problem in 
com.sun.jini.test.impl.outrigger.matching.StressTest, but it does not 
seem to be the problem, or at least not the only problem, causing the 
test hang I'm investigating. It's difficult to test, so I'd like a 
review of my reasoning. This is a question for those who are familiar 
with the Java memory model.

Incidentally, if we went to 1.5 as the oldest supported release, this 
could be replaced by an AtomicInteger.

In the following inner class, I think the methods reset and getCount 
should be synchronized. As the comments note, the operations they 
perform are atomic. My concern is the lack of a happens-before 
relationship between those two methods and the increment method. As far 
as I can tell, there is nothing forcing the change in the counter due to 
an increment to become visible to a getCount call in another thread.

     private class Counter {

         /**
          * Internal counter variable.
          */
         private int _count = 0;

         /**
          * Constructor. Declared to enforce protection level.
          */
         Counter() {

             // Do nothing.
         }

         /**
          * Resets internal counter to zero.
          */
         void reset() {

             // Integer assignment is atomic.
             _count = 0;
         }

         /**
          * Increments internal counter by one.
          */
         synchronized void increment() {
             ++_count;
         }

         /**
          * Returns current value of this <code>Counter</code> object.
          */
         int getCount() {

             // Returning an integer is atomic.
             return _count;
         }
     }
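
For reference, a minimal sketch of the change being proposed here - reset and 
getCount synchronized on the same monitor as increment, so every read and 
write of _count is ordered by a happens-before edge (illustrative only, not 
the committed fix):

     private class Counter {

         /**
          * Internal counter variable, guarded by this Counter's monitor.
          */
         private int _count = 0;

         /**
          * Resets internal counter to zero.
          */
         synchronized void reset() {
             _count = 0;
         }

         /**
          * Increments internal counter by one.
          */
         synchronized void increment() {
             ++_count;
         }

         /**
          * Returns current value of this <code>Counter</code> object.
          */
         synchronized int getCount() {
             return _count;
         }
     }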

Re: Possible multi-threading bug

Posted by Peter Firmstone <ji...@zeus.net.au>.
+1 Peter.

Patricia Shanahan wrote:
> As I said in the original e-mail "I think the methods reset and getCount
> should be synchronized.".
>
> Either synchronizing both those methods or making _count volatile would
> fix the problem. I don't think performance matters on this class, so the
> simplest solution should win. The increment method is, and has to be,
> synchronized, so we might as well apply the same solution throughout.
>
> If performance did matter, I would still prefer making all the methods
> synchronized. Almost all calls are to increment. Making _count volatile
> would tend to slow down every call to increment by adding memory
> barriers. Making the other methods synchronized only affects increment
> in the rare case of contention with one of the others.
>
> As I've worked on the other problem, I've continued to build empirical
> evidence that this is a real problem. Since I made the changes to
> Counter, I have run the test dozens of times, without seeing the failure
> mode in which everything works but the test fails to notice it has
> finished. Before the fix, that was the commoner failure mode.
>
> The _count variable is part of a much bigger issue. I've been assuming
> that I should not make style fixes as part of bug fixes. Although I
> would have called the variable "count" myself, at least it does not get
> in the way of being able to read the code. I have more trouble with
> inconsistent indentation, to the extent that I sometimes reformat files
> I know I will revert before the next check-in.
>
> Maybe the roadmap should include a minor release at which we just fix
> style, including indentation?
>
> Patricia
>
>
> On 11/29/2010 1:55 AM, Tom Hobbs wrote:
>> Serious comment:
>>
>> Shouldn't reset also be synchronized?  Also, +1 for making _count 
>> volatile.
>>
>> Frivolous comment:
>>
>> Can we drop the underscore prefix for member variables?  It's against
>> the official Sun/Oracle/Whatever conventions.  Also, it drives me up
>> the wall...
>>
>> http://www.oracle.com/technetwork/java/codeconventions-135099.html#367
>>
>>
>> On Sun, Nov 28, 2010 at 11:16 PM, Patricia Shanahan<pa...@acm.org>  
>> wrote:
>>> In the way in which this is used, I expect most of the calls to be to
>>> increment. It has to be synchronized, so it seems simplest to 
>>> synchronize
>>> all the methods.
>>>
>>> I've done some more experiments, and decided this is a real problem. 
>>> As part
>>> of my debug effort, I increased the number of threads in the stress 
>>> test, so
>>> that it fails much more often. I also added some debug printouts, 
>>> one of
>>> which was showing up in conjunction with some but not all failures, 
>>> so I
>>> thought it was irrelevant.
>>>
>>> With the additional synchronization, the debug message shows up in all
>>> failures. I think I actually had two forms of failure, one of which 
>>> is fixed
>>> by the synchronization.  In the failure case that has been fixed, 
>>> everything
>>> works, no debug messages, but the test never admits to having finished,
>>> exactly the symptom I would expect from this issue.
>>>
>>> I plan to check in the enhanced test as well as the fixes, because 
>>> it only
>>> takes a few minutes longer than the current size, and is much better at
>>> finding bugs.
>>>
>>> Patricia
>>>
>>>
>>> On 11/28/2010 2:52 PM, Peter Firmstone wrote:
>>>>
>>>> Well, at the absolute minimum the variable should be volatile, so any
>>>> changes are visible among all threads.
>>>>
>>>> Since increment is the only mutating method, this must be 
>>>> synchronized.
>>>>
>>>> This is a cheap form of multi read, single write threading, 
>>>> although one
>>>> must be careful, as this only works on primitives and immutable object
>>>> references, since only the reference itself is being updated.
>>>>
>>>> If it was a reference to a mutable object (or long), then all methods
>>>> would need to be synchronized.
>>>>
>>>> Cheers,
>>>>
>>>> Peter.
>>>>
>>>> Patricia Shanahan wrote:
>>>>>
>>>>> I've found something I think is a problem in
>>>>> com.sun.jini.test.impl.outrigger.matching.StressTest, but it does not
>>>>> seem to be the problem, or at least not the only problem, causing the
>>>>> test hang I'm investigating. It's difficult to test, so I'd like a
>>>>> review of my reasoning. This is a question for those who are familiar
>>>>> with the Java memory model.
>>>>>
>>>>> Incidentally, if we went to 1.5 as the oldest supported release, this
>>>>> could be replaced by an AtomicInteger.
>>>>>
>>>>> In the following inner class, I think the methods reset and getCount
>>>>> should be synchronized. As the comments note, the operations they
>>>>> perform are atomic. My concern is the lack of a happens-before
>>>>> relationship between those two methods and the increment method. As
>>>>> far as I can tell, there is nothing forcing the change in the counter
>>>>> due to an increment to become visible to a getCount call in another
>>>>> thread.
>>>>>
>>>>> private class Counter {
>>>>>
>>>>> /**
>>>>> * Internal counter variable.
>>>>> */
>>>>> private int _count = 0;
>>>>>
>>>>> /**
>>>>> * Constructor. Declared to enforce protection level.
>>>>> */
>>>>> Counter() {
>>>>>
>>>>> // Do nothing.
>>>>> }
>>>>>
>>>>> /**
>>>>> * Resets internal counter to zero.
>>>>> */
>>>>> void reset() {
>>>>>
>>>>> // Integer assignment is atomic.
>>>>> _count = 0;
>>>>> }
>>>>>
>>>>> /**
>>>>> * Increments internal counter by one.
>>>>> */
>>>>> synchronized void increment() {
>>>>> ++_count;
>>>>> }
>>>>>
>>>>> /**
>>>>> * Returns current value of this<code>Counter</code>  object.
>>>>> */
>>>>> int getCount() {
>>>>>
>>>>> // Returning an integer is atomic.
>>>>> return _count;
>>>>> }
>>>>> }
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>


Re: Possible multi-threading bug

Posted by Sim IJskes - QCG <si...@qcg.nl>.
On 11/29/2010 11:54 AM, Tom Hobbs wrote:
> and is by no means perfect in all cases.  Synchronizing the methods makes
> this less-difficult to make mistakes with, so given that performance
> isn't an issue I'd go for that rather than synchronizing on _count.

Even if performance is an issue, "less-difficult to make mistakes with" 
has a very high priority for me.





Re: Possible multi-threading bug

Posted by Tom Hobbs <tv...@googlemail.com>.
> As I said in the original e-mail "I think the methods reset and getCount
> should be synchronized.".

Yes, you're right.  Sorry, my mistake.

I've just finished "Java Concurrency in Practice", which means I can
remember just enough to be dangerous, but not quite enough to be
fool-proof.  "volatile" is referred to as a 'poor man's synchronisation'
and is by no means perfect in all cases.  Synchronizing the methods makes
this less-difficult to make mistakes with, so given that performance
isn't an issue I'd go for that rather than synchronizing on _count.

Regarding format and layout, I agree that the poor formatting makes
reading source files far more difficult than the odd underscore
creeping in.  My approach is that when I come across such a file,
before I do anything else to it, I reformat and commit the file back
in with a "reformat" comment.  Most diff tools can be convinced to
ignore whitespace when comparing different versions.  I find the only
time changing the layout becomes a problem is when a single commit
contains both layout and code changes.

That's just my opinion though.

On Mon, Nov 29, 2010 at 10:45 AM, Patricia Shanahan <pa...@acm.org> wrote:
> As I said in the original e-mail "I think the methods reset and getCount
> should be synchronized.".
>
> Either synchronizing both those methods or making _count volatile would
> fix the problem. I don't think performance matters on this class, so the
> simplest solution should win. The increment method is, and has to be,
> synchronized, so we might as well apply the same solution throughout.
>
> If performance did matter, I would still prefer making all the methods
> synchronized. Almost all calls are to increment. Making _count volatile
> would tend to slow down every call to increment by adding memory
> barriers. Making the other methods synchronized only affects increment
> in the rare case of contention with one of the others.
>
> As I've worked on the other problem, I've continued to build empirical
> evidence that this is a real problem. Since I made the changes to
> Counter, I have run the test dozens of times, without seeing the failure
> mode in which everything works but the test fails to notice it has
> finished. Before the fix, that was the commoner failure mode.
>
> The _count variable is part of a much bigger issue. I've been assuming
> that I should not make style fixes as part of bug fixes. Although I
> would have called the variable "count" myself, at least it does not get
> in the way of being able to read the code. I have more trouble with
> inconsistent indentation, to the extent that I sometimes reformat files
> I know I will revert before the next check-in.
>
> Maybe the roadmap should include a minor release at which we just fix
> style, including indentation?
>
> Patricia
>
>
> On 11/29/2010 1:55 AM, Tom Hobbs wrote:
>>
>> Serious comment:
>>
>> Shouldn't reset also be synchronized?  Also, +1 for making _count
>> volatile.
>>
>> Frivolous comment:
>>
>> Can we drop the underscore prefix for member variables?  It's against
>> the official Sun/Oracle/Whatever conventions.  Also, it drives me up
>> the wall...
>>
>> http://www.oracle.com/technetwork/java/codeconventions-135099.html#367
>>
>>
>> On Sun, Nov 28, 2010 at 11:16 PM, Patricia Shanahan<pa...@acm.org>  wrote:
>>>
>>> In the way in which this is used, I expect most of the calls to be to
>>> increment. It has to be synchronized, so it seems simplest to synchronize
>>> all the methods.
>>>
>>> I've done some more experiments, and decided this is a real problem. As
>>> part
>>> of my debug effort, I increased the number of threads in the stress test,
>>> so
>>> that it fails much more often. I also added some debug printouts, one of
>>> which was showing up in conjunction with some but not all failures, so I
>>> thought it was irrelevant.
>>>
>>> With the additional synchronization, the debug message shows up in all
>>> failures. I think I actually had two forms of failure, one of which is
>>> fixed
>>> by the synchronization.  In the failure case that has been fixed,
>>> everything
>>> works, no debug messages, but the test never admits to having finished,
>>> exactly the symptom I would expect from this issue.
>>>
>>> I plan to check in the enhanced test as well as the fixes, because it
>>> only
>>> takes a few minutes longer than the current size, and is much better at
>>> finding bugs.
>>>
>>> Patricia
>>>
>>>
>>> On 11/28/2010 2:52 PM, Peter Firmstone wrote:
>>>>
>>>> Well, at the absolute minimum the variable should be volatile, so any
>>>> changes are visible among all threads.
>>>>
>>>> Since increment is the only mutating method, this must be synchronized.
>>>>
>>>> This is a cheap form of multi read, single write threading, although one
>>>> must be careful, as this only works on primitives and immutable object
>>>> references, since only the reference itself is being updated.
>>>>
>>>> If it was a reference to a mutable object (or long), then all methods
>>>> would need to be synchronized.
>>>>
>>>> Cheers,
>>>>
>>>> Peter.
>>>>
>>>> Patricia Shanahan wrote:
>>>>>
>>>>> I've found something I think is a problem in
>>>>> com.sun.jini.test.impl.outrigger.matching.StressTest, but it does not
>>>>> seem to be the problem, or at least not the only problem, causing the
>>>>> test hang I'm investigating. It's difficult to test, so I'd like a
>>>>> review of my reasoning. This is a question for those who are familiar
>>>>> with the Java memory model.
>>>>>
>>>>> Incidentally, if we went to 1.5 as the oldest supported release, this
>>>>> could be replaced by an AtomicInteger.
>>>>>
>>>>> In the following inner class, I think the methods reset and getCount
>>>>> should be synchronized. As the comments note, the operations they
>>>>> perform are atomic. My concern is the lack of a happens-before
>>>>> relationship between those two methods and the increment method. As
>>>>> far as I can tell, there is nothing forcing the change in the counter
>>>>> due to an increment to become visible to a getCount call in another
>>>>> thread.
>>>>>
>>>>> private class Counter {
>>>>>
>>>>> /**
>>>>> * Internal counter variable.
>>>>> */
>>>>> private int _count = 0;
>>>>>
>>>>> /**
>>>>> * Constructor. Declared to enforce protection level.
>>>>> */
>>>>> Counter() {
>>>>>
>>>>> // Do nothing.
>>>>> }
>>>>>
>>>>> /**
>>>>> * Resets internal counter to zero.
>>>>> */
>>>>> void reset() {
>>>>>
>>>>> // Integer assignment is atomic.
>>>>> _count = 0;
>>>>> }
>>>>>
>>>>> /**
>>>>> * Increments internal counter by one.
>>>>> */
>>>>> synchronized void increment() {
>>>>> ++_count;
>>>>> }
>>>>>
>>>>> /**
>>>>> * Returns current value of this<code>Counter</code>  object.
>>>>> */
>>>>> int getCount() {
>>>>>
>>>>> // Returning an integer is atomic.
>>>>> return _count;
>>>>> }
>>>>> }
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>

Re: Possible multi-threading bug

Posted by Patricia Shanahan <pa...@acm.org>.
As I said in the original e-mail "I think the methods reset and getCount
should be synchronized.".

Either synchronizing both those methods or making _count volatile would
fix the problem. I don't think performance matters on this class, so the
simplest solution should win. The increment method is, and has to be,
synchronized, so we might as well apply the same solution throughout.

If performance did matter, I would still prefer making all the methods
synchronized. Almost all calls are to increment. Making _count volatile
would tend to slow down every call to increment by adding memory
barriers. Making the other methods synchronized only affects increment
in the rare case of contention with one of the others.
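
For completeness, the AtomicInteger replacement mentioned in the original 
message would look roughly like the sketch below once 1.5 is the oldest 
supported release (illustrative only; it needs 
java.util.concurrent.atomic.AtomicInteger):

     private class Counter {

         private final AtomicInteger count = new AtomicInteger(0);

         void reset() {
             count.set(0);
         }

         void increment() {
             count.incrementAndGet();
         }

         int getCount() {
             return count.get();
         }
     }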

As I've worked on the other problem, I've continued to build empirical
evidence that this is a real problem. Since I made the changes to
Counter, I have run the test dozens of times, without seeing the failure
mode in which everything works but the test fails to notice it has
finished. Before the fix, that was the commoner failure mode.

The _count variable is part of a much bigger issue. I've been assuming
that I should not make style fixes as part of bug fixes. Although I
would have called the variable "count" myself, at least it does not get
in the way of being able to read the code. I have more trouble with
inconsistent indentation, to the extent that I sometimes reformat files
I know I will revert before the next check-in.

Maybe the roadmap should include a minor release at which we just fix
style, including indentation?

Patricia


On 11/29/2010 1:55 AM, Tom Hobbs wrote:
> Serious comment:
>
> Shouldn't reset also be synchronized?  Also, +1 for making _count volatile.
>
> Frivolous comment:
>
> Can we drop the underscore prefix for member variables?  It's against
> the official Sun/Oracle/Whatever conventions.  Also, it drives me up
> the wall...
>
> http://www.oracle.com/technetwork/java/codeconventions-135099.html#367
>
>
> On Sun, Nov 28, 2010 at 11:16 PM, Patricia Shanahan<pa...@acm.org>  wrote:
>> In the way in which this is used, I expect most of the calls to be to
>> increment. It has to be synchronized, so it seems simplest to synchronize
>> all the methods.
>>
>> I've done some more experiments, and decided this is a real problem. As part
>> of my debug effort, I increased the number of threads in the stress test, so
>> that it fails much more often. I also added some debug printouts, one of
>> which was showing up in conjunction with some but not all failures, so I
>> thought it was irrelevant.
>>
>> With the additional synchronization, the debug message shows up in all
>> failures. I think I actually had two forms of failure, one of which is fixed
>> by the synchronization.  In the failure case that has been fixed, everything
>> works, no debug messages, but the test never admits to having finished,
>> exactly the symptom I would expect from this issue.
>>
>> I plan to check in the enhanced test as well as the fixes, because it only
>> takes a few minutes longer than the current size, and is much better at
>> finding bugs.
>>
>> Patricia
>>
>>
>> On 11/28/2010 2:52 PM, Peter Firmstone wrote:
>>>
>>> Well, at the absolute minimum the variable should be volatile, so any
>>> changes are visible among all threads.
>>>
>>> Since increment is the only mutating method, this must be synchronized.
>>>
>>> This is a cheap form of multi read, single write threading, although one
>>> must be careful, as this only works on primitives and immutable object
>>> references, since only the reference itself is being updated.
>>>
>>> If it was a reference to a mutable object (or long), then all methods
>>> would need to be synchronized.
>>>
>>> Cheers,
>>>
>>> Peter.
>>>
>>> Patricia Shanahan wrote:
>>>>
>>>> I've found something I think is a problem in
>>>> com.sun.jini.test.impl.outrigger.matching.StressTest, but it does not
>>>> seem to be the problem, or at least not the only problem, causing the
>>>> test hang I'm investigating. It's difficult to test, so I'd like a
>>>> review of my reasoning. This is a question for those who are familiar
>>>> with the Java memory model.
>>>>
>>>> Incidentally, if we went to 1.5 as the oldest supported release, this
>>>> could be replaced by an AtomicInteger.
>>>>
>>>> In the following inner class, I think the methods reset and getCount
>>>> should be synchronized. As the comments note, the operations they
>>>> perform are atomic. My concern is the lack of a happens-before
>>>> relationship between those two methods and the increment method. As
>>>> far as I can tell, there is nothing forcing the change in the counter
>>>> due to an increment to become visible to a getCount call in another
>>>> thread.
>>>>
>>>> private class Counter {
>>>>
>>>> /**
>>>> * Internal counter variable.
>>>> */
>>>> private int _count = 0;
>>>>
>>>> /**
>>>> * Constructor. Declared to enforce protection level.
>>>> */
>>>> Counter() {
>>>>
>>>> // Do nothing.
>>>> }
>>>>
>>>> /**
>>>> * Resets internal counter to zero.
>>>> */
>>>> void reset() {
>>>>
>>>> // Integer assignment is atomic.
>>>> _count = 0;
>>>> }
>>>>
>>>> /**
>>>> * Increments internal counter by one.
>>>> */
>>>> synchronized void increment() {
>>>> ++_count;
>>>> }
>>>>
>>>> /**
>>>> * Returns current value of this<code>Counter</code>  object.
>>>> */
>>>> int getCount() {
>>>>
>>>> // Returning an integer is atomic.
>>>> return _count;
>>>> }
>>>> }
>>>>
>>>
>>>
>>
>>
>


Re: Possible multi-threading bug

Posted by Sim IJskes - QCG <si...@qcg.nl>.
On 11/29/2010 10:55 AM, Tom Hobbs wrote:
> Can we drop the underscore prefix for member variables?  It's against
> the official Sun/Oracle/Whatever conventions.  Also, it drives me up
> the wall...

+1


Re: Possible multi-threading bug

Posted by Patricia Shanahan <pa...@acm.org>.
A real world integrator with the highest performance expectations is 
exactly the sort of person I think we should be talking with about 
getting real world data on River bottlenecks.

Any input? Possibilities for e.g. running with performance-specific logging?

Patricia


Mike McGrady wrote:
> I just want to add that as a "real world" integrator with the highest
> performance expectations, I rely more on the architecture and design
> of systems to solve scalability problems.  A poor design can be made
> to scale but there are two scaling ceilings: not only performance but
> also cost.
> 
> Sent from my iPhone
> 
> Michael McGrady Principal investigator AF081_028 SBIR Chief Architect
>  Topia Technology, Inc Work 1.253.572.9712 Cel 1.253.720.3365
> 
> On Nov 30, 2010, at 3:59 AM, Patricia Shanahan <pa...@acm.org> wrote:
> 
>> On 11/30/2010 1:43 AM, Peter Firmstone wrote:
>>> Patricia Shanahan wrote:
>>>> Tom Hobbs wrote:
>>>>> Yes, you're right.
>>>>> 
>>>>> I knew about the non-atomicity of ++, my concern was a call
>>>>> to reset creeping in between the two parts of that operation.
>>>>> 
>>>> That is a very good point.
>>>> 
>>>> Leaving reset unsynchronized, even with volatile, would lead to
>>>>  results that would not be possible with full synchronization.
>>>> Suppose thread A is going to do a reset, and thread B is going
>>>> to do an increment, and everybody agrees the count is currently
>>>> 10.
>> ....
>>> Ah yes, correct, my mistake, easy to stuff up isn't it? ;)
>>> 
>>> In essence I agree, unfortunately we don't know when we need the 
>>> performance, because we can't test scalability. I've only got 4
>>> threads!
>> I've only got 8 threads on my largest system.
>> 
>> I am very, very strongly opposed to attempting performance tuning
>> without measurement. I've seen it tried many times over several
>> decades, and it is always a disaster. I've compared e.g. estimates
>> of where bottlenecks will be in an operating system prepared by the
>> OS developers to actual measurements, and I've yet to see the
>> developers get it right. That includes my own efforts at guessing
>> bottlenecks, before I learned that it is futile.
>> 
>> Without measurement, you either get focused effort in entirely
>> the wrong places or diffuse effort spread over the whole system.
>> What is really needed is focused effort on a few real bottlenecks
>> that will go way beyond any set of rules that could reasonably be
>> applied throughout the system.
>> 
>> I believe the solution is to establish a working relationship with
>> users of River who have larger systems. They have a vested interest
>> in helping us make it more scalable.
>> 
>>> Here are some simple tips I've found useful:
>> Have you measured the effectiveness of these tips on real world
>> scalability? What sort of gain do you see?
>> 
>> There may also be opportunities in the area of data structure and
>> algorithm scalability. For example, TaskManager uses an ArrayList
>> to represent something that is basically a FIFO with some
>> reordering. That is a good idea if, and only if, the queue lengths
>> are always very short even on large systems. However, I have no
>> idea whether TaskManager queue lengths tend to increase on large
>> systems or not.
>> 
>>> 1. If all mutators are atomic and don't depend on previous state,
>>> a volatile reference or field may be sufficient, but now we have 
>>> concurrency utilities, why not use an atomic reference or field 
>>> instead? Then if we find we later need a method based on previous
>>>  state, it's easily proven correct.
>> Do we have the concurrency utilities? The java.util.concurrent
>> packages are all "Since 1.5", and some of their classes are "Since
>> 1.6". We can only use them if we are abandoning any chance of River
>> running with a 1.4 rt.jar.
>> 
>> To my mind, the advances in java.util and its sub-packages are a
>> really strong motivation for getting to 1.5 or, better, 1.6 ASAP.
>> 
>> Currently, a quick grep indicates java.util.concurrent is only used
>> in ./qa/src/com/sun/jini/qa/harness/HeartOfTheMachine.java, which
>> is part of the QA infrastructure, not the production system.
>> 
>> ...
>> 
>>> Have you got any examples of a formal proof of correctness? Just
>>> out of curiosity?
>> Unfortunately, the proofs of correctness I've written that related
>> to concurrency were based on confidential data and written on the
>> job when I was working as a performance architect for Cray Research
>> and Sun Microsystems.
>> 
>> I do have some coursework proofs from the UCSD Design of Algorithms
>> graduate course. I'll dig out an example to send to you directly.
>> 
>> Proof of correctness of large systems is a research topic. I'm
>> talking about looking very narrowly at a class that has been shown
>> to be performance-critical and is being subjected to special
>> performance tuning that uses risky techniques such as volatile.
>> 
>> Patricia
>> 
> 


Re: Possible multi-threading bug

Posted by Mike McGrady <mm...@topiatechnology.com>.
I just want to add that as a "real world" integrator with the highest performance expectations, I rely more on the architecture and design of systems to solve scalability problems.  A poor design can be made to scale but there are two scaling ceilings: not only performance but also cost.

Sent from my iPhone

Michael McGrady
Principal investigator AF081_028 SBIR
Chief Architect
Topia Technology, Inc
Work 1.253.572.9712
Cel 1.253.720.3365

On Nov 30, 2010, at 3:59 AM, Patricia Shanahan <pa...@acm.org> wrote:

> On 11/30/2010 1:43 AM, Peter Firmstone wrote:
>> Patricia Shanahan wrote:
>>> Tom Hobbs wrote:
>>>> Yes, you're right.
>>>> 
>>>> I knew about the non-atomicity of ++, my concern was a call to reset
>>>> creeping in between the two parts of that operation.
>>> 
>>> That is a very good point.
>>> 
>>> Leaving reset unsynchronized, even with volatile, would lead to
>>> results that would not be possible with full synchronization. Suppose
>>> thread A is going to do a reset, and thread B is going to do an
>>> increment, and everybody agrees the count is currently 10.
> ....
>> Ah yes, correct, my mistake, easy to stuff up isn't it? ;)
>> 
>> In essence I agree, unfortunately we don't know when we need the
>> performance, because we can't test scalability. I've only got 4 threads!
> 
> I've only got 8 threads on my largest system.
> 
> I am very, very strongly opposed to attempting performance tuning without measurement. I've seen it tried many times over several decades, and it is always a disaster. I've compared e.g. estimates of where bottlenecks will be in an operating system prepared by the OS developers to actual measurements, and I've yet to see the developers get it right. That includes my own efforts at guessing bottlenecks, before I learned that it is futile.
> 
> Without measurement, you either get focused effort in entirely the wrong places or diffuse effort spread over the whole system. What is really needed is focused effort on a few real bottlenecks that will go way beyond any set of rules that could reasonably be applied throughout the system.
> 
> I believe the solution is to establish a working relationship with users of River who have larger systems. They have a vested interest in helping us make it more scalable.
> 
>> Here are some simple tips I've found useful:
> 
> Have you measured the effectiveness of these tips on real world scalability? What sort of gain do you see?
> 
> There may also be opportunities in the area of data structure and algorithm scalability. For example, TaskManager uses an ArrayList to represent something that is basically a FIFO with some reordering. That is a good idea if, and only if, the queue lengths are always very short even on large systems. However, I have no idea whether TaskManager queue lengths tend to increase on large systems or not.
> 
>>  1. If all mutators are atomic and don't depend on previous state, a
>>     volatile reference or field may be sufficient, but now we have
>>     concurrency utilities, why not use an atomic reference or field
>>     instead? Then if we find we later need a method based on previous
>>     state, it's easily proven correct.
> 
> Do we have the concurrency utilities? The java.util.concurrent packages are all "Since 1.5", and some of their classes are "Since 1.6". We can only use them if we are abandoning any chance of River running with a 1.4 rt.jar.
> 
> To my mind, the advances in java.util and its sub-packages are a really strong motivation for getting to 1.5 or, better, 1.6 ASAP.
> 
> Currently, a quick grep indicates java.util.concurrent is only used in ./qa/src/com/sun/jini/qa/harness/HeartOfTheMachine.java, which is part of the QA infrastructure, not the production system.
> 
> ...
> 
>> Have you got any examples of a formal proof of correctness? Just out of
>> curiosity?
> 
> Unfortunately, the proofs of correctness I've written that related to concurrency were based on confidential data and written on the job when I was working as a performance architect for Cray Research and Sun Microsystems.
> 
> I do have some coursework proofs from the UCSD Design of Algorithms graduate course. I'll dig out an example to send to you directly.
> 
> Proof of correctness of large systems is a research topic. I'm talking about looking very narrowly at a class that has been shown to be performance-critical and is being subjected to special performance tuning that uses risky techniques such as volatile.
> 
> Patricia
> 

Re: Possible multi-threading bug

Posted by Mike McGrady <mm...@topiatechnology.com>.
Still checking, but I stopped to note that a potential further restriction in our case is the use of Java RT, which is presently, the last I heard, at Java 1.5.

Sent from my iPhone

Michael McGrady
Principal investigator AF081_028 SBIR
Chief Architect
Topia Technology, Inc
Work 1.253.572.9712
Cel 1.253.720.3365

On Nov 30, 2010, at 7:24 AM, Dennis Reedy <de...@gmail.com> wrote:

> AFAIK, 1.6 is on every machine (Army Windows Gold master release and RHEL as well). Additionally, 1.5 seems to be on the way out.
> 
> HTH
> 
> Dennis
> 
> On Nov 30, 2010, at 9:54 AM, Patricia Shanahan wrote:
> 
>> Mike McGrady wrote:
>>> I think that in large scale military systems Java is only approved at
>>> 1.5, but I need to recheck that.  If so, this is becoming the
>>> next new large customer, in addition to financial systems, I think.
>> 
>> In any case, that would cause some software developers who might have products based on River to stick to 1.5, just in case they have an opportunity for a large scale military contract.
>> 
>> Can you check, and report back to the mailing list?
>> 
>> Thanks,
>> 
>> Patricia
> 

Re: Possible multi-threading bug

Posted by Dennis Reedy <de...@gmail.com>.
AFAIK, 1.6 is on every machine (Army Windows Gold master release and RHEL as well). Additionally, 1.5 seems to be on the way out.

HTH

Dennis

On Nov 30, 2010, at 9:54 AM, Patricia Shanahan wrote:

> Mike McGrady wrote:
>> I think that in large scale military systems Java is only approved at
>> 1.5, but I need to recheck that.  If so, this is becoming the
>> next new large customer, in addition to financial systems, I think.
> 
> In any case, that would cause some software developers who might have products based on River to stick to 1.5, just in case they have an opportunity for a large scale military contract.
> 
> Can you check, and report back to the mailing list?
> 
> Thanks,
> 
> Patricia


Re: Possible multi-threading bug

Posted by Mike McGrady <mm...@topiatechnology.com>.
I will

Sent from my iPhone

Michael McGrady
Principal investigator AF081_028 SBIR
Chief Architect
Topia Technology, Inc
Work 1.253.572.9712
Cel 1.253.720.3365

On Nov 30, 2010, at 6:54 AM, Patricia Shanahan <pa...@acm.org> wrote:

> Mike McGrady wrote:
>> I think that in large scale military systems Java is only approved at
>> 1.5, but I need to recheck that.  If so, this is becoming the
>> next new large customer, in addition to financial systems, I think.
> 
> In any case, that would cause some software developers who might have products based on River to stick to 1.5, just in case they have an opportunity for a large scale military contract.
> 
> Can you check, and report back to the mailing list?
> 
> Thanks,
> 
> Patricia

Re: Possible multi-threading bug

Posted by Patricia Shanahan <pa...@acm.org>.
Mike McGrady wrote:
> I think that in large scale military systems Java is only approved at
> 1.5, but I need to recheck that.  If so, this is becoming the
> next new large customer, in addition to financial systems, I think.

In any case, that would cause some software developers who might have 
products based on River to stick to 1.5, just in case they have an 
opportunity for a large scale military contract.

Can you check, and report back to the mailing list?

Thanks,

Patricia

Re: Possible multi-threading bug

Posted by Mike McGrady <mm...@topiatechnology.com>.
I think that in large scale military systems Java is only approved at 1.5, but I need to recheck that.  If so, this is becoming the next new large customer, in addition to financial systems, I think.

Sent from my iPhone

Michael McGrady
Principal investigator AF081_028 SBIR
Chief Architect
Topia Technology, Inc
Work 1.253.572.9712
Cel 1.253.720.3365

On Nov 30, 2010, at 3:59 AM, Patricia Shanahan <pa...@acm.org> wrote:

> On 11/30/2010 1:43 AM, Peter Firmstone wrote:
>> Patricia Shanahan wrote:
>>> Tom Hobbs wrote:
>>>> Yes, you're right.
>>>> 
>>>> I knew about the non-atomicity of ++, my concern was a call to reset
>>>> creeping in between the two parts of that operation.
>>> 
>>> That is a very good point.
>>> 
>>> Leaving reset unsynchronized, even with volatile, would lead to
>>> results that would not be possible with full synchronization. Suppose
>>> thread A is going to do a reset, and thread B is going to do an
>>> increment, and everybody agrees the count is currently 10.
> ....
>> Ah yes, correct, my mistake, easy to stuff up isn't it? ;)
>> 
>> In essence I agree, unfortunately we don't know when we need the
>> performance, because we can't test scalability. I've only got 4 threads!
> 
> I've only got 8 threads on my largest system.
> 
> I am very, very strongly opposed to attempting performance tuning without measurement. I've seen it tried many times over several decades, and it is always a disaster. I've compared e.g. estimates of where bottlenecks will be in an operating system prepared by the OS developers to actual measurements, and I've yet to see the developers get it right. That includes my own efforts at guessing bottlenecks, before I learned that it is futile.
> 
> Without measurement, you either get focused effort in entirely the wrong places or diffuse effort spread over the whole system. What is really needed is focused effort on a few real bottlenecks that will go way beyond any set of rules that could reasonably be applied throughout the system.
> 
> I believe the solution is to establish a working relationship with users of River who have larger systems. They have a vested interest in helping us make it more scalable.
> 
>> Here are some simple tips I've found useful:
> 
> Have you measured the effectiveness of these tips on real world scalability? What sort of gain do you see?
> 
> There may also be opportunities in the area of data structure and algorithm scalability. For example, TaskManager uses an ArrayList to represent something that is basically a FIFO with some reordering. That is a good idea if, and only if, the queue lengths are always very short even on large systems. However, I have no idea whether TaskManager queue lengths tend to increase on large systems or not.
> 
>>  1. If all mutators are atomic and don't depend on previous state, a
>>     volatile reference or field may be sufficient, but now we have
>>     concurrency utilities, why not use an atomic reference or field
>>     instead? Then if we find we later need a method based on previous
>>     state, it's easily proven correct.
> 
> Do we have the concurrency utilities? The java.util.concurrent packages are all "Since 1.5", and some of their classes are "Since 1.6". We can only use them if we are abandoning any chance of River running with a 1.4 rt.jar.
> 
> To my mind, the advances in java.util and its sub-packages are a really strong motivation for getting to 1.5 or, better, 1.6 ASAP.
> 
> Currently, a quick grep indicates java.util.concurrent is only used in ./qa/src/com/sun/jini/qa/harness/HeartOfTheMachine.java, which is part of the QA infrastructure, not the production system.
> 
> ...
> 
>> Have you got any examples of a formal proof of correctness? Just out of
>> curiosity?
> 
> Unfortunately, the proofs of correctness I've written that related to concurrency were based on confidential data and written on the job when I was working as a performance architect for Cray Research and Sun Microsystems.
> 
> I do have some coursework proofs from the UCSD Design of Algorithms graduate course. I'll dig out an example to send to you directly.
> 
> Proof of correctness of large systems is a research topic. I'm talking about looking very narrowly at a class that has been shown to be performance-critical and is being subjected to special performance tuning that uses risky techniques such as volatile.
> 
> Patricia
> 

Re: Possible multi-threading bug

Posted by Patricia Shanahan <pa...@acm.org>.
On 11/30/2010 1:43 AM, Peter Firmstone wrote:
> Patricia Shanahan wrote:
>> Tom Hobbs wrote:
>>> Yes, you're right.
>>>
>>> I knew about the non-atomicity of ++, my concern was a call to reset
>>> creeping in between the two parts of that operation.
>>
>> That is a very good point.
>>
>> Leaving reset unsynchronized, even with volatile, would lead to
>> results that would not be possible with full synchronization. Suppose
>> thread A is going to do a reset, and thread B is going to do an
>> increment, and everybody agrees the count is currently 10.
....
> Ah yes, correct, my mistake, easy to stuff up isn't it? ;)
>
> In essence I agree, unfortunately we don't know when we need the
> performance, because we can't test scalability. I've only got 4 threads!

I've only got 8 threads on my largest system.

I am very, very strongly opposed to attempting performance tuning 
without measurement. I've seen it tried many times over several decades, 
and it is always a disaster. I've compared e.g. estimates of where 
bottlenecks will be in an operating system prepared by the OS developers 
to actual measurements, and I've yet to see the developers get it right. 
That includes my own efforts at guessing bottlenecks, before I learned 
that it is futile.

Without measurement, you either get focused effort in entirely the 
wrong places or diffuse effort spread over the whole system. What is 
really needed is focused effort on a few real bottlenecks that will go 
way beyond any set of rules that could reasonably be applied throughout 
the system.

I believe the solution is to establish a working relationship with users 
of River who have larger systems. They have a vested interest in helping 
us make it more scalable.

> Here are some simple tips I've found useful:

Have you measured the effectiveness of these tips on real world 
scalability? What sort of gain do you see?

There may also be opportunities in the area of data structure and 
algorithm scalability. For example, TaskManager uses an ArrayList to 
represent something that is basically a FIFO with some reordering. That 
is a good idea if, and only if, the queue lengths are always very short 
even on large systems. However, I have no idea whether TaskManager queue 
lengths tend to increase on large systems or not.

>   1. If all mutators are atomic and don't depend on previous state, a
>      volatile reference or field may be sufficient, but now we have
>      concurrency utilities, why not use an atomic reference or field
>      instead? Then if we find we later need a method based on previous
>      state, it's easily proven correct.

Do we have the concurrency utilities? The java.util.concurrent packages 
are all "Since 1.5", and some of their classes are "Since 1.6". We can 
only use them if we are abandoning any chance of River running with a 
1.4 rt.jar.

To my mind, the advances in java.util and its sub-packages are a really 
strong motivation for getting to 1.5 or, better, 1.6 ASAP.

Currently, a quick grep indicates java.util.concurrent is only used in 
./qa/src/com/sun/jini/qa/harness/HeartOfTheMachine.java, which is part 
of the QA infrastructure, not the production system.

...

> Have you got any examples of a formal proof of correctness? Just out of
> curiosity?

Unfortunately, the proofs of correctness I've written that related to 
concurrency were based on confidential data and written on the job when 
I was working as a performance architect for Cray Research and Sun 
Microsystems.

I do have some coursework proofs from the UCSD Design of Algorithms 
graduate course. I'll dig out an example to send to you directly.

Proof of correctness of large systems is a research topic. I'm talking 
about looking very narrowly at a class that has been shown to be 
performance-critical and is being subjected to special performance 
tuning that uses risky techniques such as volatile.

Patricia


Re: Possible multi-threading bug

Posted by Peter Firmstone <ji...@zeus.net.au>.
Patricia Shanahan wrote:
> Tom Hobbs wrote:
>> Yes, you're right.
>>
>> I knew about the non-atomicity of ++, my concern was a call to reset
>> creeping in between the two parts of that operation.
>
> That is a very good point.
>
> Leaving reset unsynchronized, even with volatile, would lead to 
> results that would not be possible with full synchronization. Suppose 
> thread A is going to do a reset, and thread B is going to do an 
> increment, and everybody agrees the count is currently 10.
>
> With synchronization, after both operations have completed, count is 
> either 1 (A:reset, B:increment) or 0 (B:increment, A:reset).
>
> With only increment synchronized, even if the count is volatile, 
> we add the possible outcome 11, which cannot be reached by executing 
> the two methods atomically:
>
> B: volatile read, result 10
> A: volatile write of 0
> B: volatile write of 11
>
> I think it is best to stick to synchronization and the 
> java.util.concurrent classes, ignoring volatile, except in really 
> important performance-critical situations. If we do find we need to 
> use volatile, I would like to take the time to do a formal 
> proof-of-correctness based on the JLS rules.
>
>> I forgot about the "Get a global lock on the variable" that volatile
>> would require as mentioned here;
>> http://www.javaperformancetuning.com/news/qotm051.shtml
>
> I don't think we can depend on anything other than the JLS, or 
> possibly "Java Concurrency in Practice" which seems to tend towards 
> safety if it simplifies the situation at all.
>
> For example, I would be surprised if a SPARC v.9 TSO implementation of 
> volatile needed to do more than a "membar #StoreLoad" between a store 
> to a volatile variable and the next load of anything. Even on a system 
> that needed to do more, any locking would treat the volatile read and 
> the volatile write of the ++ as two separate operations.
>
> Patricia
>

Ah yes, correct, my mistake, easy to stuff up isn't it? ;)

In essence I agree, unfortunately we don't know when we need the 
performance, because we can't test scalability.  I've only got 4 threads!

So we can just use the concurrency utilities if we're worried about 
synchronization performance.

Here are some simple tips I've found useful:

   1. If all mutators are atomic and don't depend on previous state, a
      volatile reference or field may be sufficient, but now we have
      concurrency utilities, why not use an atomic reference or field
      instead? Then if we find we later need a method based on previous
      state, it's easily proven correct.
   2. If any mutator depends on previous state, all mutator methods need
      to be synchronized on the same lock; volatile is acceptable for
      reads, but now we have ReentrantReadWriteLock, why bother?
   3. If using implicit synchronization, make all methods synchronized,
      otherwise use explicit synchronization instead, it's much clearer
      what your lock protects.
   4. Builders, Immutable objects and atomic references are good for
      keeping mutable state isolated to a single thread, publish the
      immutable object.  Eg StringBuilder and String.
   5. Care must also be taken to distinguish between synchronization on
      a reference and the object it references.
   6. When using ConcurrentHashMap, always check the result of putIfAbsent
      and update your local reference if you need to (see the sketch after
      this list).
   7. We need also to be careful of livelock or deadlock due to excess
      synchronization.
   8. Sometimes it's better not to use synchronization at all, just don't
      share the object between threads, pass it and don't retain a
      reference.
   9. Don't call external methods from inside a synchronization block,
      this can lead to deadlock.
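
A minimal sketch of tip 6, assuming Java 5+ and using hypothetical names 
(owner, task, Task) purely for illustration:

     // Needs java.util.List and java.util.concurrent: ConcurrentMap,
     // ConcurrentHashMap, CopyOnWriteArrayList.
     ConcurrentMap<String, List<Task>> tasksByOwner =
             new ConcurrentHashMap<String, List<Task>>();

     void addTask(String owner, Task task) {
         List<Task> queue = tasksByOwner.get(owner);
         if (queue == null) {
             // putIfAbsent may lose the race to another thread, so always
             // use the winner's value rather than the list we just created.
             List<Task> fresh = new CopyOnWriteArrayList<Task>();
             List<Task> existing = tasksByOwner.putIfAbsent(owner, fresh);
             queue = (existing != null) ? existing : fresh;
         }
         queue.add(task);
     }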


Peer review is good for this reason, it's easy to get it wrong.

Keeping Objects simple makes peer review easier.

Shared mutable state is tough.

Well spotted Patricia.

Have you got any examples of a formal proof of correctness?  Just out of 
curiosity?

Cheers,

Peter.


Re: Possible multi-threading bug

Posted by Patricia Shanahan <pa...@acm.org>.
Tom Hobbs wrote:
> Yes, you're right.
> 
> I knew about the non-atomicity of ++, my concern was a call to reset
> creeping in between the two parts of that operation.

That is a very good point.

Leaving reset unsynchronized, even with volatile, would lead to results 
that would not be possible with full synchronization. Suppose thread A 
is going to do a reset, and thread B is going to do an increment, and 
everybody agrees the count is currently 10.

With synchronization, after both operations have completed, count is 
either 1 (A:reset, B:increment) or 0 (B:increment, A:reset).

With only increment synchronized, even if the count is volatile, we 
add the possible outcome 11, which cannot be reached by executing the 
two methods atomically:

B: volatile read, result 10
A: volatile write of 0
B: volatile write of 11
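
A minimal sketch of the variant under discussion, only to make the 
interleaving above concrete (not code from the repository): with _count 
volatile and only increment synchronized, the ++ is still a separate volatile 
read followed by a volatile write, so a reset can slip between them.

     private class VolatileOnlyCounter {

         private volatile int _count = 0;

         void reset() {
             _count = 0;                 // unsynchronized volatile write
         }

         synchronized void increment() {
             ++_count;                   // volatile read, then volatile write
         }

         int getCount() {
             return _count;              // unsynchronized volatile read
         }
     }

     // B (increment): reads _count == 10
     // A (reset):     writes _count = 0
     // B (increment): writes _count = 11   <- impossible with full synchronization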

I think it is best to stick to synchronization and the 
java.util.concurrent classes, ignoring volatile, except in really 
important performance-critical situations. If we do find we need to use 
volatile, I would like to take the time to do a formal 
proof-of-correctness based on the JLS rules.

> I forgot about the "Get a global lock on the variable" that volatile
> would require as mentioned here;
> http://www.javaperformancetuning.com/news/qotm051.shtml

I don't think we can depend on anything other than the JLS, or possibly 
"Java Concurrency in Practice" which seems to tend towards safety if it 
simplifies the situation at all.

For example, I would be surprised if a SPARC v.9 TSO implementation of 
volatile needed to do more than a "membar #StoreLoad" between a store to 
a volatile variable and the next load of anything. Even on a system that 
needed to do more, any locking would treat the volatile read and the 
volatile write of the ++ as two separate operations.

Patricia

Re: Possible multi-threading bug

Posted by Patricia Shanahan <pa...@acm.org>.
I've taken a quick look at the source code, and it appears to me to be
somewhere in between.

It is segmented, and the Segment class extends ReentrantLock. However,
the lock is obtained only for methods that modify the segment. The code
uses volatile variables to permit pure read operations to go through
without any locking.
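
A minimal sketch of that pattern in isolation, loosely modelled on the 
description above and not the actual ConcurrentHashMap source: a lock held 
only by writers, with a volatile field letting readers proceed without 
locking.

     class Cell extends ReentrantLock {   // java.util.concurrent.locks.ReentrantLock

         private volatile Object value;   // volatile so unlocked reads see writes

         Object get() {                   // pure read: no locking
             return value;
         }

         void set(Object newValue) {      // mutation: lock held
             lock();
             try {
                 value = newValue;
             } finally {
                 unlock();
             }
         }
     }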

Patricia


Carfield Yim wrote:
> I've read some articles saying that CHM still locks; it just
> divides the hashmap into a few small hashmaps internally so that the chance
> of lock-wait is lower. Is that wrong?
> 
> On Fri, Dec 3, 2010 at 10:19 PM, Gregg Wonderly <gr...@wonderly.org> wrote:
>> On 12/2/2010 3:48 PM, Patricia Shanahan wrote:
>>> Gregg Wonderly wrote:
>>> ...
>>>> A second issue is that if you are designing a "container" class that
>>>> others
>>>> might use in multiplicity such that it could end up in a container, and
>>>> perhaps as a key data structure, it can be better to create an internal
>>>> locking object to perform synchronization on so that if the user of your
>>>> class
>>>> starts using your class as a locking object, the internal locking does
>>>> not
>>>> impact unrelated code paths.
>>> ...
>>>
>>> For container objects, I would give the opposite advice, and recommend
>>> making
>>> the container itself the lock object.
>> There are cases for both I believe.  ConcurrentHashMap, for example has no
>> locks performed on the instance.   All locking is internal using mechanisms
>> other than synchronized.
>>
>> The primary issue with using an internal lock, for me, is that I've found
>> that I end up with less opportunity for circular wait if I am in complete
>> control of when a lock is asserted.
>>
>> For things where atomic views need to be maintained through multiple operations
>> on the same instance, then a single global view lock will be important.  As
>> the user of the container instance, you are more able to understand that
>> need in your application, so locking is your responsibility.
>>
>> Many people have complained about CHM not being "the lock" itself, but it
>> really comes down to the fact that since it does not use synchronized
>> actions internally, it can not make the object be "the lock".
>>
>> With this in mind, I've started trying to internalize more locking so that
>> for things that I do need to "adjust internal performance" on, I don't have
>> to go everywhere else and change locking use to be in line with the change
>> from synchronized to Lock.
>>
>> Locking with instances via synchronized was convenient, but I'd suggest that
>> not doing that provides a more plastic model that you can more readily
>> adjust.
>>
>> I also find that I will put more application operations into the APIs to
>> help me control and optimize those usage patterns.  This seems to help me
>> create more relevant/useful APIs and fewer "containers".  I.e. more simple
>> algorithms end up in the class implementation instead of strewn about the
>> application.
>>
>> Gregg Wonderly
>>
> 


Re: Possible multi-threading bug

Posted by Carfield Yim <ca...@carfield.com.hk>.
I've read some articles saying that CHM still locks; it just
divides the hashmap into a few small hashmaps internally so that the chance
of lock-wait is lower. Is that wrong?

On Fri, Dec 3, 2010 at 10:19 PM, Gregg Wonderly <gr...@wonderly.org> wrote:
> On 12/2/2010 3:48 PM, Patricia Shanahan wrote:
>>
>> Gregg Wonderly wrote:
>> ...
>>>
>>> A second issue is that if you are designing a "container" class that
>>> others
>>> might use in multiplicity such that it could end up in a container, and
>>> perhaps as a key data structure, it can be better to create an internal
>>> locking object to perform synchronization on so that if the user of your
>>> class
>>> starts using your class as a locking object, the internal locking does
>>> not
>>> impact unrelated code paths.
>>
>> ...
>>
>> For container objects, I would give the opposite advice, and recommend
>> making
>> the container itself the lock object.
>
> There are cases for both I believe.  ConcurrentHashMap, for example has no
> locks performed on the instance.   All locking is internal using mechanisms
> other than synchronized.
>
> The primary issue with using an internal lock, for me, is that I've found
> that I end up with less opportunity for circular wait if I am in complete
> control of when a lock is asserted.
>
> For things where atomic views need to be maintained through multiple operations
> on the same instance, then a single global view lock will be important.  As
> the user of the container instance, you are more able to understand that
> need in your application, so locking is your responsibility.
>
> Many people have complained about CHM not being "the lock" itself, but it
> really comes down to the fact that since it does not use synchronized
> actions internally, it can not make the object be "the lock".
>
> With this in mind, I've started trying to internalize more locking so that
> for things that I do need to "adjust internal performance" on, I don't have
> to go everywhere else and change locking use to be in line with the change
> from synchronized to Lock.
>
> Locking with instances via synchronized was convenient, but I'd suggest that
> not doing that provides a more plastic model that you can more readily
> adjust.
>
> I also find that I will put more application operations into the APIs to
> help me control and optimize those usage patterns.  This seems to help me
> create more relevant/useful APIs and fewer "containers".  I.e. more simple
> algorithms end up in the class implementation instead of strewn about the
> application.
>
> Gregg Wonderly
>

Re: Possible multi-threading bug

Posted by Gregg Wonderly <gr...@wonderly.org>.
On 12/2/2010 3:48 PM, Patricia Shanahan wrote:
> Gregg Wonderly wrote:
> ...
>> A second issue is that if you are designing a "container" class that others
>> might use in multiplicity such that it could end up in a container, and
>> perhaps as a key data structure, it can be better to create an internal
>> locking object to perform synchronization on so that if the user of your class
>> starts using your class as a locking object, the internal locking does not
>> impact unrelated code paths.
> ...
>
> For container objects, I would give the opposite advice, and recommend making
> the container itself the lock object.

There are cases for both I believe.  ConcurrentHashMap, for example has no locks 
performed on the instance.   All locking is internal using mechanisms other than 
synchronized.

The primary issue with using an internal lock, for me, is that I've found that I 
end up with less opportunity for circular wait if I am in complete control of 
when a lock is asserted.
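
A minimal sketch of the two idioms being contrasted, using hypothetical 
classes rather than River code - a real class would pick exactly one:

     class InternalLockCounter {
         private final Object lock = new Object();  // callers cannot reach this lock
         private int count;

         void increment() { synchronized (lock) { count++; } }
         int get()        { synchronized (lock) { return count; } }
     }

     class InstanceLockCounter {
         private int count;

         synchronized void increment() { count++; }
         synchronized int get()        { return count; }

         // Callers can widen the lock around compound operations:
         //     synchronized (c) { if (c.get() > 0) c.increment(); }
     }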

For things where atomic views need to be maintained through multiple operations on 
the same instance, then a single global view lock will be important.  As the 
user of the container instance, you are more able to understand that need in 
your application, so locking is your responsibility.

Many people have complained about CHM not being "the lock" itself, but it really 
comes down to the fact that since it does not use synchronized actions 
internally, it can not make the object be "the lock".

With this in mind, I've started trying to internalize more locking so that for 
things that I do need to "adjust internal performance" on, I don't have to go 
everywhere else and change locking use to be in line with the change from 
synchronized to Lock.

Locking with instances via synchronized was convenient, but I'd suggest that not 
doing that provides a more plastic model that you can more readily adjust.

I also find that I will put more application operations into the APIs to help me 
control and optimize those usage patterns.  This seems to help me create more 
relevant/useful APIs and fewer "containers".  I.e. more simple algorithms end up 
in the class implementation instead of strewn about the application.

Gregg Wonderly

Re: Possible multi-threading bug

Posted by Michał Kłeczek <mi...@xpro.biz>.
I would also suggest taking a look at the Guava project on Google Code (formerly
Google Collections) - it introduces the concept of a concurrent computing map,
which lazily computes a value when asked for a non-existent key, and a thread-safe
memoizer, which is a lazily computed value. In many cases it really makes
multithreaded code simpler and less error-prone.
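
A minimal sketch of both ideas, assuming the Guava API of that era
(MapMaker.makeComputingMap and Suppliers.memoize - please check the current
Guava docs before relying on the exact names):

    import java.util.concurrent.ConcurrentMap;

    import com.google.common.base.Function;
    import com.google.common.base.Supplier;
    import com.google.common.base.Suppliers;
    import com.google.common.collect.MapMaker;

    class LazyExamples {
        // Computing map: the function runs at most once per missing key,
        // even when several threads ask for that key at the same time.
        private final ConcurrentMap<String, Integer> lengths = new MapMaker()
                .makeComputingMap(new Function<String, Integer>() {
                    public Integer apply(String key) {
                        return key.length();   // stand-in for an expensive computation
                    }
                });

        // Memoizer: a thread-safe, lazily computed single value.
        private final Supplier<Integer> answer = Suppliers.memoize(
                new Supplier<Integer>() {
                    public Integer get() {
                        return 42;             // stand-in for an expensive computation
                    }
                });

        void demo() {
            System.out.println(lengths.get("outrigger"));  // computed on first access
            System.out.println(answer.get());              // computed once, then cached
        }
    }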

Michal

On Fri, Dec 3, 2010 at 10:17 AM, Carfield Yim <ca...@carfield.com.hk> wrote:

> How about using copyonwritearraylist?
> On Dec 3, 2010 5:49 AM, "Patricia Shanahan" <pa...@acm.org> wrote:
> > Gregg Wonderly wrote:
> > ...
> >> A second issue is that if you are designing a "container" class that
> >> others might use in multiplicity such that it could end up in a
> >> container, and perhaps as a key data structure, it can be better to
> >> create an internal locking object to perform synchronization on so that
> >> if the user of your class starts using your class as a locking object,
> >> the internal locking does not impact unrelated code paths.
> > ...
> >
> > For container objects, I would give the opposite advice, and recommend
> > making the container itself the lock object.
> >
> > The reason is code that performs multiple operations on the container to
> > do one related logical operation.
> >
> > For example, I've been looking at some test code that needs to remove a
> > random element from a List. It first uses a random number generator to
> > pick an int n in the range 0 <= n < list.size(), followed by
> list.remove(n).
> >
> > If done using a list that synchronizes on itself, that block of code can
> > be marked synchronized(list), but code that does a single operation on
> > list does not need to be synchronized externally. If the list
> > synchronizes on an internal locking object, *all* access to the list
> > will need to be synchronized in the calling code on some other object.
> >
> > Using the container itself is also consistent with what is done by
> > Collections.synchronizedList etc.
> >
> > Patricia
> >
>

Re: Possible multi-threading bug

Posted by Carfield Yim <ca...@carfield.com.hk>.
How about using CopyOnWriteArrayList?
On Dec 3, 2010 5:49 AM, "Patricia Shanahan" <pa...@acm.org> wrote:
> Gregg Wonderly wrote:
> ...
>> A second issue is that if you are designing a "container" class that
>> others might use in multiplicity such that it could end up in a
>> container, and perhaps as a key data structure, it can be better to
>> create an internal locking object to perform synchronization on so that
>> if the user of your class starts using your class as a locking object,
>> the internal locking does not impact unrelated code paths.
> ...
>
> For container objects, I would give the opposite advice, and recommend
> making the container itself the lock object.
>
> The reason is code that performs multiple operations on the container to
> do one related logical operation.
>
> For example, I've been looking at some test code that needs to remove a
> random element from a List. It first uses a random number generator to
> pick an int n in the range 0 <= n < list.size(), followed by
list.remove(n).
>
> If done using a list that synchronizes on itself, that block of code can
> be marked synchronized(list), but code that does a single operation on
> list does not need to be synchronized externally. If the list
> synchronizes on an internal locking object, *all* access to the list
> will need to be synchronized in the calling code on some other object.
>
> Using the container itself is also consistent with what is done by
> Collections.synchronizedList etc.
>
> Patricia
>

Re: Possible multi-threading bug

Posted by Patricia Shanahan <pa...@acm.org>.
Gregg Wonderly wrote:
...
> A second issue is that if you are designing a "container" class that 
> others might use in multiplicity such that it could end up in a 
> container, and perhaps as a key data structure, it can be better to 
> create an internal locking object to perform synchronization on so that 
> if the user of your class starts using your class as a locking object, 
> the internal locking does not impact unrelated code paths.
...

For container objects, I would give the opposite advice, and recommend 
making the container itself the lock object.

The reason is code that performs multiple operations on the container to 
do one related logical operation.

For example, I've been looking at some test code that needs to remove a 
random element from a List. It first uses a random number generator to 
pick an int n in the range 0 <= n < list.size(), followed by list.remove(n).

If done using a list that synchronizes on itself, that block of code can 
be marked synchronized(list), but code that does a single operation on 
list does not need to be synchronized externally. If the list 
synchronizes on an internal locking object, *all* access to the list 
will need to be synchronized in the calling code on some other object.

Using the container itself is also consistent with what is done by 
Collections.synchronizedList etc.
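
A minimal sketch of that pattern (an illustrative fragment, not code from the test):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;

    class RandomRemove {
        private final List<String> list =
                Collections.synchronizedList(new ArrayList<String>());
        private final Random random = new Random();

        /** A single operation needs no external locking. */
        void add(String s) {
            list.add(s);
        }

        /** The pick-then-remove sequence must hold the list's own lock. */
        String removeRandom() {
            synchronized (list) {
                if (list.isEmpty()) {
                    return null;
                }
                int n = random.nextInt(list.size());
                return list.remove(n);
            }
        }
    }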

Patricia


Re: Possible multi-threading bug

Posted by Gregg Wonderly <gr...@wonderly.org>.
In this kind of software design, where we have reset(), get() and mutate() 
kinds of methods, the choice between volatile performance and synchronized control 
pretty much comes down to what mutate() does.  If mutate() is really set(), then 
you just have an assignment (typically), and thus there is no atomic-view 
problem between get() and set().  get() will return the new value of set() at 
some point in the future, so volatile is enough in the class implementation.

However, in our case, where mutate() indirectly involves a get() operation, 
synchronized is required on all mutations because the total order of the 
involved operations is important, as Peter said in his post.  It can be that the 
references also need to be synchronized if there is an intermediate value that 
might be set() before the final mutate() action occurs.

Because of all of these considerations, it is usually far better to pick 
synchronized first until volatile can be proven to be more desirable.  That way, 
any future changes to the reference and mutator methods will not expose a new 
bug related to operation ordering or intermediate values.

Sometimes, people write methods like

	public void increment() {
		// Two writes can happen here: the pre-increment and the wrap
		// back to zero, and callers should only ever see the final result.
		if( ++count > MAX ) {
			count = 0;
		}
	}

and there are thus two "set" operations, and the final result is really all that 
you want to expose.  So synchronized here is a good idea.

A second issue is that if you are designing a "container" class that others 
might use in multiplicity such that it could end up in a container, and perhaps 
as a key data structure, it can be better to create an internal locking object 
to perform synchronization on so that if the user of your class starts using 
your class as a locking object, the internal locking does not impact unrelated 
code paths.
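
A minimal sketch of that style (the class and names are purely illustrative):

    class GuardedCounter {
        // Private lock object: callers who synchronize on the GuardedCounter
        // instance itself cannot interfere with this class's own locking.
        private final Object lock = new Object();
        private int count = 0;

        void increment() {
            synchronized (lock) {
                ++count;
            }
        }

        int getCount() {
            synchronized (lock) {
                return count;
            }
        }

        void reset() {
            synchronized (lock) {
                count = 0;
            }
        }
    }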

Gregg Wonderly


On 11/29/2010 5:45 AM, Tom Hobbs wrote:
> Yes, you're right.
>
> I knew about the non-atomicity of ++, my concern was a call to reset
> creeping in between the two parts of that operation.
>
> I forgot about the "Get a global lock on the variable" that volatile
> would require as mentioned here;
> http://www.javaperformancetuning.com/news/qotm051.shtml
>
> On Mon, Nov 29, 2010 at 10:50 AM, Peter Firmstone<ji...@zeus.net.au>  wrote:
>> Tom Hobbs wrote:
>>>
>>> Serious comment:
>>>
>>> Shouldn't reset also be synchronized?
>>
>> You've identified that reset is also a mutator method, I guess I gave
>> mutation as my reason and I wasn't being entirely accurate, sorry.
>>
>> To explain a bit more, reset doesn't have to be synchronized when _count is
>> volatile and increment synchronized, the reason, it's a single operation
>> mutator, it just resets the counter, it doesn't check it first.
>>
>> The increment method is at least two operations _count++ first gets _count,
>> then increments it and sets _count to the new value, if it wasn't
>> synchronized, another increment could get _count at the same time... Or a
>> reset operation could occur in between and the reset fail.
>>
>> The ++ operator isn't atomic.
>>
>> If in doubt use synchronization, so by making reset also synchronized you
>> wouldn't be wrong and it probably wouldn't hurt performance either.
>>
>> The secret to concurrency is, all operations must happen atomically.
>>   Synchronization creates atomicity when it wouldn't otherwise exist.
>>   Synchronization also effects visibility, as does volatile.
>>
>> Cheers,
>>
>> Peter.
>>
>>>   Also, +1 for making _count volatile.
>>>
>>> Frivolous comment:
>>>
>>> Can we drop the underscore prefix for member variables?  It's against
>>> the official Sun/Oracle/Whatever conventions.  Also, it drives me up
>>> the wall...
>>>
>>> http://www.oracle.com/technetwork/java/codeconventions-135099.html#367
>>>
>>>
>>> On Sun, Nov 28, 2010 at 11:16 PM, Patricia Shanahan<pa...@acm.org>  wrote:
>>>
>>>>
>>>> In the way in which this is used, I expect most of the calls to be to
>>>> increment. It has to be synchronized, so it seems simplest to synchronize
>>>> all the methods.
>>>>
>>>> I've done some more experiments, and decided this is a real problem. As
>>>> part
>>>> of my debug effort, I increased the number of threads in the stress test,
>>>> so
>>>> that it fails much more often. I also added some debug printouts, one of
>>>> which was showing up in conjunction with some but not all failures, so I
>>>> thought it was irrelevant.
>>>>
>>>> With the additional synchronization, the debug message shows up in all
>>>> failures. I think I actually had two forms of failure, one of which is
>>>> fixed
>>>> by the synchronization.  In the failure case that has been fixed,
>>>> everything
>>>> works, no debug messages, but the test never admits to having finished,
>>>> exactly the symptom I would expect from this issue.
>>>>
>>>> I plan to check in the enhanced test as well as the fixes, because it
>>>> only
>>>> takes a few minutes longer than the current size, and is much better at
>>>> finding bugs.
>>>>
>>>> Patricia
>>>>
>>>>
>>>> On 11/28/2010 2:52 PM, Peter Firmstone wrote:
>>>>
>>>>>
>>>>> Well, at the absolute minimum the variable should be volatile, so any
>>>>> changes are visible among all threads.
>>>>>
>>>>> Since increment is the only mutating method, this must be synchronized.
>>>>>
>>>>> This is a cheap form of multi read, single write threading, although one
>>>>> must be careful, as this only works on primitives and immutable object
>>>>> references, since only the reference itself is being updated.
>>>>>
>>>>> If it was a reference to a mutable object (or long), then all methods
>>>>> would need to be synchronized.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Peter.
>>>>>
>>>>> Patricia Shanahan wrote:
>>>>>
>>>>>>
>>>>>> I've found something I think is a problem in
>>>>>> com.sun.jini.test.impl.outrigger.matching.StressTest, but it does not
>>>>>> seem to be the problem, or at least not the only problem, causing the
>>>>>> test hang I'm investigating. It's difficult to test, so I'd like a
>>>>>> review of my reasoning. This is a question for those who are familiar
>>>>>> with the Java memory model.
>>>>>>
>>>>>> Incidentally, if we went to 1.5 as the oldest supported release, this
>>>>>> could be replaced by an AtomicInteger.
>>>>>>
>>>>>> In the following inner class, I think the methods reset and getCount
>>>>>> should be synchronized. As the comments note, the operations they
>>>>>> perform are atomic. My concern is the lack of a happens-before
>>>>>> relationship between those two methods and the increment method. As
>>>>>> far as I can tell, there is nothing forcing the change in the counter
>>>>>> due to an increment to become visible to a getCount call in another
>>>>>> thread.
>>>>>>
>>>>>> private class Counter {
>>>>>>
>>>>>> /**
>>>>>> * Internal counter variable.
>>>>>> */
>>>>>> private int _count = 0;
>>>>>>
>>>>>> /**
>>>>>> * Constructor. Declared to enforce protection level.
>>>>>> */
>>>>>> Counter() {
>>>>>>
>>>>>> // Do nothing.
>>>>>> }
>>>>>>
>>>>>> /**
>>>>>> * Resets internal counter to zero.
>>>>>> */
>>>>>> void reset() {
>>>>>>
>>>>>> // Integer assignment is atomic.
>>>>>> _count = 0;
>>>>>> }
>>>>>>
>>>>>> /**
>>>>>> * Increments internal counter by one.
>>>>>> */
>>>>>> synchronized void increment() {
>>>>>> ++_count;
>>>>>> }
>>>>>>
>>>>>> /**
>>>>>> * Returns current value of this<code>Counter</code>  object.
>>>>>> */
>>>>>> int getCount() {
>>>>>>
>>>>>> // Returning an integer is atomic.
>>>>>> return _count;
>>>>>> }
>>>>>> }
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>


Re: Possible multi-threading bug

Posted by Tom Hobbs <tv...@googlemail.com>.
Yes, you're right.

I knew about the non-atomicity of ++, my concern was a call to reset
creeping in between the two parts of that operation.

I forgot about the "Get a global lock on the variable" that volatile
would require as mentioned here;
http://www.javaperformancetuning.com/news/qotm051.shtml

On Mon, Nov 29, 2010 at 10:50 AM, Peter Firmstone <ji...@zeus.net.au> wrote:
> Tom Hobbs wrote:
>>
>> Serious comment:
>>
>> Shouldn't reset also be synchronized?
>
> You've identified that reset is also a mutator method, I guess I gave
> mutation as my reason and I wasn't being entirely accurate, sorry.
>
> To explain a bit more, reset doesn't have to be synchronized when _count is
> volatile and increment synchronized, the reason, it's a single operation
> mutator, it just resets the counter, it doesn't check it first.
>
> The increment method is at least two operations _count++ first gets _count,
> then increments it and sets _count to the new value, if it wasn't
> synchronized, another increment could get _count at the same time... Or a
> reset operation could occur in between and the reset fail.
>
> The ++ operator isn't atomic.
>
> If in doubt use synchronization, so by making reset also synchronized you
> wouldn't be wrong and it probably wouldn't hurt performance either.
>
> The secret to concurrency is, all operations must happen atomically.
>  Synchronization creates atomicity when it wouldn't otherwise exist.
>  Synchronization also effects visibility, as does volatile.
>
> Cheers,
>
> Peter.
>
>>  Also, +1 for making _count volatile.
>>
>> Frivolous comment:
>>
>> Can we drop the underscore prefix for member variables?  It's against
>> the official Sun/Oracle/Whatever conventions.  Also, it drives me up
>> the wall...
>>
>> http://www.oracle.com/technetwork/java/codeconventions-135099.html#367
>>
>>
>> On Sun, Nov 28, 2010 at 11:16 PM, Patricia Shanahan <pa...@acm.org> wrote:
>>
>>>
>>> In the way in which this is used, I expect most of the calls to be to
>>> increment. It has to be synchronized, so it seems simplest to synchronize
>>> all the methods.
>>>
>>> I've done some more experiments, and decided this is a real problem. As
>>> part
>>> of my debug effort, I increased the number of threads in the stress test,
>>> so
>>> that it fails much more often. I also added some debug printouts, one of
>>> which was showing up in conjunction with some but not all failures, so I
>>> thought it was irrelevant.
>>>
>>> With the additional synchronization, the debug message shows up in all
>>> failures. I think I actually had two forms of failure, one of which is
>>> fixed
>>> by the synchronization.  In the failure case that has been fixed,
>>> everything
>>> works, no debug messages, but the test never admits to having finished,
>>> exactly the symptom I would expect from this issue.
>>>
>>> I plan to check in the enhanced test as well as the fixes, because it
>>> only
>>> takes a few minutes longer than the current size, and is much better at
>>> finding bugs.
>>>
>>> Patricia
>>>
>>>
>>> On 11/28/2010 2:52 PM, Peter Firmstone wrote:
>>>
>>>>
>>>> Well, at the absolute minimum the variable should be volatile, so any
>>>> changes are visible among all threads.
>>>>
>>>> Since increment is the only mutating method, this must be synchronized.
>>>>
>>>> This is a cheap form of multi read, single write threading, although one
>>>> must be careful, as this only works on primitives and immutable object
>>>> references, since only the reference itself is being updated.
>>>>
>>>> If it was a reference to a mutable object (or long), then all methods
>>>> would need to be synchronized.
>>>>
>>>> Cheers,
>>>>
>>>> Peter.
>>>>
>>>> Patricia Shanahan wrote:
>>>>
>>>>>
>>>>> I've found something I think is a problem in
>>>>> com.sun.jini.test.impl.outrigger.matching.StressTest, but it does not
>>>>> seem to be the problem, or at least not the only problem, causing the
>>>>> test hang I'm investigating. It's difficult to test, so I'd like a
>>>>> review of my reasoning. This is a question for those who are familiar
>>>>> with the Java memory model.
>>>>>
>>>>> Incidentally, if we went to 1.5 as the oldest supported release, this
>>>>> could be replaced by an AtomicInteger.
>>>>>
>>>>> In the following inner class, I think the methods reset and getCount
>>>>> should be synchronized. As the comments note, the operations they
>>>>> perform are atomic. My concern is the lack of a happens-before
>>>>> relationship between those two methods and the increment method. As
>>>>> far as I can tell, there is nothing forcing the change in the counter
>>>>> due to an increment to become visible to a getCount call in another
>>>>> thread.
>>>>>
>>>>> private class Counter {
>>>>>
>>>>> /**
>>>>> * Internal counter variable.
>>>>> */
>>>>> private int _count = 0;
>>>>>
>>>>> /**
>>>>> * Constructor. Declared to enforce protection level.
>>>>> */
>>>>> Counter() {
>>>>>
>>>>> // Do nothing.
>>>>> }
>>>>>
>>>>> /**
>>>>> * Resets internal counter to zero.
>>>>> */
>>>>> void reset() {
>>>>>
>>>>> // Integer assignment is atomic.
>>>>> _count = 0;
>>>>> }
>>>>>
>>>>> /**
>>>>> * Increments internal counter by one.
>>>>> */
>>>>> synchronized void increment() {
>>>>> ++_count;
>>>>> }
>>>>>
>>>>> /**
>>>>> * Returns current value of this <code>Counter</code> object.
>>>>> */
>>>>> int getCount() {
>>>>>
>>>>> // Returning an integer is atomic.
>>>>> return _count;
>>>>> }
>>>>> }
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>

Re: Possible multi-threading bug

Posted by Peter Firmstone <ji...@zeus.net.au>.
Tom Hobbs wrote:
> Serious comment:
>
> Shouldn't reset also be synchronized?
You've identified that reset is also a mutator method, I guess I gave 
mutation as my reason and I wasn't being entirely accurate, sorry.

To explain a bit more: reset doesn't have to be synchronized when _count 
is volatile and increment is synchronized.  The reason is that reset is a single 
operation mutator; it just resets the counter, it doesn't check it first.

The increment method is at least two operations: _count++ first gets 
_count, then increments it and sets _count to the new value.  If it 
wasn't synchronized, another increment could get _count at the same 
time... or a reset operation could occur in between and the reset would fail.

The ++ operator isn't atomic.

If in doubt use synchronization, so by making reset also synchronized 
you wouldn't be wrong and it probably wouldn't hurt performance either.

The secret to concurrency is that all operations must happen atomically.  
Synchronization creates atomicity when it wouldn't otherwise exist.  
Synchronization also affects visibility, as does volatile.

Cheers,

Peter.

>   Also, +1 for making _count volatile.
>
> Frivolous comment:
>
> Can we drop the underscore prefix for member variables?  It's against
> the official Sun/Oracle/Whatever conventions.  Also, it drives me up
> the wall...
>
> http://www.oracle.com/technetwork/java/codeconventions-135099.html#367
>
>
> On Sun, Nov 28, 2010 at 11:16 PM, Patricia Shanahan <pa...@acm.org> wrote:
>   
>> In the way in which this is used, I expect most of the calls to be to
>> increment. It has to be synchronized, so it seems simplest to synchronize
>> all the methods.
>>
>> I've done some more experiments, and decided this is a real problem. As part
>> of my debug effort, I increased the number of threads in the stress test, so
>> that it fails much more often. I also added some debug printouts, one of
>> which was showing up in conjunction with some but not all failures, so I
>> thought it was irrelevant.
>>
>> With the additional synchronization, the debug message shows up in all
>> failures. I think I actually had two forms of failure, one of which is fixed
>> by the synchronization.  In the failure case that has been fixed, everything
>> works, no debug messages, but the test never admits to having finished,
>> exactly the symptom I would expect from this issue.
>>
>> I plan to check in the enhanced test as well as the fixes, because it only
>> takes a few minutes longer than the current size, and is much better at
>> finding bugs.
>>
>> Patricia
>>
>>
>> On 11/28/2010 2:52 PM, Peter Firmstone wrote:
>>     
>>> Well, at the absolute minimum the variable should be volatile, so any
>>> changes are visible among all threads.
>>>
>>> Since increment is the only mutating method, this must be synchronized.
>>>
>>> This is a cheap form of multi read, single write threading, although one
>>> must be careful, as this only works on primitives and immutable object
>>> references, since only the reference itself is being updated.
>>>
>>> If it was a reference to a mutable object (or long), then all methods
>>> would need to be synchronized.
>>>
>>> Cheers,
>>>
>>> Peter.
>>>
>>> Patricia Shanahan wrote:
>>>       
>>>> I've found something I think is a problem in
>>>> com.sun.jini.test.impl.outrigger.matching.StressTest, but it does not
>>>> seem to be the problem, or at least not the only problem, causing the
>>>> test hang I'm investigating. It's difficult to test, so I'd like a
>>>> review of my reasoning. This is a question for those who are familiar
>>>> with the Java memory model.
>>>>
>>>> Incidentally, if we went to 1.5 as the oldest supported release, this
>>>> could be replaced by an AtomicInteger.
>>>>
>>>> In the following inner class, I think the methods reset and getCount
>>>> should be synchronized. As the comments note, the operations they
>>>> perform are atomic. My concern is the lack of a happens-before
>>>> relationship between those two methods and the increment method. As
>>>> far as I can tell, there is nothing forcing the change in the counter
>>>> due to an increment to become visible to a getCount call in another
>>>> thread.
>>>>
>>>> private class Counter {
>>>>
>>>> /**
>>>> * Internal counter variable.
>>>> */
>>>> private int _count = 0;
>>>>
>>>> /**
>>>> * Constructor. Declared to enforce protection level.
>>>> */
>>>> Counter() {
>>>>
>>>> // Do nothing.
>>>> }
>>>>
>>>> /**
>>>> * Resets internal counter to zero.
>>>> */
>>>> void reset() {
>>>>
>>>> // Integer assignment is atomic.
>>>> _count = 0;
>>>> }
>>>>
>>>> /**
>>>> * Increments internal counter by one.
>>>> */
>>>> synchronized void increment() {
>>>> ++_count;
>>>> }
>>>>
>>>> /**
>>>> * Returns current value of this <code>Counter</code> object.
>>>> */
>>>> int getCount() {
>>>>
>>>> // Returning an integer is atomic.
>>>> return _count;
>>>> }
>>>> }
>>>>
>>>>         
>>>       
>>     
>
>   


Re: Possible multi-threading bug

Posted by Tom Hobbs <tv...@googlemail.com>.
Serious comment:

Shouldn't reset also be synchronized?  Also, +1 for making _count volatile.

Frivolous comment:

Can we drop the underscore prefix for member variables?  It's against
the official Sun/Oracle/Whatever conventions.  Also, it drives me up
the wall...

http://www.oracle.com/technetwork/java/codeconventions-135099.html#367


On Sun, Nov 28, 2010 at 11:16 PM, Patricia Shanahan <pa...@acm.org> wrote:
> In the way in which this is used, I expect most of the calls to be to
> increment. It has to be synchronized, so it seems simplest to synchronize
> all the methods.
>
> I've done some more experiments, and decided this is a real problem. As part
> of my debug effort, I increased the number of threads in the stress test, so
> that it fails much more often. I also added some debug printouts, one of
> which was showing up in conjunction with some but not all failures, so I
> thought it was irrelevant.
>
> With the additional synchronization, the debug message shows up in all
> failures. I think I actually had two forms of failure, one of which is fixed
> by the synchronization.  In the failure case that has been fixed, everything
> works, no debug messages, but the test never admits to having finished,
> exactly the symptom I would expect from this issue.
>
> I plan to check in the enhanced test as well as the fixes, because it only
> takes a few minutes longer than the current size, and is much better at
> finding bugs.
>
> Patricia
>
>
> On 11/28/2010 2:52 PM, Peter Firmstone wrote:
>>
>> Well, at the absolute minimum the variable should be volatile, so any
>> changes are visible among all threads.
>>
>> Since increment is the only mutating method, this must be synchronized.
>>
>> This is a cheap form of multi read, single write threading, although one
>> must be careful, as this only works on primitives and immutable object
>> references, since only the reference itself is being updated.
>>
>> If it was a reference to a mutable object (or long), then all methods
>> would need to be synchronized.
>>
>> Cheers,
>>
>> Peter.
>>
>> Patricia Shanahan wrote:
>>>
>>> I've found something I think is a problem in
>>> com.sun.jini.test.impl.outrigger.matching.StressTest, but it does not
>>> seem to be the problem, or at least not the only problem, causing the
>>> test hang I'm investigating. It's difficult to test, so I'd like a
>>> review of my reasoning. This is a question for those who are familiar
>>> with the Java memory model.
>>>
>>> Incidentally, if we went to 1.5 as the oldest supported release, this
>>> could be replaced by an AtomicInteger.
>>>
>>> In the following inner class, I think the methods reset and getCount
>>> should be synchronized. As the comments note, the operations they
>>> perform are atomic. My concern is the lack of a happens-before
>>> relationship between those two methods and the increment method. As
>>> far as I can tell, there is nothing forcing the change in the counter
>>> due to an increment to become visible to a getCount call in another
>>> thread.
>>>
>>> private class Counter {
>>>
>>> /**
>>> * Internal counter variable.
>>> */
>>> private int _count = 0;
>>>
>>> /**
>>> * Constructor. Declared to enforce protection level.
>>> */
>>> Counter() {
>>>
>>> // Do nothing.
>>> }
>>>
>>> /**
>>> * Resets internal counter to zero.
>>> */
>>> void reset() {
>>>
>>> // Integer assignment is atomic.
>>> _count = 0;
>>> }
>>>
>>> /**
>>> * Increments internal counter by one.
>>> */
>>> synchronized void increment() {
>>> ++_count;
>>> }
>>>
>>> /**
>>> * Returns current value of this <code>Counter</code> object.
>>> */
>>> int getCount() {
>>>
>>> // Returning an integer is atomic.
>>> return _count;
>>> }
>>> }
>>>
>>
>>
>
>

Re: Possible multi-threading bug

Posted by Patricia Shanahan <pa...@acm.org>.
In the way in which this is used, I expect most of the calls to be to 
increment. It has to be synchronized, so it seems simplest to 
synchronize all the methods.
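
For reference, a minimal sketch of the Counter with all three methods 
synchronized, which is the fix being described here:

    private class Counter {

        /** Internal counter variable, guarded by this Counter's monitor. */
        private int _count = 0;

        /** Resets internal counter to zero. */
        synchronized void reset() {
            _count = 0;
        }

        /** Increments internal counter by one. */
        synchronized void increment() {
            ++_count;
        }

        /** Returns current value of this <code>Counter</code> object. */
        synchronized int getCount() {
            return _count;
        }
    }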

I've done some more experiments, and decided this is a real problem. As 
part of my debug effort, I increased the number of threads in the stress 
test, so that it fails much more often. I also added some debug 
printouts, one of which was showing up in conjunction with some but not 
all failures, so I thought it was irrelevant.

With the additional synchronization, the debug message shows up in all 
failures. I think I actually had two forms of failure, one of which is 
fixed by the synchronization.  In the failure case that has been fixed, 
everything works, no debug messages, but the test never admits to having 
finished, exactly the symptom I would expect from this issue.

I plan to check in the enhanced test as well as the fixes, because it 
only takes a few minutes longer than the current size, and is much 
better at finding bugs.

Patricia


On 11/28/2010 2:52 PM, Peter Firmstone wrote:
> Well, at the absolute minimum the variable should be volatile, so any
> changes are visible among all threads.
>
> Since increment is the only mutating method, this must be synchronized.
>
> This is a cheap form of multi read, single write threading, although one
> must be careful, as this only works on primitives and immutable object
> references, since only the reference itself is being updated.
>
> If it was a reference to a mutable object (or long), then all methods
> would need to be synchronized.
>
> Cheers,
>
> Peter.
>
> Patricia Shanahan wrote:
>> I've found something I think is a problem in
>> com.sun.jini.test.impl.outrigger.matching.StressTest, but it does not
>> seem to be the problem, or at least not the only problem, causing the
>> test hang I'm investigating. It's difficult to test, so I'd like a
>> review of my reasoning. This is a question for those who are familiar
>> with the Java memory model.
>>
>> Incidentally, if we went to 1.5 as the oldest supported release, this
>> could be replaced by an AtomicInteger.
>>
>> In the following inner class, I think the methods reset and getCount
>> should be synchronized. As the comments note, the operations they
>> perform are atomic. My concern is the lack of a happens-before
>> relationship between those two methods and the increment method. As
>> far as I can tell, there is nothing forcing the change in the counter
>> due to an increment to become visible to a getCount call in another
>> thread.
>>
>> private class Counter {
>>
>> /**
>> * Internal counter variable.
>> */
>> private int _count = 0;
>>
>> /**
>> * Constructor. Declared to enforce protection level.
>> */
>> Counter() {
>>
>> // Do nothing.
>> }
>>
>> /**
>> * Resets internal counter to zero.
>> */
>> void reset() {
>>
>> // Integer assignment is atomic.
>> _count = 0;
>> }
>>
>> /**
>> * Increments internal counter by one.
>> */
>> synchronized void increment() {
>> ++_count;
>> }
>>
>> /**
>> * Returns current value of this <code>Counter</code> object.
>> */
>> int getCount() {
>>
>> // Returning an integer is atomic.
>> return _count;
>> }
>> }
>>
>
>


Re: Possible multi-threading bug

Posted by Peter Firmstone <ji...@zeus.net.au>.
Well, at the absolute minimum the variable should be volatile, so any 
changes are visible among all threads.

Since increment is the only mutating method, this must be synchronized.

This is a cheap form of multi-read, single-write threading, although one 
must be careful, as this only works on primitives and immutable object 
references, since only the reference itself is being updated.

If it was a reference to a mutable object (or long), then all methods 
would need to be synchronized.
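
A sketch of that minimal approach applied to the Counter in question (volatile 
field, synchronized increment); whether reset also needs the lock is discussed 
elsewhere in the thread:

    private class Counter {

        // volatile makes each write to _count visible to other threads.
        private volatile int _count = 0;

        // A single write; visibility is all that reset relies on here.
        void reset() {
            _count = 0;
        }

        // ++ is a read-modify-write, so it still needs the lock.
        synchronized void increment() {
            ++_count;
        }

        // A single volatile read.
        int getCount() {
            return _count;
        }
    }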

Cheers,

Peter.

Patricia Shanahan wrote:
> I've found something I think is a problem in 
> com.sun.jini.test.impl.outrigger.matching.StressTest, but it does not 
> seem to be the problem, or at least not the only problem, causing the 
> test hang I'm investigating. It's difficult to test, so I'd like a 
> review of my reasoning. This is a question for those who are familiar 
> with the Java memory model.
>
> Incidentally, if we went to 1.5 as the oldest supported release, this 
> could be replaced by an AtomicInteger.
>
> In the following inner class, I think the methods reset and getCount 
> should be synchronized. As the comments note, the operations they 
> perform are atomic. My concern is the lack of a happens-before 
> relationship between those two methods and the increment method. As 
> far as I can tell, there is nothing forcing the change in the counter 
> due to an increment to become visible to a getCount call in another 
> thread.
>
>     private class Counter {
>
>         /**
>          * Internal counter variable.
>          */
>         private int _count = 0;
>
>         /**
>          * Constructor. Declared to enforce protection level.
>          */
>         Counter() {
>
>             // Do nothing.
>         }
>
>         /**
>          * Resets internal counter to zero.
>          */
>         void reset() {
>
>             // Integer assignment is atomic.
>             _count = 0;
>         }
>
>         /**
>          * Increments internal counter by one.
>          */
>         synchronized void increment() {
>             ++_count;
>         }
>
>         /**
>          * Returns current value of this <code>Counter</code> object.
>          */
>         int getCount() {
>
>             // Returning an integer is atomic.
>             return _count;
>         }
>     }
>