Posted to dev@river.apache.org by Peter Firmstone <ji...@zeus.net.au> on 2014/01/04 10:59:59 UTC

Development Process - Experimental Testing, Theoretical analysis; code auditing and Static Analysis

Please provide your thoughts on the following:

How do we develop code for River?

Do we only use experiment-based development, or do we also allow 
theoretical development?
Do we only fix bugs that can be demonstrated with a test case, or do we 
fix bugs identified by FindBugs and manual code auditing as well?

Should we allow theoretical development based on standards like the Java 
Memory Model with visual auditing and static analysis with FindBugs, or 
should we prohibit fixing bugs that don't include a test case 
demonstrating the failure?

Regards,

Peter.


On 4/01/2014 6:58 PM, Patricia Shanahan wrote:
> Just before Christmas, you were discussing whether to fix concurrency 
> problems based on theoretical analysis, or to only fix those problems 
> for which there is experimental evidence.
>
> I believe the PMC will be at cross-purposes until you resolve that 
> issue, and strongly advise discussing and voting on it.
>
> This is an example of a question whose answer would be obvious and 
> non-controversial if you had agreement, either way, on that general 
> issue. "When do you claim that this happens?  And what currently 
> happens now that is unacceptable?  What is the concrete, observable 
> problem that you’re trying to solve, that justifies introducing 
> failures that require further work?" is a valid, and important, set of 
> questions if you are only going to fix concurrency bugs for which 
> there is experimental evidence. It is irrelevant if you are going to 
> fix concurrency bugs based on theoretical analysis.
>
> Patricia
>
> On 1/3/2014 10:14 PM, Greg Trasuk wrote:
>>
>> On Jan 4, 2014, at 12:52 AM, Peter Firmstone <ji...@zeus.net.au> wrote:
>>
>>> On 4/01/2014 3:18 PM, Greg Trasuk wrote:
>>>> I’ll also point out Patricia’s recent statement that TaskManager 
>>>> should be reasonably efficient for small task queues, but less 
>>>> efficient for larger task queues.  We don’t have solid evidence 
>>>> that the task queues ever get large.  Hence, the assertion that 
>>>> “TaskManager doesn’t scale” is meaningless.
>>>
>>> No, it's not about scalability, it's about the window of time when a 
>>> task is removed from the queue in TaskManager for execution but 
>>> fails and needs to be retried later.  Task.runAfter doesn't contain 
>>> the task that "should have executed", so dependent tasks proceed 
>>> before their dependencies.
>>>
>>> This code comment from ServiceDiscoveryManager might help:
>>>
>>>        /** This task class, when executed, first registers to receive
>>>          *  ServiceEvents from the given ServiceRegistrar. If the registration
>>>          *  process succeeds (no RemoteExceptions), it then executes the
>>>          *  LookupTask to query the given ServiceRegistrar for a "snapshot"
>>>          *  of its current state with respect to services that match the
>>>          *  given template.
>>>          *
>>>          *  Note that the order of execution of the two tasks is important.
>>>          *  That is, the LookupTask must be executed only after registration
>>>          *  for events has completed. This is because when an entity registers
>>>          *  with the event mechanism of a ServiceRegistrar, the entity will
>>>          *  only receive notification of events that occur "in the future",
>>>          *  after the registration is made. The entity will not receive events
>>>          *  about changes to the state of the ServiceRegistrar that may have
>>>          *  occurred before or during the registration process.
>>>          *
>>>          *  Thus, if the order of these tasks were reversed and the LookupTask
>>>          *  were to be executed prior to the RegisterListenerTask, then the
>>>          *  possibility exists for the occurrence of a change in the
>>>          *  ServiceRegistrar's state between the time the LookupTask retrieves
>>>          *  a snapshot of that state, and the time the event registration
>>>          *  process has completed, resulting in an incorrect view of the
>>>          *  current state of the ServiceRegistrar.
>>>
>>
>> When do you claim that this happens?  And what currently happens now 
>> that is unacceptable?  What is the concrete, observable problem that 
>> you’re trying to solve, that justifies introducing failures that 
>> require further work?
>
>


Re: Development Process - Experimental Testing, Theoretical analysis; code auditing and Static Analysis

Posted by Peter <ji...@zeus.net.au>.
To clarify the number of test failures by platform:

JDK6 Linux x64 - no test failures.
JDK7 Linux ARM - no test failures.
JDK7 Linux x64 - 1 occasional test failure.
JDK7 Windows Server 2008 x64 - 18 test failures, all but 1 related to SDM.
JDK7 Windows 7 x64 - no test failures.
JDK7 Solaris x64 - 2 test failures.

Concurrency bugs are harder to flush out than Greg's comments suggest. I haven't experienced them recently on my hardware, nor have I been able to reproduce them despite my best attempts; does that mean they don't exist?

I really do wish it were as simple as: I fix this bug, now I have 35 test failures to debug.

Regards,

Peter.

----- Original message -----
> ----- Original message -----
> > 
> > @Patricia - I understand your interpretation, and I think I’ve misled
> > you by including the words “concrete, observable” in my question.    I’m
> > not talking about concurrency bugs here.    I’m perfectly OK with making
> > changes based on analysis that shows a possible race condition.
> > However, I spent most of yesterday reading the Java Language
> > Specification and I’m now more convinced than ever that statements like
> > “we’re using final variables, therefore all our code has to change”
> > (paraphrasing) are no substitute for reasoned analysis.  
> 
> You don't agree that final field usage should comply with the JMM?
> 
> Please also read the Java Memory Model.
> 
>   Basically, I’m
> > asserting the very common professional view that changes to existing
> > code should be traceable to a requirement or a problem.    Further,
> > those changes should be carefully considered and designed to minimize
> > the chance of breaking anything else.    I don’t find the argument that
> > “I fixed a concurrency problem, so it’s expected to have 35 other
> > failures” convincing.    From the outside, that looks a lot like
> > introducing failures in an attempt to fix a poorly understood or
> > poorly stated problem.
> 
> You're misinterpreting my development effort. I haven't yet fixed
> ServiceDiscoveryManager, but I have fixed other classes; until only a
> few commits ago ServiceDiscoveryManager was not failing (at least not
> often).   Although ServiceDiscoveryManager hasn't changed since then, it
> is now causing multiple test failures.
> 
> So while the other classes that use ExecutorService aren't failing,
> ServiceDiscoveryManager is failing precisely because we're using
> TaskManager's fundamentally broken runAfter method.
> 
> We have previously focused on the problem that runAfter was designed to
> address and found this was not a good solution to the original problem.
> 
> 
> > 
> > I teach programming.    I see this all the time.    When people make
> > changes based on what they “think” “might” be happening, it’s always a
> > disaster.
> 
> You're assuming that FindBugs findings, JMM non-compliance and common issues
> described by Concurrency in Practice, even when clearly and easily
> identified, fall into the same category as the mistakes of someone who's
> learning to code for the first time?
> 
> You're advocating a completely experiment-driven approach, like the one
> someone who's learning to program experiences.   It is a very good approach
> for learning...
> 
> Do you not teach your students how to safely use final fields?
> 
> > 
> > @Peter - You’re asking the wrong question.    The right question is
> > whether we should have a review-then-commit policy for existing code
> > that traces a well-defined (and ideally limited) package of changes to
> > a JIRA issue, 
> 
> This would prevent a move from a purely experimental approach to one that
> also includes theoretical development.   Test cases are still required to pass
> with a theoretical approach.
> 
> and includes the community in decisions around how to fix
> > the problem that is stated. 
> 
> And when a simple discussion about a simple issue veers off topic with
> heated arguments, what then?   You propose I stop committing to skunk,
> until each change is discussed at length on the list?   But we can't even
> agree on final field usage?
> 
> We need to make a decision whether to include theory-driven
> development, including compliance with standards like the JMM.
> 
> Code has been committed to skunk for people to review in public. They
> might misinterpret my suggestions on the mail list, but they will have a
> better understanding after reading the code; I welcome their
> participation.
> 
> I am open to other solutions, but I don't like the suggestion that
> non-compliant code is preferred until a test case that proves failure can be
> found.   What happens when the test case only fails on one platform, but
> is not easily repeatable?   What if the fix just changes timing to mask
> the original problem?
> 
> Testing doesn't prove the absence of errors.   Reducing errors further
> requires additional tools like FindBugs and public, open peer review.
> 
> Trunk development was halted because of unrelated test failures.
> 
> If you try to fix those test failures it will cause a cascading effect
> of additional new test failures.   I am still fixing those bugs in
> qa-refactor.
> 
> If the outcome of this discussion is to exclude theory-driven
> development (FindBugs, JMM compliance), then the only option will be to
> revert trunk and exclude code that causes unrelated test failures.
> 
> The Java platform is also undergoing development; this too will cause
> timing changes resulting in test failures.   What do we do then when we
> can't fix them because each fix causes other unrelated test failures?
> 
> On that note, Java 7 support is not well tested: 2.2 has only been
> tested on Linux x64, no results for the jtreg tests have been posted, and
> I have previously seen discovery-event-related test failures in the 2.2
> branch.
> 
> The qa-refactor branch is relatively stable on Linux x64.
> 
> The test failures we're discussing occur on Windows x64; perhaps they
> also occur with 2.2 on Windows. Should we test 2.2 on Windows too?
> 
> If 2.2 fails on Windows, that gives us a provable test case, doesn't it?   
> The test failures would then be proven to be only indirectly related to
> changes in qa-refactor.
> 
> Regards,
> 
> Peter.
> 
>     And I will suggest pre-emptively that a
> > JIRA issue that says “concurrency problems exist” is not specific
> > enough to drive development.
> > 
> > Greg.
> > 
> > On Jan 4, 2014, at 4:59 AM, Peter Firmstone <ji...@zeus.net.au> wrote:
> > 
> > > Please provide your thoughts on the following:
> > > 
> > > How do we develop code for River?
> > > 
> > > Do we only use experiment-based development, or do we also allow
> > > theoretical development? Do we only fix bugs that can be
> > > demonstrated with a test case, or do we fix bugs identified by
> > > FindBugs and manual code auditing as well?
> > > 
> > > Should we allow theoretical development based on standards like the
> > > Java Memory Model with visual auditing and static analysis with
> > > FindBugs, or should we prohibit fixing bugs that don't include a test
> > > case demonstrating the failure?
> > > 
> > > Regards,
> > > 
> > > Peter.
> > > 
> > > 
> > > On 4/01/2014 6:58 PM, Patricia Shanahan wrote:
> > > > Just before Christmas, you were discussing whether to fix
> > > > concurrency problems based on theoretical analysis, or to only fix
> > > > those problems for which there is experimental evidence.
> > > > 
> > > > I believe the PMC will be at cross-purposes until you resolve that
> > > > issue, and strongly advise discussing and voting on it.
> > > > 
> > > > This is an example of a question whose answer would be obvious and
> > > > non-controversial if you had agreement, either way, on that general
> > > > issue. "When do you claim that this happens?    And what currently
> > > > happens now that is unacceptable?    What is the concrete,
> > > > observable problem that you’re trying to solve, that justifies
> > > > introducing failures that require further work?" is a valid, and
> > > > important, set of questions if you are only going to fix
> > > > concurrency bugs for which there is experimental evidence. It is
> > > > irrelevant if you are going to fix concurrency bugs based on
> > > > theoretical analysis.
> > > > 
> > > > Patricia
> > > > 
> > > > On 1/3/2014 10:14 PM, Greg Trasuk wrote:
> > > > > 
> > > > > On Jan 4, 2014, at 12:52 AM, Peter Firmstone <ji...@zeus.net.au>
> > > > > wrote:
> > > > > 
> > > > > > On 4/01/2014 3:18 PM, Greg Trasuk wrote:
> > > > > > > I’ll also point out Patricia’s recent statement that
> > > > > > > TaskManager should be reasonably efficient for small task
> > > > > > > queues, but less efficient for larger task queues.    We don’t
> > > > > > > have solid evidence that the task queues ever get large. 
> > > > > > > Hence, the assertion that “TaskManager doesn’t scale” is
> > > > > > > meaningless.
> > > > > > 
> > > > > > No, it's not about scalability, it's about the window of time
> > > > > > when a task is removed from the queue in TaskManager for
> > > > > > execution but fails and needs to be retried later. 
> > > > > > Task.runAfter doesn't contain the task that "should have
> > > > > > executed" so dependant tasks proceed before their depenencies.
> > > > > > 
> > > > > > This code comment from ServiceDiscoveryManager might help:
> > > > > > 
> > > > > > /** This task class, when executed, first registers to receive
> > > > > >  *  ServiceEvents from the given ServiceRegistrar. If the registration
> > > > > >  *  process succeeds (no RemoteExceptions), it then executes the
> > > > > >  *  LookupTask to query the given ServiceRegistrar for a "snapshot"
> > > > > >  *  of its current state with respect to services that match the
> > > > > >  *  given template.
> > > > > >  *
> > > > > >  *  Note that the order of execution of the two tasks is important.
> > > > > >  *  That is, the LookupTask must be executed only after registration
> > > > > >  *  for events has completed. This is because when an entity registers
> > > > > >  *  with the event mechanism of a ServiceRegistrar, the entity will
> > > > > >  *  only receive notification of events that occur "in the future",
> > > > > >  *  after the registration is made. The entity will not receive events
> > > > > >  *  about changes to the state of the ServiceRegistrar that may have
> > > > > >  *  occurred before or during the registration process.
> > > > > >  *
> > > > > >  *  Thus, if the order of these tasks were reversed and the LookupTask
> > > > > >  *  were to be executed prior to the RegisterListenerTask, then the
> > > > > >  *  possibility exists for the occurrence of a change in the
> > > > > >  *  ServiceRegistrar's state between the time the LookupTask retrieves
> > > > > >  *  a snapshot of that state, and the time the event registration
> > > > > >  *  process has completed, resulting in an incorrect view of the
> > > > > >  *  current state of the ServiceRegistrar.
> > > > > > 
> > > > > 
> > > > > When do you claim that this happens?    And what currently happens
> > > > > now that is unacceptable?    What is the concrete, observable
> > > > > problem that you’re trying to solve, that justifies introducing
> > > > > failures that require further work?
> > > > 
> > > > 
> > > 
> > 
> 


RE CONCERNS: [WAS] Re: Development Process - Experimental Theoretical

Posted by Peter Firmstone <ji...@zeus.net.au>.
Rather than distract from the questions posed in the previous thread, I've 
created a new one to discuss concerns.

>
>> As I’ve said before, my concern is not theoretical vs experimental.
>> It’s “what is this change, why was it made, what are the effects, and
>> are its effects justifiable?”

Example 1: Startable

What is this change?
     An interface that service implementations implement in order to export 
and start threads after construction.
Why was it made?
     So service implementations with final fields can freeze their final 
fields before the service's reference is shared.
What are the effects?
     Service implementations can delay exporting and starting threads 
until after their constructor completes.
     Ensures that other threads see the fully constructed state of final 
fields and their referents and not the default initialisation value.
Are its effects justifiable?
     Considering the minimal effort required to implement, it allows 
correct and safe use of final in services; this is a big win in my opinion.
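
For illustration, a minimal sketch of the idea (the interface and class 
names here are assumptions; the actual signatures in qa_refactor may differ):

// Hypothetical sketch, not the qa_refactor code: the service freezes its final
// fields in the constructor and only starts its worker thread in start(),
// after construction has completed, so no other thread can observe default
// field values.
public interface Startable {
    void start() throws Exception;
}

final class ExampleService implements Startable, Runnable {
    private final String name;     // frozen when the constructor returns
    private final Thread worker;

    ExampleService(String name) {
        this.name = name;
        this.worker = new Thread(this);
        // Deliberately no worker.start() and no export here: either would let
        // 'this' escape before the final field freeze is guaranteed visible.
    }

    public void start() {
        worker.start();            // safe: the constructor has completed
    }

    public void run() {
        System.out.println("running " + name);
    }
}

A container would construct the service and call start() only after the 
constructor returns, never publishing the reference beforehand.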


Example 2: Replace TaskManager with ExecutorService

What is this change?
     Replace TaskManager with ExecutorService
Why was it made?
     Task.runAfter(List tasks, int position) is fundamentally broken.
     ExecutorService implementations have far superior throughput.
     ExecutorService implementations are maintained by others, reducing 
our maintenance burden.
What are the effects?
     RandomStressTest only uses 188 threads instead of 1000 and APPEARS 
to increase throughput by 270 times (although other changes to the 
codebase may also be contributing).
     Customers can fine tune and change the ExecutorService 
implementations to best suit their environment if they so choose.
Are its effects justifiable?
     Ooh yeah.
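
As a rough illustration of the ordering point from the ServiceDiscoveryManager 
comment quoted earlier (the method names below are hypothetical, not the 
actual SDM tasks):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch only: with an ExecutorService the dependency is made explicit by
// running the dependent step inside the same submitted task, after its
// prerequisite has completed, instead of relying on Task.runAfter to infer
// ordering from a pending task list.
public class OrderedTasksSketch {
    public static void main(String[] args) {
        ExecutorService executor = Executors.newCachedThreadPool();
        executor.submit(new Runnable() {
            public void run() {
                registerForEvents();   // prerequisite: event registration first
                lookupSnapshot();      // runs only after registration completes
            }
        });
        executor.shutdown();
    }

    static void registerForEvents() { System.out.println("registered for events"); }
    static void lookupSnapshot()    { System.out.println("lookup snapshot taken"); }
}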

There are many more changes of course and that's your main concern.

>>> Code has been committed to skunk for people to review in public. They
>>> might misinterpret my suggestions on the mail list, but they will
>>> have a better understanding after reading the code; I welcome their
>>> participation.
>>>
>> There are 1557 files modified between 2.2.2 and qa_refactor.   62
>> deleted.   214 added.   How practical is it to review?

Hmm, four years' work; when was the last time you reviewed Linux before 
you used it?  Don't get me wrong, I want you to review the code: if we 
divide the review among 5 people, it's only around 300 files each.

The problem with incremental change and release is that we would have to 
release with known failures, rather than with the most stable build 
possible. This process is not something I want; it is something I'm 
forced to do until the build stabilises.  Releasing with known failures 
risks exposing more bugs than we can handle in deployment.


>>> I am open to other solutions, but I don't like the suggestion that
>>> non-compliant code is preferred until a test case that proves failure can
>>> be found.   What happens when the test case only fails on one platform,
>>> but is not easily repeatable?   What if the fix just changes timing to
>>> mask the original problem?
>>>
>>> Testing doesn't prove the absence of errors.   Reducing errors further
>>> requires additional tools like FindBugs and public, open peer review.
>>>
>>> Trunk development was halted because of unrelated test failures.
>> It was halted because the community got nervous and started asking for
>> review-then-commit, so you moved over to qa_refactor.

No, it was halted because River is riddled with concurrency bugs (which 
some won't acknowledge) and the build was starting to experience failures 
because of those bugs.  Prior to that event, the majority of changes 
were made in skunk and merged with trunk after successful testing in 
skunk.  Until the most minor of changes started causing test failures, the 
community was working together, developing.  Once these bugs are fixed 
we won't face the development destabilisation challenges they present.

Those bugs are obvious if you diff qa refactor's files while browsing svn.

The 2.2 branch has numerous bugs: its test suite just experienced 47 
failures on Windows, I haven't seen any jtreg test results, and it hasn't 
been tested properly on Java 7 prior to release.  Don't get me wrong, 
your release efforts are definitely appreciated, but I think you need to 
cut me some slack and show me the same patience as I've shown you.

The qa_refactor build has had thousands of test runs and has been tested 
on as many different OS and CPU architectures as possible to deliberately 
expose bugs and fix them.  There isn't even a beta release candidate yet.

Please start new threads for your issues as I have done here.  Yes, your 
views and opinions are valid and must be given due consideration, but try 
to make any corporate procedure requests achievable for volunteers; we 
are not a corporation with deep pockets and endless resources.

>>     From 2.2.2 to
>> trunk - 1225 files modified, 32 deleted, 186 added.   Same problem, which
>> we discussed last year.
>>
>> Related email threads:
>>
>> http://mail-archives.apache.org/mod_mbox/river-dev/201211.mbox/%3C50B49395.30408%40qcg.nl%3E
>> http://mail-archives.apache.org/mod_mbox/river-dev/201211.mbox/%3CCAMPUXz9z%2Bk1XBgzfCONmRXh8y5yBtMjJNeUeTR03YEepe-60Zg%40mail.gmail.com%3E
>>
>> In the end, you’re right - qa_refactor is a skunk branch, so you can go
>> ahead and make all the changes you want. I just think that if your
>> intention is to get at least two other people to sign off on it as a
>> release, you’d do well to make it easy to review, and possibly even
>> include the community in the decision-making along the way.

The possibility of failure exists in every project; it's how we overcome 
challenges that makes the difference.  If customers see an improvement 
in stability and scalability, adoption will increase.  I won't be making 
a beta release until the build has stabilised, so there's plenty of time 
for people to review the code in the meantime.

Get off your bums, people: let's review the code, find mistakes I've made, 
and help me fix them.  You can't eat an elephant in one bite; pick off 
something you know and review it.

Regards,

Peter

>> Regards,
>>
>> Greg.
>>
>>
>>
>>
>


Re: Development Process - Experimental Testing, Theoretical analysis; code auditing and Static Analysis

Posted by Peter <ji...@zeus.net.au>.
You have no objections to theoretical analysis?

Now you've had a chance to review the JMM, has it changed your thoughts on Startable?

Regards,

Peter.

----- Original message -----
>
> On Jan 4, 2014, at 6:15 PM, Peter <ji...@zeus.net.au> wrote:
>
> > >
> > > “we’re using final variables, therefore all our code has to change”
> > > (paraphrasing) are no substitute for reasoned analysis.   
> >
> > You don't agree that final field usage should comply with the JMM?
> >
> > Please also read the Java Memory Model.
> >
>
> JMM is part of the JLS now. I read it.   I read with particular interest
> the parts about the many possibilities for when the values of final
> variables are frozen.
>
> > >
> > > I teach programming.     I see this all the time.     When people make
> > > changes based on what they “think” “might” be happening, it’s always
> > > a disaster.
> >
> > You're assuming that FindBugs findings, JMM non-compliance and common issues
> > described by Concurrency in Practice, even when clearly and easily
> > identified, fall into the same category as the mistakes of someone who's
> > learning to code for the first time?
> >
> > You're advocating a completely experiment-driven approach, like the one
> > someone who's learning to program experiences.   It is a very good
> > approach for learning…
> >
>
> Please don’t put words in my mouth.   I’m not talking about experimental
> vs theoretical.   I’m talking about diagnostic techniques.   You need to
> have a theory about what a problem is, gather information to support or
> disprove that theory (that could be either analysis or experiments), and
> only then make changes to the code.   Then gather information that proves
> your fix actually made a difference (again, could be analysis or
> experiments).
>
> >
> > This would prevent a move from a purely experimental approach to one that
> > also includes theoretical development.   Test cases are still required to
> > pass with a theoretical approach.
> >
>
> Not at all.   “Findbugs reports a possible data race in TaskManager” is a
> perfectly good JIRA issue that is specific enough to be actionable and
> should be limited enough in scope that peers can review the solution.
>
> > and includes the community in decisions around how to fix
> > > the problem that is stated.
> >
> > And when a simple discussion about a simple issue veers off topic with
> > heated arguments, what then?   You propose I stop committing to skunk,
> > until each change is discussed at length on the list?   But we can't
> > even agree on final field usage?
> >
> > We need to make a decision whether to include theory-driven
> > development, including compliance with standards like the JMM.
> >
>
> As I’ve said before, my concern is not theoretical vs experimental.
> It’s “what is this change, why was it made, what are the effects, and
> are its effects justifiable?”
>
> > Code has been committed to skunk for people to review in public.
> > They might misinterpret my suggestions on the mail list, but they will
> > have a better understanding after reading the code; I welcome their
> > participation.
> >
>
> There are 1557 files modified between 2.2.2 and qa_refactor.   62
> deleted.   214 added.   How practical is it to review?
>
> > I am open to other solutions, but I don't like the suggestion that
> > non-compliant code is preferred until a test case that proves failure can
> > be found.   What happens when the test case only fails on one platform,
> > but is not easily repeatable?   What if the fix just changes timing to
> > mask the original problem?
> >
> > Testing doesn't prove the absence of errors.   Reducing errors further
> > requires additional tools like FindBugs and public, open peer review.
> >
> > Trunk development was halted because of unrelated test failures.
>
> It was halted because the community got nervous and started asking for
> review-then-commit, so you moved over to qa_refactor.   From 2.2.2 to
> trunk - 1225 files modified, 32 deleted, 186 added.   Same problem, which
> we discussed last year.
>
> Related email threads:
>
> http://mail-archives.apache.org/mod_mbox/river-dev/201211.mbox/%3C50B49395.30408%40qcg.nl%3E
> http://mail-archives.apache.org/mod_mbox/river-dev/201211.mbox/%3CCAMPUXz9z%2Bk1XBgzfCONmRXh8y5yBtMjJNeUeTR03YEepe-60Zg%40mail.gmail.com%3E
>
> In the end, you’re right - qa_refactor is a skunk branch, so you can go
> ahead and make all the changes you want. I just think that if your
> intention is to get at least two other people to sign off on it as a
> release, you’d do well to make it easy to review, and possibly even
> include the community in the decision-making along the way.
>
> Regards,
>
> Greg.
>
>
>
>


Re: Development Process - Experimental Testing, Theoretical analysis; code auditing and Static Analysis

Posted by Greg Trasuk <tr...@stratuscom.com>.
On Jan 4, 2014, at 6:15 PM, Peter <ji...@zeus.net.au> wrote:

>> 
>> “we’re using final variables, therefore all our code has to change”
>> (paraphrasing) are no substitute for reasoned analysis.  
> 
> You don't agree that final field usage should comply with the JMM?
> 
> Please also read the Java Memory Model.
> 

JMM is part of the JLS now. I read it.  I read with particular interest the parts about the many possibilities for when the values of final variables are frozen.
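
To make the point concrete, the classic illustration (hypothetical class, 
not River code) is that the freeze only protects readers if the reference 
is not published before the constructor completes:

// 'this' escapes before the constructor returns, so the final field guarantee
// does not apply and a reader may legally observe the default value 0 for
// 'answer' instead of 42.
public class FinalFreezeDemo {
    static FinalFreezeDemo unsafelyPublished;   // racy, non-volatile publication point
    final int answer;

    FinalFreezeDemo() {
        unsafelyPublished = this;   // escapes before the final field is frozen
        this.answer = 42;           // freeze takes effect when the constructor returns
    }

    public static void main(String[] args) {
        new Thread(new Runnable() {
            public void run() {
                FinalFreezeDemo d = unsafelyPublished;
                if (d != null) {
                    System.out.println(d.answer);   // may print 0 on some JVMs
                }
            }
        }).start();
        new FinalFreezeDemo();
    }
}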

>> 
>> I teach programming.   I see this all the time.   When people make changes
>> based on what they “think” “might” be happening, it’s always a disaster.
> 
> You're assuming that FindBugs findings,  JMM non-compliance and common issues described by Concurrency in Practice, even when clearly and easily identified, fall into the same category as the mistakes of someone who's learning to code for the first time?
> 
> You're advocating a completely experiment-driven approach, like the one someone who's learning to program experiences.  It is a very good approach for learning…
> 

Please don’t put words in my mouth.  I’m not talking about experimental vs theoretical.  I’m talking about diagnostic techniques.  You need to have a theory about what a problem is, gather information to support or disprove that theory (that could be either analysis or experiments), and only then make changes to the code.  Then gather information that proves your fix actually made a difference (again, could be analysis or experiments).

> 
> This would prevent a move from a purely experimental approach to one that also includes theoretical development.  Test cases are still required to pass with a theoretical approach.
> 

Not at all.  “Findbugs reports a possible data race in TaskManager” is a perfectly good JIRA issue that is specific enough to be actionable and should be limited enough in scope that peers can review the solution.
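
For instance, the kind of race such a report typically flags (hypothetical 
class, not an actual FindBugs finding against River):

// 'count' is written under a lock but read without one, the sort of
// inconsistent synchronization a static analyser may flag as a data race.
public class Counter {
    private int count;                      // shared mutable state

    public synchronized void increment() {  // the write is synchronized...
        count++;
    }

    public int current() {                  // ...but this read is not, so a
        return count;                       // caller may see a stale value
    }
}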

> and includes the community in decisions around how to fix
>> the problem that is stated. 
> 
> And when a simple discussion about a simple issue veers off topic with heated arguments, what then?  You propose I stop committing to skunk, until each change is discussed at length on the list?  But we can't even agree on final field usage?
> 
> We need to make a decision whether to include theory-driven development, including compliance with standards like the JMM.
> 

As I’ve said before, my concern is not theoretical vs experimental.  It’s “what is this change, why was it made, what are the effects, and are its effects justifiable?”

> Code has been committed to skunk for people to review in public. They might misinterpret my suggestions on the mail list, but they will have a better understanding after reading the code; I welcome their participation.
> 

There are 1557 files modified between 2.2.2 and qa_refactor.  62 deleted.  214 added.  How practical is it to review?

> I am open to other solutions, but I don't like the suggestion that non-compliant code is preferred until a test case that proves failure can be found.  What happens when the test case only fails on one platform, but is not easily repeatable?  What if the fix just changes timing to mask the original problem?
> 
> Testing doesn't prove the absence of errors.  Reducing errors further requires additional tools like FindBugs and public, open peer review.
> 
> Trunk development was halted because of unrelated test failures.

It was halted because the community got nervous and started asking for review-then-commit, so you moved over to qa_refactor.  From 2.2.2 to trunk - 1225 files modified, 32 deleted, 186 added.  Same problem, which we discussed last year.

Related email threads: 

http://mail-archives.apache.org/mod_mbox/river-dev/201211.mbox/%3C50B49395.30408%40qcg.nl%3E
http://mail-archives.apache.org/mod_mbox/river-dev/201211.mbox/%3CCAMPUXz9z%2Bk1XBgzfCONmRXh8y5yBtMjJNeUeTR03YEepe-60Zg%40mail.gmail.com%3E

In the end, you’re right - qa_refactor is a skunk branch, so you can go ahead and make all the changes you want. I just think that if your intention is to get at least two other people to sign off on it as a release, you’d do well to make it easy to review, and possibly even include the community in the decision-making along the way.

Regards,

Greg.





Re: Development Process - Experimental Testing, Theoretical analysis; code auditing and Static Analysis

Posted by Peter <ji...@zeus.net.au>.
----- Original message -----
> 
> @Patricia - I understand your interpretation, and I think I’ve misled
> you by including the words “concrete, observable” in my question.   I’m
> not talking about concurrency bugs here.   I’m perfectly OK with making
> changes based on analysis that shows a possible race condition.
> However, I spent most of yesterday reading the Java Language
> Specification and I’m now more convinced than ever that statements like
> “we’re using final variables, therefore all our code has to change”
> (paraphrasing) are no substitute for reasoned analysis.  

You don't agree that final field usage should comply with the JMM?

Please also read the Java Memory Model.

 Basically, I’m
> asserting the very common professional view that changes to existing
> code should be traceable to a requirement or a problem.   Further, those
> changes should be carefully considered and designed to minimize the
> chance of breaking anything else.   I don’t find the argument that “I
> fixed a concurrency problem, so it’s expected to have 35 other failures”
> convincing.   From the outside, that looks a lot like introducing
> failures in an attempt to fix a poorly understood or poorly stated
> problem.

You're misinterpreting my development effort. I haven't yet fixed ServiceDiscoveryManager, but I have fixed other classes; until only a few commits ago ServiceDiscoveryManager was not failing (at least not often).  Although ServiceDiscoveryManager hasn't changed since then, it is now causing multiple test failures.

So while the other classes that use ExecutorService aren't failing, ServiceDiscoveryManager is failing precisely because we're using TaskManager's fundamentally broken runAfter method.

We have previously focused on the problem that runAfter was designed to address and found this was not a good solution to the original problem.
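
To sketch the window being described (simplified, not River's actual 
TaskManager code; the runAfter signature follows the earlier description in 
this thread):

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// A worker removes a task from the pending list before running it and only
// re-adds it if it fails.  While the failed task is off the list, a dependent
// task's runAfter check scans the pending list, does not find its dependency,
// and wrongly concludes it is free to run.
interface Task extends Runnable {
    /** May this task run, given the tasks still pending ahead of it? */
    boolean runAfter(List<Task> pending, int position);
}

class TaskManagerSketch {
    private final List<Task> pending = new CopyOnWriteArrayList<Task>();

    void workerLoop() {
        while (!pending.isEmpty()) {
            Task t = pending.remove(0);   // the task leaves the list here...
            try {
                t.run();
            } catch (RuntimeException e) {
                pending.add(t);           // ...and only reappears on failure,
            }                             // leaving a window in which dependants'
        }                                 // runAfter checks see it as already done.
    }
}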


> 
> I teach programming.   I see this all the time.   When people make changes
> based on what they “think” “might” be happening, it’s always a disaster.

You're assuming that FindBugs findings,  JMM non-compliance and common issues described by Concurrency in Practice, even when clearly and easily identified, fall into the same category as the mistakes of someone who's learning to code for the first time?

You're advocating a completely experiment-driven approach, like the one someone who's learning to program experiences.  It is a very good approach for learning...

Do you not teach your students how to safely use final fields?

> 
> @Peter - You’re asking the wrong question.   The right question is
> whether we should have a review-then-commit policy for existing code
> that traces a well-defined (and ideally limited) package of changes to a
> JIRA issue, 

This would prevent a move from a purely experimental approach to one that also includes theoretical development.  Test cases are still required to pass with a theoretical approach.

and includes the community in decisions around how to fix
> the problem that is stated. 

And when a simple discussion about a simple issue veers off topic with heated arguments, what then?  You propose I stop committing to skunk, until each change is discussed at length on the list?  But we can't even agree on final field usage?

We need to make a decision whether to include theory-driven development, including compliance with standards like the JMM.

Code has been committed to skunk for people to review in public. They might misinterpret my suggestions on the mail list, but they will have a better understanding after reading the code; I welcome their participation.

I am open to other solutions, but I don't like the suggestion that non-compliant code is preferred until a test case that proves failure can be found.  What happens when the test case only fails on one platform, but is not easily repeatable?  What if the fix just changes timing to mask the original problem?

Testing doesn't prove the absence of errors.  Reducing errors further requires additional tools like FindBugs and public, open peer review.

Trunk development was halted because of unrelated test failures.

If you try to fix those test failures it will cause a cascading effect of additional new test failures.  I am still fixing those bugs in qa-refactor.

If the outcome of this discussion is to exclude theory-driven development (FindBugs, JMM compliance), then the only option will be to revert trunk and exclude code that causes unrelated test failures.

The Java platform is also undergoing development; this too will cause timing changes resulting in test failures.  What do we do then when we can't fix them because each fix causes other unrelated test failures?

On that note, Java 7 support is not well tested: 2.2 has only been tested on Linux x64, no results for the jtreg tests have been posted, and I have previously seen discovery-event-related test failures in the 2.2 branch.

The qa-refactor branch is relatively stable on Linux x64.

The test failures we're discussing occur on Windows x64; perhaps they also occur with 2.2 on Windows. Should we test 2.2 on Windows too?

If 2.2 fails on Windows, that gives us a provable test case, doesn't it?   The test failures would then be proven to be only indirectly related to changes in qa-refactor.

Regards,

Peter.

  And I will suggest pre-emptively that a
> JIRA issue that says “concurrency problems exist” is not specific enough
> to drive development.
> 
> Greg.
> 
> On Jan 4, 2014, at 4:59 AM, Peter Firmstone <ji...@zeus.net.au> wrote:
> 
> > Please provide your thoughts on the following:
> > 
> > How do we develop code for River?
> > 
> > Do we only use experiment-based development, or do we also allow
> > theoretical development? Do we only fix bugs that can be
> > demonstrated with a test case, or do we fix bugs identified by
> > FindBugs and manual code auditing as well?
> > 
> > Should we allow theoretical development based on standards like the
> > Java Memory Model with visual auditing and static analysis with
> > FindBugs, or should we prohibit fixing bugs that don't include a test
> > case demonstrating the failure?
> > 
> > Regards,
> > 
> > Peter.
> > 
> > 
> > On 4/01/2014 6:58 PM, Patricia Shanahan wrote:
> > > Just before Christmas, you were discussing whether to fix
> > > concurrency problems based on theoretical analysis, or to only fix
> > > those problems for which there is experimental evidence.
> > > 
> > > I believe the PMC will be at cross-purposes until you resolve that
> > > issue, and strongly advise discussing and voting on it.
> > > 
> > > This is an example of a question whose answer would be obvious and
> > > non-controversial if you had agreement, either way, on that general
> > > issue. "When do you claim that this happens?   And what currently
> > > happens now that is unacceptable?   What is the concrete, observable
> > > problem that you’re trying to solve, that justifies introducing
> > > failures that require further work?" is a valid, and important, set
> > > of questions if you are only going to fix concurrency bugs for which
> > > there is experimental evidence. It is irrelevant if you are going to
> > > fix concurrency bugs based on theoretical analysis.
> > > 
> > > Patricia
> > > 
> > > On 1/3/2014 10:14 PM, Greg Trasuk wrote:
> > > > 
> > > > On Jan 4, 2014, at 12:52 AM, Peter Firmstone <ji...@zeus.net.au>
> > > > wrote:
> > > > 
> > > > > On 4/01/2014 3:18 PM, Greg Trasuk wrote:
> > > > > > I’ll also point out Patricia’s recent statement that
> > > > > > TaskManager should be reasonably efficient for small task
> > > > > > queues, but less efficient for larger task queues.   We don’t
> > > > > > have solid evidence that the task queues ever get large. 
> > > > > > Hence, the assertion that “TaskManager doesn’t scale” is
> > > > > > meaningless.
> > > > > 
> > > > > No, it's not about scalability, it's about the window of time
> > > > > when a task is removed from the queue in TaskManager for
> > > > > execution but fails and needs to be retried later. 
> > > > > Task.runAfter doesn't contain the task that "should have
> > > > > executed" so dependant tasks proceed before their depenencies.
> > > > > 
> > > > > This code comment from ServiceDiscoveryManager might help:
> > > > > 
> > > > > /** This task class, when executed, first registers to receive
> > > > >  *  ServiceEvents from the given ServiceRegistrar. If the registration
> > > > >  *  process succeeds (no RemoteExceptions), it then executes the
> > > > >  *  LookupTask to query the given ServiceRegistrar for a "snapshot"
> > > > >  *  of its current state with respect to services that match the
> > > > >  *  given template.
> > > > >  *
> > > > >  *  Note that the order of execution of the two tasks is important.
> > > > >  *  That is, the LookupTask must be executed only after registration
> > > > >  *  for events has completed. This is because when an entity registers
> > > > >  *  with the event mechanism of a ServiceRegistrar, the entity will
> > > > >  *  only receive notification of events that occur "in the future",
> > > > >  *  after the registration is made. The entity will not receive events
> > > > >  *  about changes to the state of the ServiceRegistrar that may have
> > > > >  *  occurred before or during the registration process.
> > > > >  *
> > > > >  *  Thus, if the order of these tasks were reversed and the LookupTask
> > > > >  *  were to be executed prior to the RegisterListenerTask, then the
> > > > >  *  possibility exists for the occurrence of a change in the
> > > > >  *  ServiceRegistrar's state between the time the LookupTask retrieves
> > > > >  *  a snapshot of that state, and the time the event registration
> > > > >  *  process has completed, resulting in an incorrect view of the
> > > > >  *  current state of the ServiceRegistrar.
> > > > > 
> > > > 
> > > > When do you claim that this happens?   And what currently happens
> > > > now that is unacceptable?   What is the concrete, observable
> > > > problem that you’re trying to solve, that justifies introducing
> > > > failures that require further work?
> > > 
> > > 
> > 
> 


Re: Development Process - Experimental Testing, Theoretical analysis; code auditing and Static Analysis

Posted by Greg Trasuk <tr...@stratuscom.com>.
@Patricia - I understand your interpretation, and I think I’ve misled you by including the words “concrete, observable” in my question.  I’m not talking about concurrency bugs here.  I’m perfectly OK with making changes based on analysis that shows a possible race condition.  However, I spent most of yesterday reading the Java Language Specification and I’m now more convinced than ever that statements like “we’re using final variables, therefore all our code has to change” (paraphrasing) are no substitute for reasoned analysis.  Basically, I’m asserting the very common professional view that changes to existing code should be traceable to a requirement or a problem.  Further, those changes should be carefully considered and designed to minimize the chance of breaking anything else.  I don’t find the argument that “I fixed a concurrency problem, so it’s expected to have 35 other failures” convincing.  From the outside, that looks a lot like introducing failures in an attempt to fix a poorly understood or poorly stated problem.

I teach programming.  I see this all the time.  When people make changes based on what they “think” “might” be happening, it’s always a disaster.

@Peter - You’re asking the wrong question.  The right question is whether we should have a review-then-commit policy for existing code that traces a well-defined (and ideally limited) package of changes to a JIRA issue, and includes the community in decisions around how to fix the problem that is stated.  And I will suggest pre-emptively that a JIRA issue that says “concurrency problems exist” is not specific enough to drive development.

Greg.

On Jan 4, 2014, at 4:59 AM, Peter Firmstone <ji...@zeus.net.au> wrote:

> Please provide your thoughts on the following:
> 
> How do we develop code for River?
> 
> Do we only use experiment-based development, or do we also allow theoretical development?
> Do we only fix bugs that can be demonstrated with a test case, or do we fix bugs identified by FindBugs and manual code auditing as well?
> 
> Should we allow theoretical development based on standards like the Java Memory Model with visual auditing and static analysis with FindBugs, or should we prohibit fixing bugs that don't include a test case demonstrating the failure?
> 
> Regards,
> 
> Peter.
> 
> 
> On 4/01/2014 6:58 PM, Patricia Shanahan wrote:
>> Just before Christmas, you were discussing whether to fix concurrency problems based on theoretical analysis, or to only fix those problems for which there is experimental evidence.
>> 
>> I believe the PMC will be at cross-purposes until you resolve that issue, and strongly advise discussing and voting on it.
>> 
>> This is an example of a question whose answer would be obvious and non-controversial if you had agreement, either way, on that general issue. "When do you claim that this happens?  And what currently happens now that is unacceptable?  What is the concrete, observable problem that you’re trying to solve, that justifies introducing failures that require further work?" is a valid, and important, set of questions if you are only going to fix concurrency bugs for which there is experimental evidence. It is irrelevant if you are going to fix concurrency bugs based on theoretical analysis.
>> 
>> Patricia
>> 
>> On 1/3/2014 10:14 PM, Greg Trasuk wrote:
>>> 
>>> On Jan 4, 2014, at 12:52 AM, Peter Firmstone <ji...@zeus.net.au> wrote:
>>> 
>>>> On 4/01/2014 3:18 PM, Greg Trasuk wrote:
>>>>> I’ll also point out Patricia’s recent statement that TaskManager should be reasonably efficient for small task queues, but less efficient for larger task queues.  We don’t have solid evidence that the task queues ever get large.  Hence, the assertion that “TaskManager doesn’t scale” is meaningless.
>>>> 
>>>> No, it's not about scalability, it's about the window of time when a task is removed from the queue in TaskManager for execution but fails and needs to be retried later.  Task.runAfter doesn't contain the task that "should have executed", so dependent tasks proceed before their dependencies.
>>>> 
>>>> This code comment from ServiceDiscoveryManager might help:
>>>> 
>>>>       /** This task class, when executed, first registers to receive
>>>>         *  ServiceEvents from the given ServiceRegistrar. If the registration
>>>>         *  process succeeds (no RemoteExceptions), it then executes the
>>>>         *  LookupTask to query the given ServiceRegistrar for a "snapshot"
>>>>         *  of its current state with respect to services that match the
>>>>         *  given template.
>>>>         *
>>>>         *  Note that the order of execution of the two tasks is important.
>>>>         *  That is, the LookupTask must be executed only after registration
>>>>         *  for events has completed. This is because when an entity registers
>>>>         *  with the event mechanism of a ServiceRegistrar, the entity will
>>>>         *  only receive notification of events that occur "in the future",
>>>>         *  after the registration is made. The entity will not receive events
>>>>         *  about changes to the state of the ServiceRegistrar that may have
>>>>         *  occurred before or during the registration process.
>>>>         *
>>>>         *  Thus, if the order of these tasks were reversed and the LookupTask
>>>>         *  were to be executed prior to the RegisterListenerTask, then the
>>>>         *  possibility exists for the occurrence of a change in the
>>>>         *  ServiceRegistrar's state between the time the LookupTask retrieves
>>>>         *  a snapshot of that state, and the time the event registration
>>>>         *  process has completed, resulting in an incorrect view of the
>>>>         *  current state of the ServiceRegistrar.
>>>> 
>>> 
>>> When do you claim that this happens?  And what currently happens now that is unacceptable?  What is the concrete, observable problem that you’re trying to solve, that justifies introducing failures that require further work?
>> 
>> 
>