Posted to dev@ofbiz.apache.org by Adam Heath <do...@brainfood.com> on 2012/05/02 23:32:12 UTC

Re: framework/base tests failing

[trying to stop this thread a bit]

If the root cause *is* a time-slice issue, the fix is to *not*
increase the timeout.  Dead stop.  Changing the timeout will just
cause it to again fail at some point in the future when computers get
faster yet again.

The correct fix is to make the timeout *not matter*.  However, writing
such test cases is *extremely* hard; that's why the current tests use
timeouts, because they are much simpler to write, and for mortals to
understand.

I can fix the tests, but it will take a *long* time; and sometimes it
requires adding correct non-polling/event-dispatch to tons of other
classes.
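
As a rough illustration of the non-polling/event-dispatch idea, here is a
minimal sketch in which the test blocks until the component reports that the
work is done, instead of sleeping for a guessed interval. ExpiryListener,
BackgroundExpirer and LatchWaitDemo are hypothetical names made up for this
sketch; none of them are OFBiz classes, and the 30-second bound is an
arbitrary safety net against hanging, not a tuned value.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical callback interface; not an OFBiz API. */
interface ExpiryListener {
    void onExpired(String key);
}

/** Hypothetical stand-in for a component that does its work on another thread. */
final class BackgroundExpirer {
    private final ExpiryListener listener;

    BackgroundExpirer(ExpiryListener listener) {
        this.listener = listener;
    }

    /** Expires the key on a worker thread and notifies the listener when done. */
    Thread expireAsync(String key) {
        Thread worker = new Thread(() -> listener.onExpired(key), "expirer");
        worker.start();
        return worker;
    }
}

final class LatchWaitDemo {
    /** Returns true if the expiry event was observed before the safety bound elapsed. */
    static boolean awaitExpiry(String key) {
        CountDownLatch expired = new CountDownLatch(1);
        BackgroundExpirer expirer = new BackgroundExpirer(k -> expired.countDown());
        expirer.expireAsync(key);
        try {
            // The bound only guards against hanging forever; await() returns
            // as soon as the event fires, however fast or slow the machine is.
            return expired.await(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("event observed: " + awaitExpiry("some-key"));
    }
}
```

The cost is visible even in this toy: the component has to expose a
notification hook, which is the kind of change that would have to be
repeated across many classes.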

On 04/30/2012 07:48 AM, Jacopo Cappellato wrote:
> Pierre,
> 
> please also consider that the dev list should be used by OFBiz committers to discuss development and project-related tasks; we are happy if non-committers follow the discussions and participate in votes (non-binding votes), but they should limit the number of posts to the dev list and, most of all, avoid arguing with committers (to avoid confusion and wasting the time of expert resources).
> 
> Kind regards,
> 
> Jacopo
> 
> On Apr 30, 2012, at 2:36 PM, Adrian Crum wrote:
> 
>> "whereby end-users can tweak this in their own environment (by e.g. a configuration setting)"
>>
>> There has been plenty of discussion on this already. Please read the previous replies, and the Jira issue mentioned in the replies.
>>
>> -Adrian
>>
>> On 4/30/2012 1:33 PM, Pierre Smits wrote:
>>> Is it so difficult to answer the questions?
>>>
>>> I did not state that it should be a configuration setting. I was just
>>> asking a few civilized questions in order to understand it more.
>>>
>>> Regards,
>>>
>>> Pierre
>>>
>>>
>>> 2012/4/30 Adrian Crum<ad...@sandglass-software.com>
>>>
>>>> This is NOT a configuration issue. Please stop trying to turn it into one.
>>>>
>>>> -Adrian
>>>>
>>>>
>>>> On 4/30/2012 1:23 PM, Pierre Smits wrote:
>>>>
>>>>> Adrian,
>>>>>
>>>>> I accept that there is a difference, but using vastly is an exaggeration.
>>>>>
>>>>> Are we going to provide a fix for this issue, whereby end-users can tweak
>>>>> this in their own environment (by e.g. a configuration setting), or are we
>>>>> just trying to find an optimal number so that these tests don't fail
>>>>> anymore?
>>>>>
>>>>> How dependent on the environment is OFBiz regarding these unit tests?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Pierre
>>>>>
>>>>> 2012/4/30 Adrian Crum<ad...@sandglass-software.com>
>>>>>
>>>>>> The two are vastly different. Configuring ports is something the end user
>>>>>> is responsible for. Cache unit tests that are failing need to be fixed.
>>>>>> Configuration != failed unit tests.
>>>>>>
>>>>>> -Adrian
>>>>>>
>>>>>>
>>>>>> On 4/30/2012 12:58 PM, Pierre Smits wrote:
>>>>>>
>>>>>>> This issue seems to be the same kind of problem as the change of test
>>>>>>> ports in trunk.
>>>>>>>
>>>>>>> Why are we so adamant that end users should and must apply patches in
>>>>>>> their own test environment regarding test ports, while we - on the other
>>>>>>> hand - are trying to fix something in trunk that is along the same lines?
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Pierre
>>>>>>>
>>>>>>> 2012/4/30 Adrian Crum<ad...@sandglass-software.com>
>>>>>>>
>>>>>>>> I will give it a try, but it will have to wait until tomorrow.
>>>>>>>>
>>>>>>>> -Adrian
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4/30/2012 12:42 PM, Jacopo Cappellato wrote:
>>>>>>>>
>>>>>>>>> If, as Adam mentioned, it is an issue caused by the time-slice in your
>>>>>>>>> box, then setting a greater timeout may fix the issue... if you are
>>>>>>>>> able to make it work with, let's say, 600 ms (or even 1 s), then I would
>>>>>>>>> like to commit the change to make the test a bit more robust (even if it
>>>>>>>>> will be slower).
>>>>>>>>>
>>>>>>>>> Jacopo
>>>>>>>>>
>>>>>>>>> On Apr 30, 2012, at 12:17 PM, Adrian Crum wrote:
>>>>>>>>>
>>>>>>>>>> On 4/30/2012 10:27 AM, Jacopo Cappellato wrote:
>>>>>>>>>>
>>>>>>>>>>> On Apr 23, 2012, at 3:47 PM, Adrian Crum wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I tried experimenting with the sleep timing and I also replaced the
>>>>>>>>>>>> Thread.sleep call with a safer version, but the tests still failed.
>>>>>>>>>>>
>>>>>>>>>>> interesting... but if you change the Thread.sleep timeout from 200 to
>>>>>>>>>>> 2000 it works, right?
>>>>>>>>>>
>>>>>>>>>> I changed it to 300. By the way, the test finally passed for the first
>>>>>>>>>> time when I had another non-OFBiz process running at the same time that
>>>>>>>>>> was making heavy use of the hard disk.
>>>>>>>>>>
>>>>>>>>>> -Adrian
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>

Re: framework/base tests failing

Posted by Adam Heath <do...@brainfood.com>.

On 05/03/2012 04:44 AM, Adrian Crum wrote:
> I agree that it is a workaround and not a solution. The problem is,
> the tests fail 100% of the time on my development machine, and they
> fail intermittently on the ASF CI machine and our (1Tech) CI machine.
> So, the workaround is needed to get various CI systems working again.
> 
> Increasing the timeout fixed two of the three failing tests.

All the times in that file are finely tuned, though: not so large that
the test takes a long time, but still logical values, based on the
TTLs in the code being tested.
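
One way to keep TTL-derived values in a test without depending on wall-clock
time is to let the test supply the clock. The sketch below is only an
illustration under that assumption: TtlCache and FakeClockDemo are
hypothetical names, the 200 ms TTL is an example value, and the class is
deliberately single-threaded and far simpler than the cache under test here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical minimal TTL cache with an injectable clock; not OFBiz's cache. */
final class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clockMillis;

    TtlCache(long ttlMillis, LongSupplier clockMillis) {
        this.ttlMillis = ttlMillis;
        this.clockMillis = clockMillis;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clockMillis.getAsLong() + ttlMillis));
    }

    /** Returns the value, or null once the TTL has elapsed on the injected clock. */
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clockMillis.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key);
            return null;
        }
        return entry.value;
    }
}

final class FakeClockDemo {
    public static void main(String[] args) {
        long[] now = {0L}; // the test, not the OS scheduler, decides what time it is
        TtlCache<String, String> cache = new TtlCache<>(200, () -> now[0]);
        cache.put("a", "1");
        now[0] = 199;
        System.out.println(cache.get("a")); // still within the TTL
        now[0] = 200;
        System.out.println(cache.get("a")); // TTL elapsed, entry is gone
    }
}
```

Production code would pass System::currentTimeMillis as the clock. This
covers the expiry logic only; it does not exercise any background thread
that does the expiring.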

I'll queue this up to look at, after the kek stuff is done (so very
close on this), and before backporting the view-condition new-feature.

Re: framework/base tests failing

Posted by Adrian Crum <ad...@sandglass-software.com>.

I agree that it is a workaround and not a solution. The problem is, the 
tests fail 100% of the time on my development machine, and they fail 
intermittently on the ASF CI machine and our (1Tech) CI machine. So, the 
workaround is needed to get various CI systems working again.

Increasing the timeout fixed two of the three failing tests.

-Adrian



Re: framework/base tests failing

Posted by Adam Heath <do...@brainfood.com>.

On 05/02/2012 05:26 PM, Jacopo Cappellato wrote:
> 
> On May 2, 2012, at 11:32 PM, Adam Heath wrote:
> 
>> If the root cause *is* a time-slice issue, the fix is to *not*
>> increase the timeout.  Dead stop.  Changing the timeout will just
>> cause it to again fail at some point in the future when computers get
>> faster yet again.
> 
> I would instead increase the timeout and then revisit the issue when we run OFBiz on the computer of the future... but if you are willing to fix this without this tweak then you are most welcome.

Increasing timeouts to fix a race condition just shows there might
actually be a very rare race condition that can happen in production.

Who here likes fixing race conditions that only happen when OOM is
occurring?  It's not that there is a mem-leak, or that the problem is
OOM, but that the race occurs only when an object is garbage-collected
just before OOM is thrown, and other nearby code doesn't like the
object going away.
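
One shape such a race can take, assuming the object is held through a
java.lang.ref.SoftReference (an assumption for illustration; the message does
not say which mechanism is involved): the JVM guarantees soft references are
cleared before it throws OutOfMemoryError, so code that calls get() twice can
see a value on the first call and null on the second. The sketch below reads
the referent once into a strong local. SoftValueHolder and
simulateCollection() are hypothetical names; the latter exists only so the
clearing can be triggered deterministically.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

/** Hypothetical holder for a softly referenced, reloadable value. */
final class SoftValueHolder<V> {
    private final Supplier<V> loader;
    private volatile SoftReference<V> ref;

    SoftValueHolder(Supplier<V> loader) {
        this.loader = loader;
        this.ref = new SoftReference<>(loader.get());
    }

    V get() {
        // Race-prone alternative: if (ref.get() != null) return ref.get();
        // the collector may clear the referent between the two calls.
        V value = ref.get(); // single read; the local keeps the object strongly reachable
        if (value == null) {
            value = loader.get();
            ref = new SoftReference<>(value);
        }
        return value;
    }

    /** Test hook: mimics the collector clearing the referent. */
    void simulateCollection() {
        ref.clear();
    }
}

final class SoftValueDemo {
    public static void main(String[] args) {
        int[] loads = {0};
        SoftValueHolder<String> holder =
                new SoftValueHolder<>(() -> "value#" + (++loads[0]));
        System.out.println(holder.get());
        holder.simulateCollection();
        System.out.println(holder.get()); // reloaded instead of failing on null
    }
}
```

A timing-based test is unlikely to hit this window, which is the point made
above: a longer sleep leaves the race in place.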

Re: framework/base tests failing

Posted by Jacopo Cappellato <ja...@hotwaxmedia.com>.

On May 2, 2012, at 11:32 PM, Adam Heath wrote:

> If the root cause *is* a time-slice issue, the fix is to *not*
> increase the timeout.  Dead stop.  Changing the timeout will just
> cause it to again fail at some point in the future when computers get
> faster yet again.

I would instead increase the timeout and then revisit the issue when we run OFBiz on the computer of the future... but if you are willing to fix this without this tweak then you are most welcome.

Jacopo