You are viewing a plain text version of this content. The canonical link for it is here.
Posted to fop-dev@xmlgraphics.apache.org by Alexios Giotis <al...@gmail.com> on 2012/03/01 02:13:59 UTC

Re: Bugzilla #46962 - Deadlock in PropertyCache

Hi Vincent,

The unit tests where added by Mehdi but I could happily update them to Junit 4 and create the separate methods.

Related to the NullPointerException that you found, I tried many times but unfortunately I can't reproduce them on my Macbook running MacOS X 10.7.3  with any JRE and have not seen it in any heavily loaded server. What is the environment that you tested it ? I could test it on different servers/OS or try to write a test of ConcurrentHashMap that can trigger it.


Related to what is happening, this is tricky. The ConcurrentHashMap does not allow putting null values, so one would not expect to get back null values. But in the cleanReclaimedMapEntries() we are getting the values using an iterator over the entrySet. As the javadoc says, the view's iterator are "weakly consistent", never throw ConcurrentModificationException and they are actually designed to be used by only one thread at a time. So this could explain why a null value could be returned. 

Depending on how this is reproduced, we could either avoid this by just checking for a null value or we could put the iteration in a block synchronized on the map instance. This loop is done once per 10000 items added, so it should not be slower. But before making this change, I need a way to reproduce it so that we are sure about it.


Alex Giotis




On Feb 29, 2012, at 11:42 PM, Vincent Hennebert wrote:

> Sorry for the delay about this. I had a look at the patch some time ago
> but didn’t have enough time to apply it.
> 
> I noticed that a NPE occurs when adding the following piece of code to
> PropertyCacheTestCase and running it:
> 
>    private final PropertyCache<Integer> cache = new PropertyCache<Integer>();
> 
>    private class CacheFiller implements Runnable {
> 
>        private final int start;
> 
>        CacheFiller(int start) {
>            this.start = start;
>        }
> 
>        public void run() {
>            for (int i = 0; i < 1000000; i++) {
>                cache.fetch(new Integer(start + i));
>            }
>        }
>    }
> 
>    public void testCleanUp() throws InterruptedException {
>        Thread t1 = new Thread(new CacheFiller(0));
>        Thread t2 = new Thread(new CacheFiller(10000));
>        t1.start();
>        t2.start();
>        t1.join();
>        t2.join();
>    }
> 
> Here’s the stacktrace:
> Exception in thread "Thread-1" java.lang.NullPointerException
> 	at
> org.apache.fop.fo.properties.PropertyCache.cleanReclaimedMapEntries(PropertyCache.java:153)
> 	at org.apache.fop.fo.properties.PropertyCache.fetch(PropertyCache.java:105)
> 	at
> org.apache.fop.fo.properties.PropertyCacheTestCase$CacheFiller.run(PropertyCacheTestCase.java:68)
> 	at java.lang.Thread.run(Thread.java:595)
> 
> It happens when running the test with a Sun 1.5 JRE but apparently not
> with OpenJDK 1.6. I’m still unsure about what’s happening.
> 
> It would be good to update the test cases to JUnit 4. Also, there are
> some clean-ups and improvements to bring. Among other things, symmetry
> is not properly tested in FObjectTest (symmetric.equals(getSut()) should
> also be run); the tests for reflexivity, symmetry and transitivity
> should be put in separate methods; testFailureCases is misnamed; etc.
> Little things but that take some time to fix.
> 
> Vincent
> 
> 
> On 28/02/12 17:09, mehdi houshmand wrote:
>> Hi Guys,
>> 
>> My apologies for the lack of transparency on this issue, but I didn't
>> actually review the changes you made here, in fact, I barely looked at
>> what PropertyCache actually does. I had some free time, and added a
>> bunch of unit tests.
>> 
>> The reason this hasn't been committed yet was because Vincent said he
>> had some questions about the patch. That's as far as I know, maybe he
>> could give some feedback on the issue.
>> 
>> Let me reiterate my apologies again on this, it's not fair that this
>> has been ignored. I'll endeavour to make the process more transparent
>> in future, I hope this doesn't prevent you or any other contributors
>> from submitting patches.
>> 
>> Mehdi
>> 
>> 
>> On 28 February 2012 16:52, Glenn Adams <gl...@skynav.com> wrote:
>>> I support committing this patch, however I don't see an ICLA listed at [1]
>>> for Alexios. Alexios, if you have not submitted an ICLA [2], please do so.
>>> 
>>> I would be happy to apply the patch (if Mehdi doesn't have the time).
>>> 
>>> [1] http://people.apache.org/committer-index.html#unlistedclas
>>> [2] http://www.apache.org/licenses/icla.txt
>>> 
>>> 
>>> On Tue, Feb 28, 2012 at 6:27 AM, Alexios Giotis <al...@gmail.com>
>>> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> About 6 months ago, I had a deadlock issue that regularly stopped
>>>> production servers. While I was opening a bugzilla ticket, I found that this
>>>> was already reported back in 2009. This issue is still opened as it was
>>>> difficult to reproduce. On that issue, I added:
>>>> 
>>>> [1] An explanation of why a deadlock is possible.
>>>> [1] Stacktraces of deadlocked threads from a production server.
>>>> [2] A small unit test that adds a Thread.sleep() to the PropertyCache to
>>>> make it always reproducable.
>>>> [3] A patch solving this issue.
>>>> [4] Explanations of why the patch rewrites the existing PropertyCache
>>>> class.
>>>> 
>>>> This was then reviewed and unit tests were added [5]. On top of this, I
>>>> have committed the fix in my private branch and it works well on several big
>>>> production systems. This is as far as I can go before a FOP committer takes
>>>> it further. I am writing this because:
>>>> 
>>>> - Deadlocks should be fixed. When they occur, there is no way around them.
>>>> - The trunk is moving, the patch is aging and it will be more difficult to
>>>> apply it over time.
>>>> - It is discouraging for submitting more patches.
>>>> 
>>>> 
>>>> Alexios Giotis
>>>> 
>>>> 
>>>> 
>>>> 
>>>> [1] https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c3
>>>> [2] https://issues.apache.org/bugzilla/attachment.cgi?id=27342
>>>> [3] https://issues.apache.org/bugzilla/attachment.cgi?id=27477&action=diff
>>>> [4] https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c7
>>>> [5] https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c9
>>>> 
>>> 


Re: Bugzilla #46962 - Deadlock in PropertyCache

Posted by Alexios Giotis <al...@gmail.com>.
Patch updated, see
https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c15

Alex Giotis


On Mar 1, 2012, at 2:20 PM, Alexios Giotis wrote:

> Vincent, 
> 
> No need to send more info. I just reproduced it on Windows Server 2008 R2 Enterprise running the latest, 'non-business', but not supported as of Nov 2009, Sun JRE 1.5.0_22. The production servers are all running JDK6 and I could not test it on a Mac.
> 
> The bug is indeed related to the concurrent hash map iterators and is fixed on JDK6, see
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6312056
> 
> I will update the patch and I hope a committer will check it.
> 
> Alex Giotis
> 
> 
> 
> 
> On Mar 1, 2012, at 3:13 AM, Alexios Giotis wrote:
> 
>> Hi Vincent,
>> 
>> The unit tests where added by Mehdi but I could happily update them to Junit 4 and create the separate methods.
>> 
>> Related to the NullPointerException that you found, I tried many times but unfortunately I can't reproduce them on my Macbook running MacOS X 10.7.3  with any JRE and have not seen it in any heavily loaded server. What is the environment that you tested it ? I could test it on different servers/OS or try to write a test of ConcurrentHashMap that can trigger it.
>> 
>> 
>> Related to what is happening, this is tricky. The ConcurrentHashMap does not allow putting null values, so one would not expect to get back null values. But in the cleanReclaimedMapEntries() we are getting the values using an iterator over the entrySet. As the javadoc says, the view's iterator are "weakly consistent", never throw ConcurrentModificationException and they are actually designed to be used by only one thread at a time. So this could explain why a null value could be returned. 
>> 
>> Depending on how this is reproduced, we could either avoid this by just checking for a null value or we could put the iteration in a block synchronized on the map instance. This loop is done once per 10000 items added, so it should not be slower. But before making this change, I need a way to reproduce it so that we are sure about it.
>> 
>> 
>> Alex Giotis
>> 
>> 
>> 
>> 
>> On Feb 29, 2012, at 11:42 PM, Vincent Hennebert wrote:
>> 
>>> Sorry for the delay about this. I had a look at the patch some time ago
>>> but didn’t have enough time to apply it.
>>> 
>>> I noticed that a NPE occurs when adding the following piece of code to
>>> PropertyCacheTestCase and running it:
>>> 
>>>  private final PropertyCache<Integer> cache = new PropertyCache<Integer>();
>>> 
>>>  private class CacheFiller implements Runnable {
>>> 
>>>      private final int start;
>>> 
>>>      CacheFiller(int start) {
>>>          this.start = start;
>>>      }
>>> 
>>>      public void run() {
>>>          for (int i = 0; i < 1000000; i++) {
>>>              cache.fetch(new Integer(start + i));
>>>          }
>>>      }
>>>  }
>>> 
>>>  public void testCleanUp() throws InterruptedException {
>>>      Thread t1 = new Thread(new CacheFiller(0));
>>>      Thread t2 = new Thread(new CacheFiller(10000));
>>>      t1.start();
>>>      t2.start();
>>>      t1.join();
>>>      t2.join();
>>>  }
>>> 
>>> Here’s the stacktrace:
>>> Exception in thread "Thread-1" java.lang.NullPointerException
>>> 	at
>>> org.apache.fop.fo.properties.PropertyCache.cleanReclaimedMapEntries(PropertyCache.java:153)
>>> 	at org.apache.fop.fo.properties.PropertyCache.fetch(PropertyCache.java:105)
>>> 	at
>>> org.apache.fop.fo.properties.PropertyCacheTestCase$CacheFiller.run(PropertyCacheTestCase.java:68)
>>> 	at java.lang.Thread.run(Thread.java:595)
>>> 
>>> It happens when running the test with a Sun 1.5 JRE but apparently not
>>> with OpenJDK 1.6. I’m still unsure about what’s happening.
>>> 
>>> It would be good to update the test cases to JUnit 4. Also, there are
>>> some clean-ups and improvements to bring. Among other things, symmetry
>>> is not properly tested in FObjectTest (symmetric.equals(getSut()) should
>>> also be run); the tests for reflexivity, symmetry and transitivity
>>> should be put in separate methods; testFailureCases is misnamed; etc.
>>> Little things but that take some time to fix.
>>> 
>>> Vincent
>>> 
>>> 
>>> On 28/02/12 17:09, mehdi houshmand wrote:
>>>> Hi Guys,
>>>> 
>>>> My apologies for the lack of transparency on this issue, but I didn't
>>>> actually review the changes you made here, in fact, I barely looked at
>>>> what PropertyCache actually does. I had some free time, and added a
>>>> bunch of unit tests.
>>>> 
>>>> The reason this hasn't been committed yet was because Vincent said he
>>>> had some questions about the patch. That's as far as I know, maybe he
>>>> could give some feedback on the issue.
>>>> 
>>>> Let me reiterate my apologies again on this, it's not fair that this
>>>> has been ignored. I'll endeavour to make the process more transparent
>>>> in future, I hope this doesn't prevent you or any other contributors
>>>> from submitting patches.
>>>> 
>>>> Mehdi
>>>> 
>>>> 
>>>> On 28 February 2012 16:52, Glenn Adams <gl...@skynav.com> wrote:
>>>>> I support committing this patch, however I don't see an ICLA listed at [1]
>>>>> for Alexios. Alexios, if you have not submitted an ICLA [2], please do so.
>>>>> 
>>>>> I would be happy to apply the patch (if Mehdi doesn't have the time).
>>>>> 
>>>>> [1] http://people.apache.org/committer-index.html#unlistedclas
>>>>> [2] http://www.apache.org/licenses/icla.txt
>>>>> 
>>>>> 
>>>>> On Tue, Feb 28, 2012 at 6:27 AM, Alexios Giotis <al...@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> About 6 months ago, I had a deadlock issue that regularly stopped
>>>>>> production servers. While I was opening a bugzilla ticket, I found that this
>>>>>> was already reported back in 2009. This issue is still opened as it was
>>>>>> difficult to reproduce. On that issue, I added:
>>>>>> 
>>>>>> [1] An explanation of why a deadlock is possible.
>>>>>> [1] Stacktraces of deadlocked threads from a production server.
>>>>>> [2] A small unit test that adds a Thread.sleep() to the PropertyCache to
>>>>>> make it always reproducable.
>>>>>> [3] A patch solving this issue.
>>>>>> [4] Explanations of why the patch rewrites the existing PropertyCache
>>>>>> class.
>>>>>> 
>>>>>> This was then reviewed and unit tests were added [5]. On top of this, I
>>>>>> have committed the fix in my private branch and it works well on several big
>>>>>> production systems. This is as far as I can go before a FOP committer takes
>>>>>> it further. I am writing this because:
>>>>>> 
>>>>>> - Deadlocks should be fixed. When they occur, there is no way around them.
>>>>>> - The trunk is moving, the patch is aging and it will be more difficult to
>>>>>> apply it over time.
>>>>>> - It is discouraging for submitting more patches.
>>>>>> 
>>>>>> 
>>>>>> Alexios Giotis
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> [1] https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c3
>>>>>> [2] https://issues.apache.org/bugzilla/attachment.cgi?id=27342
>>>>>> [3] https://issues.apache.org/bugzilla/attachment.cgi?id=27477&action=diff
>>>>>> [4] https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c7
>>>>>> [5] https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c9
>>>>>> 
>>>>> 
>> 
> 


Re: Bugzilla #46962 - Deadlock in PropertyCache

Posted by Alexios Giotis <al...@gmail.com>.
Vincent, 

No need to send more info. I just reproduced it on Windows Server 2008 R2 Enterprise running the latest, 'non-business', but not supported as of Nov 2009, Sun JRE 1.5.0_22. The production servers are all running JDK6 and I could not test it on a Mac.

The bug is indeed related to the concurrent hash map iterators and is fixed on JDK6, see
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6312056

I will update the patch and I hope a committer will check it.

Alex Giotis




On Mar 1, 2012, at 3:13 AM, Alexios Giotis wrote:

> Hi Vincent,
> 
> The unit tests where added by Mehdi but I could happily update them to Junit 4 and create the separate methods.
> 
> Related to the NullPointerException that you found, I tried many times but unfortunately I can't reproduce them on my Macbook running MacOS X 10.7.3  with any JRE and have not seen it in any heavily loaded server. What is the environment that you tested it ? I could test it on different servers/OS or try to write a test of ConcurrentHashMap that can trigger it.
> 
> 
> Related to what is happening, this is tricky. The ConcurrentHashMap does not allow putting null values, so one would not expect to get back null values. But in the cleanReclaimedMapEntries() we are getting the values using an iterator over the entrySet. As the javadoc says, the view's iterator are "weakly consistent", never throw ConcurrentModificationException and they are actually designed to be used by only one thread at a time. So this could explain why a null value could be returned. 
> 
> Depending on how this is reproduced, we could either avoid this by just checking for a null value or we could put the iteration in a block synchronized on the map instance. This loop is done once per 10000 items added, so it should not be slower. But before making this change, I need a way to reproduce it so that we are sure about it.
> 
> 
> Alex Giotis
> 
> 
> 
> 
> On Feb 29, 2012, at 11:42 PM, Vincent Hennebert wrote:
> 
>> Sorry for the delay about this. I had a look at the patch some time ago
>> but didn’t have enough time to apply it.
>> 
>> I noticed that a NPE occurs when adding the following piece of code to
>> PropertyCacheTestCase and running it:
>> 
>>   private final PropertyCache<Integer> cache = new PropertyCache<Integer>();
>> 
>>   private class CacheFiller implements Runnable {
>> 
>>       private final int start;
>> 
>>       CacheFiller(int start) {
>>           this.start = start;
>>       }
>> 
>>       public void run() {
>>           for (int i = 0; i < 1000000; i++) {
>>               cache.fetch(new Integer(start + i));
>>           }
>>       }
>>   }
>> 
>>   public void testCleanUp() throws InterruptedException {
>>       Thread t1 = new Thread(new CacheFiller(0));
>>       Thread t2 = new Thread(new CacheFiller(10000));
>>       t1.start();
>>       t2.start();
>>       t1.join();
>>       t2.join();
>>   }
>> 
>> Here’s the stacktrace:
>> Exception in thread "Thread-1" java.lang.NullPointerException
>> 	at
>> org.apache.fop.fo.properties.PropertyCache.cleanReclaimedMapEntries(PropertyCache.java:153)
>> 	at org.apache.fop.fo.properties.PropertyCache.fetch(PropertyCache.java:105)
>> 	at
>> org.apache.fop.fo.properties.PropertyCacheTestCase$CacheFiller.run(PropertyCacheTestCase.java:68)
>> 	at java.lang.Thread.run(Thread.java:595)
>> 
>> It happens when running the test with a Sun 1.5 JRE but apparently not
>> with OpenJDK 1.6. I’m still unsure about what’s happening.
>> 
>> It would be good to update the test cases to JUnit 4. Also, there are
>> some clean-ups and improvements to bring. Among other things, symmetry
>> is not properly tested in FObjectTest (symmetric.equals(getSut()) should
>> also be run); the tests for reflexivity, symmetry and transitivity
>> should be put in separate methods; testFailureCases is misnamed; etc.
>> Little things but that take some time to fix.
>> 
>> Vincent
>> 
>> 
>> On 28/02/12 17:09, mehdi houshmand wrote:
>>> Hi Guys,
>>> 
>>> My apologies for the lack of transparency on this issue, but I didn't
>>> actually review the changes you made here, in fact, I barely looked at
>>> what PropertyCache actually does. I had some free time, and added a
>>> bunch of unit tests.
>>> 
>>> The reason this hasn't been committed yet was because Vincent said he
>>> had some questions about the patch. That's as far as I know, maybe he
>>> could give some feedback on the issue.
>>> 
>>> Let me reiterate my apologies again on this, it's not fair that this
>>> has been ignored. I'll endeavour to make the process more transparent
>>> in future, I hope this doesn't prevent you or any other contributors
>>> from submitting patches.
>>> 
>>> Mehdi
>>> 
>>> 
>>> On 28 February 2012 16:52, Glenn Adams <gl...@skynav.com> wrote:
>>>> I support committing this patch, however I don't see an ICLA listed at [1]
>>>> for Alexios. Alexios, if you have not submitted an ICLA [2], please do so.
>>>> 
>>>> I would be happy to apply the patch (if Mehdi doesn't have the time).
>>>> 
>>>> [1] http://people.apache.org/committer-index.html#unlistedclas
>>>> [2] http://www.apache.org/licenses/icla.txt
>>>> 
>>>> 
>>>> On Tue, Feb 28, 2012 at 6:27 AM, Alexios Giotis <al...@gmail.com>
>>>> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> About 6 months ago, I had a deadlock issue that regularly stopped
>>>>> production servers. While I was opening a bugzilla ticket, I found that this
>>>>> was already reported back in 2009. This issue is still opened as it was
>>>>> difficult to reproduce. On that issue, I added:
>>>>> 
>>>>> [1] An explanation of why a deadlock is possible.
>>>>> [1] Stacktraces of deadlocked threads from a production server.
>>>>> [2] A small unit test that adds a Thread.sleep() to the PropertyCache to
>>>>> make it always reproducable.
>>>>> [3] A patch solving this issue.
>>>>> [4] Explanations of why the patch rewrites the existing PropertyCache
>>>>> class.
>>>>> 
>>>>> This was then reviewed and unit tests were added [5]. On top of this, I
>>>>> have committed the fix in my private branch and it works well on several big
>>>>> production systems. This is as far as I can go before a FOP committer takes
>>>>> it further. I am writing this because:
>>>>> 
>>>>> - Deadlocks should be fixed. When they occur, there is no way around them.
>>>>> - The trunk is moving, the patch is aging and it will be more difficult to
>>>>> apply it over time.
>>>>> - It is discouraging for submitting more patches.
>>>>> 
>>>>> 
>>>>> Alexios Giotis
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> [1] https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c3
>>>>> [2] https://issues.apache.org/bugzilla/attachment.cgi?id=27342
>>>>> [3] https://issues.apache.org/bugzilla/attachment.cgi?id=27477&action=diff
>>>>> [4] https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c7
>>>>> [5] https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c9
>>>>> 
>>>> 
>