You are viewing a plain text version of this content. The canonical link for it is here.
Posted to muse-dev@ws.apache.org by Ch...@swisscom.com on 2007/09/04 23:50:17 UTC

RE: EMPTY_DOC thread stability issues

Hi All,

Joining this thread on Vinh's recommendation.

This issue I believe stems from both

https://issues.apache.org/jira/browse/XERCESJ-911

where large amounts of Muse use this functionality.  Regarding the
ThreadLocal solution it doesn't work to cache documents themeselves.
The sdk is very clear that only DocumentFactory.getInstance is thread
safe.  You can use ThreadLocal with DocumentBuilders however. (new
DocumentBuilders is quite expensive, but new documents are much cheaper)

The solution below (combined with TLS DocumentBuilders) should resolve
the issue with both WRONG_DOCUMENTs and the 911 issue from xerces.  I
read in the rest of the thread that Vinh had a simple test case with
multiple threads.  I'm happy to do a quick refactor in the code to put
these two approaches into the code base, if you could send me the test
case Vinh, I'll get started on it tomorrow.

NB whilst this "should work" it is still against the spirit of xerces
which is share nothing.  I'd also note that due to the nodecache
optimisation within Xerces, using NodeLists its just not possible to be
thread safe even for reads.

copied from muse-user:

Hi,

>From what I could work out, from within the list comments and the code,
the state is stored in the Document itself, and as cloneNode uses
Object.clone and then sets the doc it won't work.  Using importNode
helps a little (as it uses getFirstChild()/getNextSibling()), but it
just puts the problem to a later stage.

getAllElements just does the same, calls getChildNodes and then forces
the cache to be used.  Deleting the cache just stops the null for the
parent, it doesn't stop incorrect nodes being returned or race
conditions with other nulls.

The simple thing is to stop using getChildNodes, from what I can see in
the code there isn't a need for it.  The only place I've seen that
doesn't require all of the nodes anyway is in EndpointReference's
getNumberOfParameters, but that behaviour can be safely cached (its not
used directly in the project anyway).

Looking further at the use cases in Muse only the IsolationLayer
(because of the DeferredImpl) needs to call hasChildNodes() on the
document node, for it to force that synchronizeChildren be called (its
cached from then on in each node).  Then every other piece of code can
simply pointer chase with the getFirstChild()/getNextSibling() approach.
No synchronization required.

re using other jaxp's, the DOM itself makes no statement about even read
thread safety.  All of the jaxp impls suffer some form of threading
problem.  Considering all of the problems with fighting against
namespace problems (much worse IMO) it makes sense to stick with the
devil you know :-<.

Again for most of the xerces releases using the
getFirstChild()/getNextSibling() is a seamless dropin for the
getChildNodes problem.  Its a shame that the xerces guys are very much
against any form of thread safety (except application enforced).  Going
with the standard approach the only safe thing is to always serialize to
objects / keep the strings around, which would overly complicate the
code.

I'm willing to give it a try and send you patched libs to try out (I
don't have a test case for this yet) if its quick to reproduce, just let
me know.  If it works out I can raise a jira with the patches.

cheers,
Chris 


---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org


RE: EMPTY_DOC thread stability issues

Posted by Ch...@swisscom.com.
Hi Vinh,

Could you add your test to the Jira so we can download and help out?

cheers,
Chris  

-----Original Message-----
From: Vinh Nguyen (vinguye2) [mailto:vinguye2@cisco.com] 
Sent: Wednesday, September 05, 2007 9:52 AM
To: muse-dev@ws.apache.org
Subject: RE: EMPTY_DOC thread stability issues 

Hi all,
After doing more testing, the issue is correctly because of the Xerces
limitation.
Both a Document and a Node/Element are *NOT* thread-safe.

The following shows that Document.importNode() is not thread-safe:

17:26:49,730 ERROR [STDERR] java.lang.NullPointerException
17:26:49,730 ERROR [STDERR]     at
org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
17:26:49,730 ERROR [STDERR]     at
org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
17:26:49,730 ERROR [STDERR]     at
org.apache.muse.ws.addressing.EndpointReference.<init>(EndpointReference
.java:186)
17:26:49,730 ERROR [STDERR]     at
org.apache.muse.ws.notification.impl.SimpleNotificationMessage.setProduc
erReference(SimpleNotificationMessage.java:209)
17:26:49,730 ERROR [STDERR]     at
org.apache.muse.ws.notification.impl.SimpleSubscriptionManager.publish(S
impleSubscriptionManager.java:256)

The following shows that reading the children of a Node/Element is not
thread-safe:

00:16:53,400 ERROR [STDERR] java.lang.NullPointerException
00:16:53,400 ERROR [STDERR] 	at
org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
00:16:53,400 ERROR [STDERR] 	at
org.apache.xerces.dom.ParentNode.item(Unknown Source)
00:16:53,400 ERROR [STDERR] 	at
org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:883)
00:16:53,400 ERROR [STDERR] 	at
org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:815)
00:16:53,400 ERROR [STDERR] 	at
org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:791)
00:16:53,400 ERROR [STDERR] 	at
org.apache.muse.util.xml.XmlUtils.getAllNamespaces(XmlUtils.java:974)
00:16:53,400 ERROR [STDERR] 	at
org.apache.muse.util.xml.XmlUtils.getAllNamespaces(XmlUtils.java:977)
00:16:53,400 ERROR [STDERR] 	at
org.apache.muse.util.xml.XmlUtils.getAllNamespaces(XmlUtils.java:940)
00:16:53,400 ERROR [STDERR] 	at
org.apache.muse.ws.notification.impl.SimpleNotificationMessage.toXML(Sim
pleNotificationMessage.java:291)
00:16:53,415 ERROR [STDERR] 	at
org.apache.muse.ws.notification.impl.SimpleNotificationMessage.toXML(Sim
pleNotificationMessage.java:239)
00:16:53,415 ERROR [STDERR] 	at
org.apache.muse.ws.notification.remote.NotificationConsumerClient.notify
(NotificationConsumerClient.java:97)
00:16:53,415 ERROR [STDERR] 	at
org.apache.muse.ws.notification.impl.SimpleSubscriptionManager.publish(S
impleSubscriptionManager.java:267)

To work around these problem, we should follow these rules:

1) Do not use XmlUtils.EMPTY_DOC in multi-thread processes.
2) Do not access the children of a Node/Element from multiple threads.
3) Do not create a Node/Element with its owner as XmlUtils.EMPTY_DOC, if
the node's children will be accessed.
4) Do not create a Node/Element with its owner as XmlUtils.EMPTY_DOC, if
the node itself can be accessed from multiple threads.

This most likely will affect a lot of Muse code.  As Chris pointed out,
Oliver's ThreadLocal solution for the EMPTY_DOC won't work.  EMPTY_DOC
can still be used, but any code which uses it must follow the rules
above.  Also, the DocumentBuilderFactory can't be a singleton either
since it's also not thread-safe.

NOTIFICATION FIXES:
The following updates finally fix the multi-thread notifications issue
for me:

1) Updated EndpointReference.java.  Changed all XmlUtils.EMPTY_DOC
references to XmlUtil.createDocument().  This is because a resource's
EPR and its Element representation can be accessed from multiple
threads, so all rules above apply.

2) Updated SimpleNotificationMessage.toXML().  Changed XmlUtil.EMPTY_DOC
reference to XmlUtil.createDocument().  This is because a root Element
is created and XmlUtils.getAllNamespaces(root) is called on it, so rule
#3 apply.

3) Updated NotificationConsumerClient.notify(NotificationMessage[]
messages) to not assume that messages[x].toXML() will return an Element
whose owner is EMPTY_DOC.
Changed these lines:
        Element notify =
XmlUtils.createElement(WsnConstants.NOTIFY_QNAME);        
        for (int n = 0; n < messages.length; ++n)
            notify.appendChild(messages[n].toXML(), true); To these
lines:
        Document doc = XmlUtils.createDocument();
        Element notify = XmlUtils.createElement(doc,
WsnConstants.NOTIFY_QNAME);
        for (int n = 0; n < messages.length; ++n)
            notify.appendChild(doc.importNode(messages[n].toXML(),
true));

So now, I no longer get any exceptions when generating notifications
simultaneously from multiple producers.

BUT, now I have another problem: notifications are still being lost
somehow, yet no errors appear at all!  I have to investigate this
further to see what's causing this problem.  If anyone encounters this
problem, please post to the group:)




-----Original Message-----
From: Chris.Twiner@swisscom.com [mailto:Chris.Twiner@swisscom.com]
Sent: Tuesday, September 04, 2007 2:50 PM
To: muse-dev@ws.apache.org
Subject: RE: EMPTY_DOC thread stability issues 

Hi All,

Joining this thread on Vinh's recommendation.

This issue I believe stems from both

https://issues.apache.org/jira/browse/XERCESJ-911

where large amounts of Muse use this functionality.  Regarding the
ThreadLocal solution it doesn't work to cache documents themeselves.
The sdk is very clear that only DocumentFactory.getInstance is thread
safe.  You can use ThreadLocal with DocumentBuilders however. (new
DocumentBuilders is quite expensive, but new documents are much cheaper)

The solution below (combined with TLS DocumentBuilders) should resolve
the issue with both WRONG_DOCUMENTs and the 911 issue from xerces.  I
read in the rest of the thread that Vinh had a simple test case with
multiple threads.  I'm happy to do a quick refactor in the code to put
these two approaches into the code base, if you could send me the test
case Vinh, I'll get started on it tomorrow.

NB whilst this "should work" it is still against the spirit of xerces
which is share nothing.  I'd also note that due to the nodecache
optimisation within Xerces, using NodeLists its just not possible to be
thread safe even for reads.

copied from muse-user:

Hi,

>From what I could work out, from within the list comments and the code,
the state is stored in the Document itself, and as cloneNode uses
Object.clone and then sets the doc it won't work.  Using importNode
helps a little (as it uses getFirstChild()/getNextSibling()), but it
just puts the problem to a later stage.

getAllElements just does the same, calls getChildNodes and then forces
the cache to be used.  Deleting the cache just stops the null for the
parent, it doesn't stop incorrect nodes being returned or race
conditions with other nulls.

The simple thing is to stop using getChildNodes, from what I can see in
the code there isn't a need for it.  The only place I've seen that
doesn't require all of the nodes anyway is in EndpointReference's
getNumberOfParameters, but that behaviour can be safely cached (its not
used directly in the project anyway).

Looking further at the use cases in Muse only the IsolationLayer
(because of the DeferredImpl) needs to call hasChildNodes() on the
document node, for it to force that synchronizeChildren be called (its
cached from then on in each node).  Then every other piece of code can
simply pointer chase with the getFirstChild()/getNextSibling() approach.
No synchronization required.

re using other jaxp's, the DOM itself makes no statement about even read
thread safety.  All of the jaxp impls suffer some form of threading
problem.  Considering all of the problems with fighting against
namespace problems (much worse IMO) it makes sense to stick with the
devil you know :-<.

Again for most of the xerces releases using the
getFirstChild()/getNextSibling() is a seamless dropin for the
getChildNodes problem.  Its a shame that the xerces guys are very much
against any form of thread safety (except application enforced).  Going
with the standard approach the only safe thing is to always serialize to
objects / keep the strings around, which would overly complicate the
code.

I'm willing to give it a try and send you patched libs to try out (I
don't have a test case for this yet) if its quick to reproduce, just let
me know.  If it works out I can raise a jira with the patches.

cheers,
Chris 


---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org


Re: RE: RE: EMPTY_DOC thread stability issues

Posted by Oliver Waeldrich <ol...@gmx.de>.
I put the test cases into the JIRA:

https://issues.apache.org/jira/secure/attachment/12365164/epr-tests.zip

Oliver


-------- Original-Nachricht --------
> Datum: Wed, 5 Sep 2007 13:20:21 +0200
> Von: Chris.Twiner@swisscom.com
> An: muse-dev@ws.apache.org
> Betreff: RE: RE: EMPTY_DOC thread stability issues

> Hi All,
> 
> Any piece of code that uses nodelists will be thread unsafe as soon as a
> document is shared across threads.  It doesn't matter if the original EPR
> has a seperate document or not.  The problem with TLSing the Document itself
> is that the standard says its not thread safe either, all that happens is
> push the problem further down the line until two threads try to nodelist the
> same document.  
> 
> point 1) affects this directly, the problem is that both the getChildNodes
> optimisation and the importing (as Vinh just showed) aren't thread safe.
> 
> For EPRs at least the only way to be truly thread safe is to store the
> actual information in raw xml strings (or objects) and re-parse it every time
> you use an epr.  Its expensive to then reparse but at least safe.
> 
> point 2) is exactly the same problem just pushed further down the line. 
> The resource EPR with its seperate document is then read simultaneously by
> seperate threads wishing to publish events leading to same issue. 
> getFirstChild/getNextSibling help with this but only as another hack tied to a
> specific version of xerces.
> 
> Please upload the test case, the more test cases we can use the better.
> 
> cheers,
> Chris
> 
> -----Original Message-----
> From: Oliver Waeldrich [mailto:oliver.waeldrich@gmx.de] 
> Sent: Wednesday, September 05, 2007 12:52 PM
> To: muse-dev@ws.apache.org
> Subject: Re: RE: EMPTY_DOC thread stability issues
> 
> 
> Hi all,
> 
> I did some additional testing related to the EMPTY_DOC problem. Therefore
> I set up a test where multiple threads created new EPRs based on a set of
> existing (shared) EPRs. Such shared EPRs are resource EPRs (e.g.
> notification producer EPR). In this test two different kinds of exceptions are thrown:
> 
> 1.)
> java.lang.NullPointerException
>     at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
>     at org.apache.xerces.dom.ParentNode.item(Unknown Source)
>     at org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:932)
>     at org.apache.muse.util.xml.XmlUtils.getElement(XmlUtils.java:1144)
>     at org.apache.muse.util.xml.XmlUtils.getElement(XmlUtils.java:1119)
>     at
> org.apache.muse.test.case1.TestEndpointReference.initializeFromXML(TestEndpointReference.java:560)
>     at
> org.apache.muse.test.case1.TestEndpointReference.<init>(TestEndpointReference.java:193)
>     at org.apache.muse.test.case1.TestThread.run(TestThread.java:60)
> 
> This exception relates to the following code parts in the EPR
> implementation:
>    
>         ...
>         Document doc = XmlUtils.EMPTY_DOC;
>         _xml = XmlUtils.createElement(doc, typeName);
> 
>         NodeList children = copy.toXML().getChildNodes();
>         int length = children.getLength();
>         
>         for (int n = 0; n < length; ++n)
>         {
>             Node next = doc.importNode(children.item(n), true);
>             _xml.appendChild(next);
>         }
> 
>         try
>         {
>             initializeFromXML();
>         }
>         ...
>         
> The reason for the exception is that multiple threads import nodes into
> the SHARED document. Later on in initializeFromXML() the different element
> nodes within the SHARED document are traversed simulations, which leads to
> the threading issue. The usage of ThreadLocal WILL eliminate such errors,
> since no SHARED document is used between the different threads to create XML
> elements. In fact each thread has its own document that is used for this.
> This is shown in the sample thread local.
> 
> 2.)
> 
> java.lang.NullPointerException
>     at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
>     at org.apache.xerces.dom.ParentNode.item(Unknown Source)
>     at
> org.apache.muse.test.case1.TestEndpointReference.<init>(TestEndpointReference.java:187)
>     at org.apache.muse.test.case1.TestThread.run(TestThread.java:60)
> 
> The second exception has a similar background. In the test case the
> initial EPRs are created in the class Test via the
>     
>         TestEndpointReference(URI address, QName typeName)
>          
> constructor. This constructor creates the internal XML representation by
> using EMPTY_DOC. The original EPR is copied by the following code:
> 
>     NodeList children = copy.toXML().getChildNodes();
>     int length = children.getLength();
>     
>     for (int n = 0; n < length; ++n)
>     {
>         Node next = doc.importNode(children.item(n), true);
>         _xml.appendChild(next);
>     }
> 
> The copy.toXML() method simply returns the XML element created with
> EMPTY_DOC. Here again multiple thread try to traverse the same node
> concurrently which leads to the exception.        
>         
> Therefore, in my point of view we have two different problems:
> 
>      
>  1.) Multiple threads must not use the same target document to import
> nodes.
>   
>      This means XML elements, that are only used within a request/response
>      cycle (e.g. for de-/serialization) can use ThreadLocal mechnisms.
>      It is not necessary to create new documents all the way for this
> purpose.
>      
>  2.) XML elements, that have a lifetime longer than a request/response
>      cycle, must be created by using new (not shared) documents. This I
> already
>      pointed out in my first mail. This is especially true for EPRs, that
> are
>      
> 
> I attached a number of test cases that illustrate my point.
> 
> To Rich:
> 
>   >>The default implementation creates a thread for processing the
>   >>notify request, and then immediately releases the thread that called
> it.
>   
>   You are right: in this special case the objects need to be copied
>   by using a separate document, since here the message objects leave the
>   request/response cycle. This has to be taken into account.
>   
> Finally, the usage of createDocument() compared to the solution I
> presented adds a performance overhead of more than 50%. This can simply be verified
> with the tests I attached. However, I admit that my solution requires more
> attention by the programmers than simply replacing EMPTY_DOC, but I think
> it is worth.   
> 
> Interestingly, I still found that sometimes exceptions will occur, even
> when using new documents for every XML operation. This is really rare, but in
> my opinion this results from the “still” shared usage of DOM
> objects within the EPR (the _xml element). Therefore, concurrent read access
> to this instance variable can still occour. To solve the problem with the
> EPR completely, this instance variable should perhaps be removed from the
> EPR implementation.
>   
> Regards,
> Oliver
> 
> p.s. unfortunatelly I could not attach the test files, since this is
> rejected by the mailer deamon. However, I could provide the test in the JIRA.
> --
> Psssst! Schon vom neuen GMX MultiMessenger gehört?
> Der kanns mit allen: http://www.gmx.net/de/go/multimessenger
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
> For additional commands, e-mail: muse-dev-help@ws.apache.org
> 

-- 
GMX FreeMail: 1 GB Postfach, 5 E-Mail-Adressen, 10 Free SMS.
Alle Infos und kostenlose Anmeldung: http://www.gmx.net/de/go/freemail

---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org


RE: EMPTY_DOC thread stability issues

Posted by Ch...@swisscom.com.
Has anyone else tested your producer / consumer test?  We really need people to do that.  Simple, expected, fixes are to move to getfirstchild/nextsibling and the tls approach (could use a pool for that too).  But we need to make sure that at least one other person can repeat your findings before anything can be deemed to be solved.

I could repeat Olivers tests on a single cpu machine but I think yours require at least 2 cpus to cause the issues.  (It may not even be the same issue of course).

I can provide a patch for both fairly quickly but we really need at least two different setups for your tests to verify that the solution is complete.

-----Original Message-----
From: Vinh Nguyen (vinguye2) [mailto:vinguye2@cisco.com] 
Sent: Tuesday, October 02, 2007 10:05 PM
To: muse-dev@ws.apache.org
Subject: RE: EMPTY_DOC thread stability issues

Any idea when the overall solution will be tested and an official patch released?  I think this is a critical issue for mostly everyone.
 

-----Original Message-----
From: Oliver Waeldrich [mailto:oliver.waeldrich@gmx.de]
Sent: Wednesday, September 12, 2007 2:47 AM
To: muse-dev@ws.apache.org
Subject: RE: EMPTY_DOC thread stability issues

Hi all,

I have checked the proposal of Chris and I really like it. I think the best way to proceed is to refactor the XmlUtils class a bit. The refatorisation of XmlUtils means that each Thread will have its own instance of a document builder. New documents are created by this instance. Therefore a new method getDocumentBuilder() is needed. Furthermore, the usage of createBuilder() in XmlUtils should be replaced by getDocumentBuilder() method.  

    static final ThreadLocal tls = new ThreadLocal(){
        protected Object initialValue() {
           // the method createBuilder needs to be synchronized 
           return createBuilder();
        }
    };
    
    private static DocumentBuilder getDocumentBuilder() {
        return (DocumentBuilder)tls.get();
    }

This also means that the representations of EMPTY_DOC in the Muse code can be replaced with createDocument() without experiencing a big performance loss. With the changes proposed/done by Vinh I think the major parts of the problem should be solved.

Thanks to Chris, Rich and Vinh for the fruitful discussion.

Oliver

-------- Original-Nachricht --------
> Datum: Thu, 6 Sep 2007 17:00:09 +0200
> Von: Chris.Twiner@swisscom.com
> An: muse-dev@ws.apache.org
> Betreff: RE: EMPTY_DOC thread stability issues

> Hi all,
> 
> @Oliver, I've tried running the tests but I get NPE's from both the 
> threadlocal version and the current version.  Have I run something 
> incorrectly?
> 
> java.lang.NullPointerException
> 	at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
> 	at org.apache.xerces.dom.ParentNode.item(Unknown Source)
> 	at
> org.apache.muse.test.thread.local.TestEndpointReference.<init>(TestEnd
> po
> intReference.java:197)
> 	at
> org.apache.muse.test.thread.local.TestThread.run(TestThread.java:60)
> java.lang.NullPointerException
> 	at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
> 	at org.apache.xerces.dom.ParentNode.item(Unknown Source)
> 	at
> org.apache.muse.test.thread.local.TestEndpointReference.<init>(TestEnd
> po
> intReference.java:197)
> 	at
> org.apache.muse.test.thread.local.TestThread.run(TestThread.java:60)
> 
> the shared epr and seperate docs versions shows no errors.  
> 
> The seperate doc one takes for ever, quite a performance hit on both 
> the synchronized for getLocalDoc, which shouldn't be necessary since 
> newInstance is threadsafe on DocumentBuilderFactory and the 
> createDocument.  I replaced the getLocalDoc with :
> 
> 	static final ThreadLocal tls = new ThreadLocal(){
> 		protected Object initialValue() {
> 			 DocumentBuilderFactory factory =
> DocumentBuilderFactory.newInstance();
> 		     factory.setNamespaceAware(true);
> 		     factory.setIgnoringComments(true);
> 		     try
> 		        {
> 		            return factory.newDocumentBuilder();
> 		        }
> 		        catch (ParserConfigurationException error)
> 		        {
> 		            throw new
> RuntimeException(error.getMessage(), error);
> 		        }
> 		}
> 	};
> 	
>     public static Document getLocalDoc() {
>     	return ((DocumentBuilder)tls.get()).newDocument();
> 	}
> 
> and it improved the performance considerably (equal with the current 
> solution, but correct).  Creating new documentbuilder factories is 
> extremly expensive, jar lookups resource creation, class loading etc.
> 
> cheers,
> Chris
> 
> PS A drop in replacement of a customised axis 1.4 - does namespaces 
> etc correctly - runs in 128 secs (half+ the speed of the current buggy 
> impl).  If others are interested this version doesn't suffer threading 
> issues on read only, it just creates a few too many things but it 
> could be stripped down into a usable general dom.
> PPS I'm going to start working with Vinhs tests now.
> 
> -----Original Message-----
> From: Vinh Nguyen (vinguye2) [mailto:vinguye2@cisco.com]
> Sent: Thursday, September 06, 2007 3:45 AM
> To: muse-dev@ws.apache.org
> Subject: RE: EMPTY_DOC thread stability issues
> 
> My test case has been posted to JIRA Muse-270:
> 
> https://issues.apache.org/jira/browse/MUSE-270
>  
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
> For additional commands, e-mail: muse-dev-help@ws.apache.org

-- 



Psssst! Schon vom neuen GMX MultiMessenger gehört?
Der kanns mit allen: http://www.gmx.net/de/go/multimessenger

---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org


RE: EMPTY_DOC thread stability issues

Posted by "Vinh Nguyen (vinguye2)" <vi...@cisco.com>.
Any idea when the overall solution will be tested and an official patch released?  I think this is a critical issue for mostly everyone.
 

-----Original Message-----
From: Oliver Waeldrich [mailto:oliver.waeldrich@gmx.de] 
Sent: Wednesday, September 12, 2007 2:47 AM
To: muse-dev@ws.apache.org
Subject: RE: EMPTY_DOC thread stability issues

Hi all,

I have checked the proposal of Chris and I really like it. I think the best way to proceed is to refactor the XmlUtils class a bit. The refatorisation of XmlUtils means that each Thread will have its own instance of a document builder. New documents are created by this instance. Therefore a new method getDocumentBuilder() is needed. Furthermore, the usage of createBuilder() in XmlUtils should be replaced by getDocumentBuilder() method.  

    static final ThreadLocal tls = new ThreadLocal(){
        protected Object initialValue() {
           // the method createBuilder needs to be synchronized 
           return createBuilder();
        }
    };
    
    private static DocumentBuilder getDocumentBuilder() {
        return (DocumentBuilder)tls.get();
    }

This also means that the representations of EMPTY_DOC in the Muse code can be replaced with createDocument() without experiencing a big performance loss. With the changes proposed/done by Vinh I think the major parts of the problem should be solved.

Thanks to Chris, Rich and Vinh for the fruitful discussion.

Oliver

-------- Original-Nachricht --------
> Datum: Thu, 6 Sep 2007 17:00:09 +0200
> Von: Chris.Twiner@swisscom.com
> An: muse-dev@ws.apache.org
> Betreff: RE: EMPTY_DOC thread stability issues

> Hi all,
> 
> @Oliver, I've tried running the tests but I get NPE's from both the 
> threadlocal version and the current version.  Have I run something 
> incorrectly?
> 
> java.lang.NullPointerException
> 	at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
> 	at org.apache.xerces.dom.ParentNode.item(Unknown Source)
> 	at
> org.apache.muse.test.thread.local.TestEndpointReference.<init>(TestEnd
> po
> intReference.java:197)
> 	at
> org.apache.muse.test.thread.local.TestThread.run(TestThread.java:60)
> java.lang.NullPointerException
> 	at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
> 	at org.apache.xerces.dom.ParentNode.item(Unknown Source)
> 	at
> org.apache.muse.test.thread.local.TestEndpointReference.<init>(TestEnd
> po
> intReference.java:197)
> 	at
> org.apache.muse.test.thread.local.TestThread.run(TestThread.java:60)
> 
> the shared epr and seperate docs versions shows no errors.  
> 
> The seperate doc one takes for ever, quite a performance hit on both 
> the synchronized for getLocalDoc, which shouldn't be necessary since 
> newInstance is threadsafe on DocumentBuilderFactory and the 
> createDocument.  I replaced the getLocalDoc with :
> 
> 	static final ThreadLocal tls = new ThreadLocal(){
> 		protected Object initialValue() {
> 			 DocumentBuilderFactory factory =
> DocumentBuilderFactory.newInstance();
> 		     factory.setNamespaceAware(true);
> 		     factory.setIgnoringComments(true);
> 		     try
> 		        {
> 		            return factory.newDocumentBuilder();
> 		        }
> 		        catch (ParserConfigurationException error)
> 		        {
> 		            throw new
> RuntimeException(error.getMessage(), error);
> 		        }
> 		}
> 	};
> 	
>     public static Document getLocalDoc() {
>     	return ((DocumentBuilder)tls.get()).newDocument();
> 	}
> 
> and it improved the performance considerably (equal with the current 
> solution, but correct).  Creating new documentbuilder factories is 
> extremly expensive, jar lookups resource creation, class loading etc.
> 
> cheers,
> Chris
> 
> PS A drop in replacement of a customised axis 1.4 - does namespaces 
> etc correctly - runs in 128 secs (half+ the speed of the current buggy 
> impl).  If others are interested this version doesn't suffer threading 
> issues on read only, it just creates a few too many things but it 
> could be stripped down into a usable general dom.
> PPS I'm going to start working with Vinhs tests now.
> 
> -----Original Message-----
> From: Vinh Nguyen (vinguye2) [mailto:vinguye2@cisco.com]
> Sent: Thursday, September 06, 2007 3:45 AM
> To: muse-dev@ws.apache.org
> Subject: RE: EMPTY_DOC thread stability issues
> 
> My test case has been posted to JIRA Muse-270:
> 
> https://issues.apache.org/jira/browse/MUSE-270
>  
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
> For additional commands, e-mail: muse-dev-help@ws.apache.org

-- 



Psssst! Schon vom neuen GMX MultiMessenger gehört?
Der kanns mit allen: http://www.gmx.net/de/go/multimessenger

---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org


RE: EMPTY_DOC thread stability issues

Posted by Oliver Waeldrich <ol...@gmx.de>.
Hi all,

I have checked the proposal of Chris and I really like it. I think the best way to proceed is to refactor the XmlUtils class a bit. The refatorisation of XmlUtils means that each Thread will have its own instance of a document builder. New documents are created by this instance. Therefore a new method getDocumentBuilder() is needed. Furthermore, the usage of createBuilder() in XmlUtils should be replaced by getDocumentBuilder() method.  

    static final ThreadLocal tls = new ThreadLocal(){
        protected Object initialValue() {
           // the method createBuilder needs to be synchronized 
           return createBuilder();
        }
    };
    
    private static DocumentBuilder getDocumentBuilder() {
        return (DocumentBuilder)tls.get();
    }

This also means that the representations of EMPTY_DOC in the Muse code can be replaced with createDocument() without experiencing a big performance loss. With the changes proposed/done by Vinh I think the major parts of the problem should be solved.

Thanks to Chris, Rich and Vinh for the fruitful discussion.

Oliver

-------- Original-Nachricht --------
> Datum: Thu, 6 Sep 2007 17:00:09 +0200
> Von: Chris.Twiner@swisscom.com
> An: muse-dev@ws.apache.org
> Betreff: RE: EMPTY_DOC thread stability issues

> Hi all,
> 
> @Oliver, I've tried running the tests but I get NPE's from both the
> threadlocal version and the current version.  Have I run something
> incorrectly?
> 
> java.lang.NullPointerException
> 	at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
> 	at org.apache.xerces.dom.ParentNode.item(Unknown Source)
> 	at
> org.apache.muse.test.thread.local.TestEndpointReference.<init>(TestEndpo
> intReference.java:197)
> 	at
> org.apache.muse.test.thread.local.TestThread.run(TestThread.java:60)
> java.lang.NullPointerException
> 	at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
> 	at org.apache.xerces.dom.ParentNode.item(Unknown Source)
> 	at
> org.apache.muse.test.thread.local.TestEndpointReference.<init>(TestEndpo
> intReference.java:197)
> 	at
> org.apache.muse.test.thread.local.TestThread.run(TestThread.java:60) 
> 
> the shared epr and seperate docs versions shows no errors.  
> 
> The seperate doc one takes for ever, quite a performance hit on both the
> synchronized for getLocalDoc, which shouldn't be necessary since
> newInstance is threadsafe on DocumentBuilderFactory and the
> createDocument.  I replaced the getLocalDoc with :
> 
> 	static final ThreadLocal tls = new ThreadLocal(){
> 		protected Object initialValue() {
> 			 DocumentBuilderFactory factory =
> DocumentBuilderFactory.newInstance();
> 		     factory.setNamespaceAware(true);
> 		     factory.setIgnoringComments(true);
> 		     try
> 		        {
> 		            return factory.newDocumentBuilder();
> 		        }
> 		        catch (ParserConfigurationException error)
> 		        {
> 		            throw new
> RuntimeException(error.getMessage(), error);
> 		        }
> 		}
> 	};
> 	
>     public static Document getLocalDoc() {
>     	return ((DocumentBuilder)tls.get()).newDocument();
> 	}
> 
> and it improved the performance considerably (equal with the current
> solution, but correct).  Creating new documentbuilder factories is
> extremly expensive, jar lookups resource creation, class loading etc.
> 
> cheers,
> Chris
> 
> PS A drop in replacement of a customised axis 1.4 - does namespaces etc
> correctly - runs in 128 secs (half+ the speed of the current buggy
> impl).  If others are interested this version doesn't suffer threading
> issues on read only, it just creates a few too many things but it could
> be stripped down into a usable general dom.
> PPS I'm going to start working with Vinhs tests now.
> 
> -----Original Message-----
> From: Vinh Nguyen (vinguye2) [mailto:vinguye2@cisco.com] 
> Sent: Thursday, September 06, 2007 3:45 AM
> To: muse-dev@ws.apache.org
> Subject: RE: EMPTY_DOC thread stability issues
> 
> My test case has been posted to JIRA Muse-270:
> 
> https://issues.apache.org/jira/browse/MUSE-270
>  
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
> For additional commands, e-mail: muse-dev-help@ws.apache.org

-- 



Psssst! Schon vom neuen GMX MultiMessenger gehört?
Der kanns mit allen: http://www.gmx.net/de/go/multimessenger

---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org


Re: RE: EMPTY_DOC thread stability issues

Posted by Oliver Waeldrich <ol...@gmx.de>.
Hi all,

@Chris: No, you did nothing wrong. The tests in the local thread package just show that the exceptions disappear, which result from the concurrent access to EMPTY_DOC (e.g. EMPTY_DOC.importNode()). These exceptions are thrown in the initializeFromXml() method. 

However, in the tread local test still multiple threads try to access the same DOM elements in the source EPRs (the _xml member variable). This is taken into account by the shared EPR test cases. Here the DOM elements in the shared source EPRs are created with unique documents. The test still uses the thread local mechanism DOM operations, hence it shows quite well that the remaining exception from the thread local test case result from the concurrent read access to the _xml object in the EPRs, which in turn were all created in one shared DOM document, EMPTY_DOC.

In my opinion, if we create an DOM element what we intend to share later on with multiple threads, we need to create them in a unique document (or even omit DOM member variables in these objects). This is to make the read access thread safe. 

If we don't intend to share objects later on, we just have to make sure that one DOM document is only use by one thread at a time. This is done by ThreadLocal.

Regards,
Oliver






initializeFromXML(TestEndpointReference.java:560)
-------- Original-Nachricht --------
> Datum: Thu, 6 Sep 2007 17:00:09 +0200
> Von: Chris.Twiner@swisscom.com
> An: muse-dev@ws.apache.org
> Betreff: RE: EMPTY_DOC thread stability issues

> Hi all,
> 
> @Oliver, I've tried running the tests but I get NPE's from both the
> threadlocal version and the current version.  Have I run something
> incorrectly?
> 
> java.lang.NullPointerException
> 	at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
> 	at org.apache.xerces.dom.ParentNode.item(Unknown Source)
> 	at
> org.apache.muse.test.thread.local.TestEndpointReference.<init>(TestEndpo
> intReference.java:197)
> 	at
> org.apache.muse.test.thread.local.TestThread.run(TestThread.java:60)
> java.lang.NullPointerException
> 	at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
> 	at org.apache.xerces.dom.ParentNode.item(Unknown Source)
> 	at
> org.apache.muse.test.thread.local.TestEndpointReference.<init>(TestEndpo
> intReference.java:197)
> 	at
> org.apache.muse.test.thread.local.TestThread.run(TestThread.java:60) 
> 
> the shared epr and seperate docs versions shows no errors.  
> 
> The seperate doc one takes for ever, quite a performance hit on both the
> synchronized for getLocalDoc, which shouldn't be necessary since
> newInstance is threadsafe on DocumentBuilderFactory and the
> createDocument.  I replaced the getLocalDoc with :
> 
> 	static final ThreadLocal tls = new ThreadLocal(){
> 		protected Object initialValue() {
> 			 DocumentBuilderFactory factory =
> DocumentBuilderFactory.newInstance();
> 		     factory.setNamespaceAware(true);
> 		     factory.setIgnoringComments(true);
> 		     try
> 		        {
> 		            return factory.newDocumentBuilder();
> 		        }
> 		        catch (ParserConfigurationException error)
> 		        {
> 		            throw new
> RuntimeException(error.getMessage(), error);
> 		        }
> 		}
> 	};
> 	
>     public static Document getLocalDoc() {
>     	return ((DocumentBuilder)tls.get()).newDocument();
> 	}
> 
> and it improved the performance considerably (equal with the current
> solution, but correct).  Creating new documentbuilder factories is
> extremly expensive, jar lookups resource creation, class loading etc.
> 
> cheers,
> Chris
> 
> PS A drop in replacement of a customised axis 1.4 - does namespaces etc
> correctly - runs in 128 secs (half+ the speed of the current buggy
> impl).  If others are interested this version doesn't suffer threading
> issues on read only, it just creates a few too many things but it could
> be stripped down into a usable general dom.
> PPS I'm going to start working with Vinhs tests now.
> 
> -----Original Message-----
> From: Vinh Nguyen (vinguye2) [mailto:vinguye2@cisco.com] 
> Sent: Thursday, September 06, 2007 3:45 AM
> To: muse-dev@ws.apache.org
> Subject: RE: EMPTY_DOC thread stability issues
> 
> My test case has been posted to JIRA Muse-270:
> 
> https://issues.apache.org/jira/browse/MUSE-270
>  
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
> For additional commands, e-mail: muse-dev-help@ws.apache.org

-- 
GMX FreeMail: 1 GB Postfach, 5 E-Mail-Adressen, 10 Free SMS.
Alle Infos und kostenlose Anmeldung: http://www.gmx.net/de/go/freemail

---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org


RE: EMPTY_DOC thread stability issues

Posted by Ch...@swisscom.com.
Hi all,

@Oliver, I've tried running the tests but I get NPE's from both the
threadlocal version and the current version.  Have I run something
incorrectly?

java.lang.NullPointerException
	at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
	at org.apache.xerces.dom.ParentNode.item(Unknown Source)
	at
org.apache.muse.test.thread.local.TestEndpointReference.<init>(TestEndpo
intReference.java:197)
	at
org.apache.muse.test.thread.local.TestThread.run(TestThread.java:60)
java.lang.NullPointerException
	at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
	at org.apache.xerces.dom.ParentNode.item(Unknown Source)
	at
org.apache.muse.test.thread.local.TestEndpointReference.<init>(TestEndpo
intReference.java:197)
	at
org.apache.muse.test.thread.local.TestThread.run(TestThread.java:60) 

the shared epr and seperate docs versions shows no errors.  

The seperate doc one takes for ever, quite a performance hit on both the
synchronized for getLocalDoc, which shouldn't be necessary since
newInstance is threadsafe on DocumentBuilderFactory and the
createDocument.  I replaced the getLocalDoc with :

	static final ThreadLocal tls = new ThreadLocal(){
		protected Object initialValue() {
			 DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
		     factory.setNamespaceAware(true);
		     factory.setIgnoringComments(true);
		     try
		        {
		            return factory.newDocumentBuilder();
		        }
		        catch (ParserConfigurationException error)
		        {
		            throw new
RuntimeException(error.getMessage(), error);
		        }
		}
	};
	
    public static Document getLocalDoc() {
    	return ((DocumentBuilder)tls.get()).newDocument();
	}

and it improved the performance considerably (equal with the current
solution, but correct).  Creating new documentbuilder factories is
extremly expensive, jar lookups resource creation, class loading etc.

cheers,
Chris

PS A drop in replacement of a customised axis 1.4 - does namespaces etc
correctly - runs in 128 secs (half+ the speed of the current buggy
impl).  If others are interested this version doesn't suffer threading
issues on read only, it just creates a few too many things but it could
be stripped down into a usable general dom.
PPS I'm going to start working with Vinhs tests now.

-----Original Message-----
From: Vinh Nguyen (vinguye2) [mailto:vinguye2@cisco.com] 
Sent: Thursday, September 06, 2007 3:45 AM
To: muse-dev@ws.apache.org
Subject: RE: EMPTY_DOC thread stability issues

My test case has been posted to JIRA Muse-270:

https://issues.apache.org/jira/browse/MUSE-270
 

---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org


RE: EMPTY_DOC thread stability issues

Posted by "Vinh Nguyen (vinguye2)" <vi...@cisco.com>.
My test case has been posted to JIRA Muse-270:

https://issues.apache.org/jira/browse/MUSE-270
 

-----Original Message-----
From: Vinh Nguyen (vinguye2) 
Sent: Wednesday, September 05, 2007 12:54 PM
To: muse-dev@ws.apache.org
Subject: RE: EMPTY_DOC thread stability issues

Minor correction to point (b):

SimpleNotificationMessage actually calls EndpointReference.toXML(Document), which correctly returns a new Element copy.  So I made updates to SimpleNotificationMessage to use a new Document instead of EMPTY_DOC when calling EndpointReference.toXML(Document), since there can be several SimpleNotificationMessage objects trying to simultaneously access the child nodes of the toXML(Document) result.

The EndpointReference.toXML() method (no Document parameter) returns the same internal Element reference.  So callers must be careful to synchronize access to the result Element's children from multiple threads.  I think this can easily be done by applying a "synchronized(element)" lock on the Element object before accessing it.


-----Original Message-----
From: Vinh Nguyen (vinguye2)
Sent: Wednesday, September 05, 2007 11:40 AM
To: muse-dev@ws.apache.org
Subject: RE: EMPTY_DOC thread stability issues

To sum up, the two problems are:

1) A Document is not thread safe.
2) A Node/Element is not thread safe.

For #1, EMPTY_DOC can still be used in cases where an Element is created and only used in a single thread.  Or, with the help of ThreadLocal, it can be used in multiple threads...but not if the Element's child nodes will be read from multiple threads.

#2 requires extra care when creating an Element.  This means not just knowing when to use a new Document, but knowing if the Element itself needs to be created anew each time it needs to be accessed.  So, take the EndpointReference class for example.

a) Since the toXML() method is called from multiple threads  (i.e. Muse gets the producer resource's
   EPR to set it in a notification message), it should always use XmlUtils.createDocument() instead
   of XmlUtils.EMPTY_DOC.
b) Since toXML() returns an Element whose children will be accessed from multiple threads, then the
   Element itself should be created anew for each thread/caller.  So ideally, EndpointReference.toXML()
   should not return the same internal Element reference, but create a new Element each time.  This prevents
   errors from occurring when two SimpleNotificationMessage classes need to simultaneously read the child
   nodelist of the same resource EPR.

These rules apply not just to EMPTY_DOC, but to any class that needs to use/create a Document or Element.  The use of a shared EMPTY_DOC  increases performance, but it also can cause code confusion.  For example, a lot of Muse code will obtain an Element and automatically assume the owner document is EMPTY_DOC (i.e. serializers need to return an Element whose owner is EMPTY_DOC).  But, if my custom code doesn't follow this undocumented convention, I will run into errors.  It took me awhile to figure out the serializer restriction.

I'm also still concerned about ThreadLocal.  It does allow the EMPTY_DOC to be reused across multiple threads.  But, if Xerces is in fact storing an internal cache for the document and is depending on the thread to expire in order for the cache to clear, then that might be a problem.  ThreadLocal will not expire until the app shuts down, which means the Xerces cache will continue to build up for as many times EMPTY_DOC is used, and this might cause unexpected problems later on during runtime.
-Vinh



-----Original Message-----
From: Oliver Waeldrich [mailto:oliver.waeldrich@gmx.de]
Sent: Wednesday, September 05, 2007 6:16 AM
To: muse-dev@ws.apache.org
Subject: RE: EMPTY_DOC thread stability issues

Hi all,

> point 1) affects this directly, the problem is that both the 
> getChildNodes optimisation and the importing (as Vinh just showed) 
> aren't thread safe.

I agree. The point is that using ThreadLocal is just an optimized version compared to the approach of creating new documents. It can be used for several cases were the XML fragments are only used temporally. Remember the initial idea why EMPTY_DOC was introduced was to minimize document creation. It should be used as a factory for creating XML fragments. 
In a lot of cases you create XML fragments (e.g. SOAP envelope, header, body), use them (e.g. de-/serialize) and discard them. 

Therefore, these fragments only exist for a short time within one worker thread of a servlet container, or more concrete they only exist within a reqest/response cycle. These fragments are also only accessed by the current thread. However, since document is not thread safe we can not share EMPTY_DOC over multiple worker threads.

MUSE would profit from using ThreadLocal and not creating new XML documents all the time, since every worker thread has its own factory document. Hence we do not have the threading issues.

If we need to use objects over multiple requests, we need to create them in thread safe representations, which potentially do not include DOM elements. But this is only the case for a minority of the XML related objects created in muse.

Again, the TreadLocal approach does not solve our problem alone. It is just an optimization. Since some refactoring has to be done, I would suggest not to simply search and replace EMPTY_DOC by createDocument(), since the basic idea of reusing the document object was totally fine.

> point 2) is exactly the same problem just pushed further down the line. 
> The resource EPR with its seperate document is then read 
> simultaneously by seperate threads wishing to publish events leading to same issue.

I think this is really the root of the matter (at least for the EPRs). 
Potentially we must really use raw strings as internal representations of EPRs instead of DOM elements. 

Oliver


-------- Original-Nachricht --------
> Datum: Wed, 5 Sep 2007 13:20:21 +0200
> Von: Chris.Twiner@swisscom.com
> An: muse-dev@ws.apache.org
> Betreff: RE: RE: EMPTY_DOC thread stability issues

> Hi All,
> 
> Any piece of code that uses nodelists will be thread unsafe as soon as 
> a document is shared across threads.  It doesn't matter if the 
> original EPR has a seperate document or not.  The problem with TLSing 
> the Document itself is that the standard says its not thread safe 
> either, all that happens is push the problem further down the line 
> until two threads try to nodelist the same document.
> 
> point 1) affects this directly, the problem is that both the 
> getChildNodes optimisation and the importing (as Vinh just showed) aren't thread safe.
> 
> For EPRs at least the only way to be truly thread safe is to store the 
> actual information in raw xml strings (or objects) and re-parse it 
> every time you use an epr.  Its expensive to then reparse but at least safe.
> 
> point 2) is exactly the same problem just pushed further down the line. 
> The resource EPR with its seperate document is then read 
> simultaneously by seperate threads wishing to publish events leading to same issue.
> getFirstChild/getNextSibling help with this but only as another hack 
> tied to a specific version of xerces.
> 
> Please upload the test case, the more test cases we can use the better.
> 
> cheers,
> Chris
> 
> -----Original Message-----
> From: Oliver Waeldrich [mailto:oliver.waeldrich@gmx.de]
> Sent: Wednesday, September 05, 2007 12:52 PM
> To: muse-dev@ws.apache.org
> Subject: Re: RE: EMPTY_DOC thread stability issues
> 
> 
> Hi all,
> 
> I did some additional testing related to the EMPTY_DOC problem. 
> Therefore I set up a test where multiple threads created new EPRs 
> based on a set of existing (shared) EPRs. Such shared EPRs are resource EPRs (e.g.
> notification producer EPR). In this test two different kinds of exceptions are thrown:
> 
> 1.)
> java.lang.NullPointerException
>     at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
>     at org.apache.xerces.dom.ParentNode.item(Unknown Source)
>     at org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:932)
>     at org.apache.muse.util.xml.XmlUtils.getElement(XmlUtils.java:1144)
>     at org.apache.muse.util.xml.XmlUtils.getElement(XmlUtils.java:1119)
>     at
> org.apache.muse.test.case1.TestEndpointReference.initializeFromXML(TestEndpointReference.java:560)
>     at
> org.apache.muse.test.case1.TestEndpointReference.<init>(TestEndpointReference.java:193)
>     at org.apache.muse.test.case1.TestThread.run(TestThread.java:60)
> 
> This exception relates to the following code parts in the EPR
> implementation:
>    
>         ...
>         Document doc = XmlUtils.EMPTY_DOC;
>         _xml = XmlUtils.createElement(doc, typeName);
> 
>         NodeList children = copy.toXML().getChildNodes();
>         int length = children.getLength();
>         
>         for (int n = 0; n < length; ++n)
>         {
>             Node next = doc.importNode(children.item(n), true);
>             _xml.appendChild(next);
>         }
> 
>         try
>         {
>             initializeFromXML();
>         }
>         ...
>         
> The reason for the exception is that multiple threads import nodes 
> into the SHARED document. Later on in initializeFromXML() the 
> different element nodes within the SHARED document are traversed 
> simulations, which leads to the threading issue. The usage of 
> ThreadLocal WILL eliminate such errors, since no SHARED document is 
> used between the different threads to create XML elements. In fact each thread has its own document that is used for this.
> This is shown in the sample thread local.
> 
> 2.)
> 
> java.lang.NullPointerException
>     at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
>     at org.apache.xerces.dom.ParentNode.item(Unknown Source)
>     at
> org.apache.muse.test.case1.TestEndpointReference.<init>(TestEndpointReference.java:187)
>     at org.apache.muse.test.case1.TestThread.run(TestThread.java:60)
> 
> The second exception has a similar background. In the test case the 
> initial EPRs are created in the class Test via the
>     
>         TestEndpointReference(URI address, QName typeName)
>          
> constructor. This constructor creates the internal XML representation 
> by using EMPTY_DOC. The original EPR is copied by the following code:
> 
>     NodeList children = copy.toXML().getChildNodes();
>     int length = children.getLength();
>     
>     for (int n = 0; n < length; ++n)
>     {
>         Node next = doc.importNode(children.item(n), true);
>         _xml.appendChild(next);
>     }
> 
> The copy.toXML() method simply returns the XML element created with 
> EMPTY_DOC. Here again multiple thread try to traverse the same node
> concurrently which leads to the exception.        
>         
> Therefore, in my point of view we have two different problems:
> 
>      
>  1.) Multiple threads must not use the same target document to import 
> nodes.
>   
>      This means XML elements, that are only used within a request/response
>      cycle (e.g. for de-/serialization) can use ThreadLocal mechnisms.
>      It is not necessary to create new documents all the way for this 
> purpose.
>      
>  2.) XML elements, that have a lifetime longer than a request/response
>      cycle, must be created by using new (not shared) documents. This 
> I already
>      pointed out in my first mail. This is especially true for EPRs, 
> that are
>      
> 
> I attached a number of test cases that illustrate my point.
> 
> To Rich:
> 
>   >>The default implementation creates a thread for processing the
>   >>notify request, and then immediately releases the thread that 
> called it.
>   
>   You are right: in this special case the objects need to be copied
>   by using a separate document, since here the message objects leave the
>   request/response cycle. This has to be taken into account.
>   
> Finally, the usage of createDocument() compared to the solution I 
> presented adds a performance overhead of more than 50%. This can 
> simply be verified with the tests I attached. However, I admit that my 
> solution requires more attention by the programmers than simply replacing EMPTY_DOC, but I think
> it is worth.   
> 
> Interestingly, I still found that sometimes exceptions will occur, 
> even when using new documents for every XML operation. This is really 
> rare, but in my opinion this results from the “still” shared usage 
> of DOM objects within the EPR (the _xml element). Therefore, 
> concurrent read access to this instance variable can still occour. To 
> solve the problem with the EPR completely, this instance variable 
> should perhaps be removed from the EPR implementation.
>   
> Regards,
> Oliver
> 
> p.s. unfortunatelly I could not attach the test files, since this is 
> rejected by the mailer deamon. However, I could provide the test in the JIRA.
> --
> Psssst! Schon vom neuen GMX MultiMessenger gehört?
> Der kanns mit allen: http://www.gmx.net/de/go/multimessenger
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
> For additional commands, e-mail: muse-dev-help@ws.apache.org
> 

--
Psssst! Schon vom neuen GMX MultiMessenger gehört?
Der kanns mit allen: http://www.gmx.net/de/go/multimessenger

---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org

RE: EMPTY_DOC thread stability issues

Posted by "Vinh Nguyen (vinguye2)" <vi...@cisco.com>.
Minor correction to point (b):

SimpleNotificationMessage actually calls EndpointReference.toXML(Document), which correctly returns a new Element copy.  So I made updates to SimpleNotificationMessage to use a new Document instead of EMPTY_DOC when calling EndpointReference.toXML(Document), since there can be several SimpleNotificationMessage objects trying to simultaneously access the child nodes of the toXML(Document) result.

The EndpointReference.toXML() method (no Document parameter) returns the same internal Element reference.  So callers must be careful to synchronize access to the result Element's children from multiple threads.  I think this can easily be done by applying a "synchronized(element)" lock on the Element object before accessing it.


-----Original Message-----
From: Vinh Nguyen (vinguye2) 
Sent: Wednesday, September 05, 2007 11:40 AM
To: muse-dev@ws.apache.org
Subject: RE: EMPTY_DOC thread stability issues

To sum up, the two problems are:

1) A Document is not thread safe.
2) A Node/Element is not thread safe.

For #1, EMPTY_DOC can still be used in cases where an Element is created and only used in a single thread.  Or, with the help of ThreadLocal, it can be used in multiple threads...but not if the Element's child nodes will be read from multiple threads.

#2 requires extra care when creating an Element.  This means not just knowing when to use a new Document, but knowing if the Element itself needs to be created anew each time it needs to be accessed.  So, take the EndpointReference class for example.

a) Since the toXML() method is called from multiple threads  (i.e. Muse gets the producer resource's
   EPR to set it in a notification message), it should always use XmlUtils.createDocument() instead
   of XmlUtils.EMPTY_DOC.
b) Since toXML() returns an Element whose children will be accessed from multiple threads, then the
   Element itself should be created anew for each thread/caller.  So ideally, EndpointReference.toXML()
   should not return the same internal Element reference, but create a new Element each time.  This prevents
   errors from occurring when two SimpleNotificationMessage classes need to simultaneously read the child
   nodelist of the same resource EPR.

These rules apply not just to EMPTY_DOC, but to any class that needs to use/create a Document or Element.  The use of a shared EMPTY_DOC  increases performance, but it also can cause code confusion.  For example, a lot of Muse code will obtain an Element and automatically assume the owner document is EMPTY_DOC (i.e. serializers need to return an Element whose owner is EMPTY_DOC).  But, if my custom code doesn't follow this undocumented convention, I will run into errors.  It took me awhile to figure out the serializer restriction.

I'm also still concerned about ThreadLocal.  It does allow the EMPTY_DOC to be reused across multiple threads.  But, if Xerces is in fact storing an internal cache for the document and is depending on the thread to expire in order for the cache to clear, then that might be a problem.  ThreadLocal will not expire until the app shuts down, which means the Xerces cache will continue to build up for as many times EMPTY_DOC is used, and this might cause unexpected problems later on during runtime.
-Vinh



-----Original Message-----
From: Oliver Waeldrich [mailto:oliver.waeldrich@gmx.de]
Sent: Wednesday, September 05, 2007 6:16 AM
To: muse-dev@ws.apache.org
Subject: RE: EMPTY_DOC thread stability issues

Hi all,

> point 1) affects this directly, the problem is that both the 
> getChildNodes optimisation and the importing (as Vinh just showed) 
> aren't thread safe.

I agree. The point is that using ThreadLocal is just an optimized version compared to the approach of creating new documents. It can be used for several cases were the XML fragments are only used temporally. Remember the initial idea why EMPTY_DOC was introduced was to minimize document creation. It should be used as a factory for creating XML fragments. 
In a lot of cases you create XML fragments (e.g. SOAP envelope, header, body), use them (e.g. de-/serialize) and discard them. 

Therefore, these fragments only exist for a short time within one worker thread of a servlet container, or more concrete they only exist within a reqest/response cycle. These fragments are also only accessed by the current thread. However, since document is not thread safe we can not share EMPTY_DOC over multiple worker threads.

MUSE would profit from using ThreadLocal and not creating new XML documents all the time, since every worker thread has its own factory document. Hence we do not have the threading issues.

If we need to use objects over multiple requests, we need to create them in thread safe representations, which potentially do not include DOM elements. But this is only the case for a minority of the XML related objects created in muse.

Again, the TreadLocal approach does not solve our problem alone. It is just an optimization. Since some refactoring has to be done, I would suggest not to simply search and replace EMPTY_DOC by createDocument(), since the basic idea of reusing the document object was totally fine.

> point 2) is exactly the same problem just pushed further down the line. 
> The resource EPR with its seperate document is then read 
> simultaneously by seperate threads wishing to publish events leading to same issue.

I think this is really the root of the matter (at least for the EPRs). 
Potentially we must really use raw strings as internal representations of EPRs instead of DOM elements. 

Oliver


-------- Original-Nachricht --------
> Datum: Wed, 5 Sep 2007 13:20:21 +0200
> Von: Chris.Twiner@swisscom.com
> An: muse-dev@ws.apache.org
> Betreff: RE: RE: EMPTY_DOC thread stability issues

> Hi All,
> 
> Any piece of code that uses nodelists will be thread unsafe as soon as 
> a document is shared across threads.  It doesn't matter if the 
> original EPR has a seperate document or not.  The problem with TLSing 
> the Document itself is that the standard says its not thread safe 
> either, all that happens is push the problem further down the line 
> until two threads try to nodelist the same document.
> 
> point 1) affects this directly, the problem is that both the 
> getChildNodes optimisation and the importing (as Vinh just showed) aren't thread safe.
> 
> For EPRs at least the only way to be truly thread safe is to store the 
> actual information in raw xml strings (or objects) and re-parse it 
> every time you use an epr.  Its expensive to then reparse but at least safe.
> 
> point 2) is exactly the same problem just pushed further down the line. 
> The resource EPR with its seperate document is then read 
> simultaneously by seperate threads wishing to publish events leading to same issue.
> getFirstChild/getNextSibling help with this but only as another hack 
> tied to a specific version of xerces.
> 
> Please upload the test case, the more test cases we can use the better.
> 
> cheers,
> Chris
> 
> -----Original Message-----
> From: Oliver Waeldrich [mailto:oliver.waeldrich@gmx.de]
> Sent: Wednesday, September 05, 2007 12:52 PM
> To: muse-dev@ws.apache.org
> Subject: Re: RE: EMPTY_DOC thread stability issues
> 
> 
> Hi all,
> 
> I did some additional testing related to the EMPTY_DOC problem. 
> Therefore I set up a test where multiple threads created new EPRs 
> based on a set of existing (shared) EPRs. Such shared EPRs are resource EPRs (e.g.
> notification producer EPR). In this test two different kinds of exceptions are thrown:
> 
> 1.)
> java.lang.NullPointerException
>     at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
>     at org.apache.xerces.dom.ParentNode.item(Unknown Source)
>     at org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:932)
>     at org.apache.muse.util.xml.XmlUtils.getElement(XmlUtils.java:1144)
>     at org.apache.muse.util.xml.XmlUtils.getElement(XmlUtils.java:1119)
>     at
> org.apache.muse.test.case1.TestEndpointReference.initializeFromXML(TestEndpointReference.java:560)
>     at
> org.apache.muse.test.case1.TestEndpointReference.<init>(TestEndpointReference.java:193)
>     at org.apache.muse.test.case1.TestThread.run(TestThread.java:60)
> 
> This exception relates to the following code parts in the EPR
> implementation:
>    
>         ...
>         Document doc = XmlUtils.EMPTY_DOC;
>         _xml = XmlUtils.createElement(doc, typeName);
> 
>         NodeList children = copy.toXML().getChildNodes();
>         int length = children.getLength();
>         
>         for (int n = 0; n < length; ++n)
>         {
>             Node next = doc.importNode(children.item(n), true);
>             _xml.appendChild(next);
>         }
> 
>         try
>         {
>             initializeFromXML();
>         }
>         ...
>         
> The reason for the exception is that multiple threads import nodes 
> into the SHARED document. Later on in initializeFromXML() the 
> different element nodes within the SHARED document are traversed 
> simulations, which leads to the threading issue. The usage of 
> ThreadLocal WILL eliminate such errors, since no SHARED document is 
> used between the different threads to create XML elements. In fact each thread has its own document that is used for this.
> This is shown in the sample thread local.
> 
> 2.)
> 
> java.lang.NullPointerException
>     at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
>     at org.apache.xerces.dom.ParentNode.item(Unknown Source)
>     at
> org.apache.muse.test.case1.TestEndpointReference.<init>(TestEndpointReference.java:187)
>     at org.apache.muse.test.case1.TestThread.run(TestThread.java:60)
> 
> The second exception has a similar background. In the test case the 
> initial EPRs are created in the class Test via the
>     
>         TestEndpointReference(URI address, QName typeName)
>          
> constructor. This constructor creates the internal XML representation 
> by using EMPTY_DOC. The original EPR is copied by the following code:
> 
>     NodeList children = copy.toXML().getChildNodes();
>     int length = children.getLength();
>     
>     for (int n = 0; n < length; ++n)
>     {
>         Node next = doc.importNode(children.item(n), true);
>         _xml.appendChild(next);
>     }
> 
> The copy.toXML() method simply returns the XML element created with 
> EMPTY_DOC. Here again multiple thread try to traverse the same node
> concurrently which leads to the exception.        
>         
> Therefore, in my point of view we have two different problems:
> 
>      
>  1.) Multiple threads must not use the same target document to import 
> nodes.
>   
>      This means XML elements, that are only used within a request/response
>      cycle (e.g. for de-/serialization) can use ThreadLocal mechnisms.
>      It is not necessary to create new documents all the way for this 
> purpose.
>      
>  2.) XML elements, that have a lifetime longer than a request/response
>      cycle, must be created by using new (not shared) documents. This 
> I already
>      pointed out in my first mail. This is especially true for EPRs, 
> that are
>      
> 
> I attached a number of test cases that illustrate my point.
> 
> To Rich:
> 
>   >>The default implementation creates a thread for processing the
>   >>notify request, and then immediately releases the thread that 
> called it.
>   
>   You are right: in this special case the objects need to be copied
>   by using a separate document, since here the message objects leave the
>   request/response cycle. This has to be taken into account.
>   
> Finally, the usage of createDocument() compared to the solution I 
> presented adds a performance overhead of more than 50%. This can 
> simply be verified with the tests I attached. However, I admit that my 
> solution requires more attention by the programmers than simply replacing EMPTY_DOC, but I think
> it is worth.   
> 
> Interestingly, I still found that sometimes exceptions will occur, 
> even when using new documents for every XML operation. This is really 
> rare, but in my opinion this results from the “still” shared usage 
> of DOM objects within the EPR (the _xml element). Therefore, 
> concurrent read access to this instance variable can still occour. To 
> solve the problem with the EPR completely, this instance variable 
> should perhaps be removed from the EPR implementation.
>   
> Regards,
> Oliver
> 
> p.s. unfortunatelly I could not attach the test files, since this is 
> rejected by the mailer deamon. However, I could provide the test in the JIRA.
> --
> Psssst! Schon vom neuen GMX MultiMessenger gehört?
> Der kanns mit allen: http://www.gmx.net/de/go/multimessenger
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
> For additional commands, e-mail: muse-dev-help@ws.apache.org
> 

--
Psssst! Schon vom neuen GMX MultiMessenger gehört?
Der kanns mit allen: http://www.gmx.net/de/go/multimessenger

---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org

RE: EMPTY_DOC thread stability issues

Posted by "Vinh Nguyen (vinguye2)" <vi...@cisco.com>.
To sum up, the two problems are:

1) A Document is not thread safe.
2) A Node/Element is not thread safe.

For #1, EMPTY_DOC can still be used in cases where an Element is created and only used in a single thread.  Or, with the help of ThreadLocal, it can be used in multiple threads...but not if the Element's child nodes will be read from multiple threads.

#2 requires extra care when creating an Element.  This means not just knowing when to use a new Document, but knowing if the Element itself needs to be created anew each time it needs to be accessed.  So, take the EndpointReference class for example.

a) Since the toXML() method is called from multiple threads  (i.e. Muse gets the producer resource's
   EPR to set it in a notification message), it should always use XmlUtils.createDocument() instead
   of XmlUtils.EMPTY_DOC.
b) Since toXML() returns an Element whose children will be accessed from multiple threads, then the
   Element itself should be created anew for each thread/caller.  So ideally, EndpointReference.toXML()
   should not return the same internal Element reference, but create a new Element each time.  This prevents
   errors from occurring when two SimpleNotificationMessage classes need to simultaneously read the child
   nodelist of the same resource EPR.

These rules apply not just to EMPTY_DOC, but to any class that needs to use/create a Document or Element.  The use of a shared EMPTY_DOC  increases performance, but it also can cause code confusion.  For example, a lot of Muse code will obtain an Element and automatically assume the owner document is EMPTY_DOC (i.e. serializers need to return an Element whose owner is EMPTY_DOC).  But, if my custom code doesn't follow this undocumented convention, I will run into errors.  It took me awhile to figure out the serializer restriction.

I'm also still concerned about ThreadLocal.  It does allow the EMPTY_DOC to be reused across multiple threads.  But, if Xerces is in fact storing an internal cache for the document and is depending on the thread to expire in order for the cache to clear, then that might be a problem.  ThreadLocal will not expire until the app shuts down, which means the Xerces cache will continue to build up for as many times EMPTY_DOC is used, and this might cause unexpected problems later on during runtime.
-Vinh



-----Original Message-----
From: Oliver Waeldrich [mailto:oliver.waeldrich@gmx.de] 
Sent: Wednesday, September 05, 2007 6:16 AM
To: muse-dev@ws.apache.org
Subject: RE: EMPTY_DOC thread stability issues

Hi all,

> point 1) affects this directly, the problem is that both the 
> getChildNodes optimisation and the importing (as Vinh just showed) 
> aren't thread safe.

I agree. The point is that using ThreadLocal is just an optimized version compared to the approach of creating new documents. It can be used for several cases were the XML fragments are only used temporally. Remember the initial idea why EMPTY_DOC was introduced was to minimize document creation. It should be used as a factory for creating XML fragments. 
In a lot of cases you create XML fragments (e.g. SOAP envelope, header, body), use them (e.g. de-/serialize) and discard them. 

Therefore, these fragments only exist for a short time within one worker thread of a servlet container, or more concrete they only exist within a reqest/response cycle. These fragments are also only accessed by the current thread. However, since document is not thread safe we can not share EMPTY_DOC over multiple worker threads.

MUSE would profit from using ThreadLocal and not creating new XML documents all the time, since every worker thread has its own factory document. Hence we do not have the threading issues.

If we need to use objects over multiple requests, we need to create them in thread safe representations, which potentially do not include DOM elements. But this is only the case for a minority of the XML related objects created in muse.

Again, the TreadLocal approach does not solve our problem alone. It is just an optimization. Since some refactoring has to be done, I would suggest not to simply search and replace EMPTY_DOC by createDocument(), since the basic idea of reusing the document object was totally fine.

> point 2) is exactly the same problem just pushed further down the line. 
> The resource EPR with its seperate document is then read 
> simultaneously by seperate threads wishing to publish events leading to same issue.

I think this is really the root of the matter (at least for the EPRs). 
Potentially we must really use raw strings as internal representations of EPRs instead of DOM elements. 

Oliver


-------- Original-Nachricht --------
> Datum: Wed, 5 Sep 2007 13:20:21 +0200
> Von: Chris.Twiner@swisscom.com
> An: muse-dev@ws.apache.org
> Betreff: RE: RE: EMPTY_DOC thread stability issues

> Hi All,
> 
> Any piece of code that uses nodelists will be thread unsafe as soon as 
> a document is shared across threads.  It doesn't matter if the 
> original EPR has a seperate document or not.  The problem with TLSing 
> the Document itself is that the standard says its not thread safe 
> either, all that happens is push the problem further down the line 
> until two threads try to nodelist the same document.
> 
> point 1) affects this directly, the problem is that both the 
> getChildNodes optimisation and the importing (as Vinh just showed) aren't thread safe.
> 
> For EPRs at least the only way to be truly thread safe is to store the 
> actual information in raw xml strings (or objects) and re-parse it 
> every time you use an epr.  Its expensive to then reparse but at least safe.
> 
> point 2) is exactly the same problem just pushed further down the line. 
> The resource EPR with its seperate document is then read 
> simultaneously by seperate threads wishing to publish events leading to same issue.
> getFirstChild/getNextSibling help with this but only as another hack 
> tied to a specific version of xerces.
> 
> Please upload the test case, the more test cases we can use the better.
> 
> cheers,
> Chris
> 
> -----Original Message-----
> From: Oliver Waeldrich [mailto:oliver.waeldrich@gmx.de]
> Sent: Wednesday, September 05, 2007 12:52 PM
> To: muse-dev@ws.apache.org
> Subject: Re: RE: EMPTY_DOC thread stability issues
> 
> 
> Hi all,
> 
> I did some additional testing related to the EMPTY_DOC problem. 
> Therefore I set up a test where multiple threads created new EPRs 
> based on a set of existing (shared) EPRs. Such shared EPRs are resource EPRs (e.g.
> notification producer EPR). In this test two different kinds of exceptions are thrown:
> 
> 1.)
> java.lang.NullPointerException
>     at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
>     at org.apache.xerces.dom.ParentNode.item(Unknown Source)
>     at org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:932)
>     at org.apache.muse.util.xml.XmlUtils.getElement(XmlUtils.java:1144)
>     at org.apache.muse.util.xml.XmlUtils.getElement(XmlUtils.java:1119)
>     at
> org.apache.muse.test.case1.TestEndpointReference.initializeFromXML(TestEndpointReference.java:560)
>     at
> org.apache.muse.test.case1.TestEndpointReference.<init>(TestEndpointReference.java:193)
>     at org.apache.muse.test.case1.TestThread.run(TestThread.java:60)
> 
> This exception relates to the following code parts in the EPR
> implementation:
>    
>         ...
>         Document doc = XmlUtils.EMPTY_DOC;
>         _xml = XmlUtils.createElement(doc, typeName);
> 
>         NodeList children = copy.toXML().getChildNodes();
>         int length = children.getLength();
>         
>         for (int n = 0; n < length; ++n)
>         {
>             Node next = doc.importNode(children.item(n), true);
>             _xml.appendChild(next);
>         }
> 
>         try
>         {
>             initializeFromXML();
>         }
>         ...
>         
> The reason for the exception is that multiple threads import nodes 
> into the SHARED document. Later on in initializeFromXML() the 
> different element nodes within the SHARED document are traversed 
> simulations, which leads to the threading issue. The usage of 
> ThreadLocal WILL eliminate such errors, since no SHARED document is 
> used between the different threads to create XML elements. In fact each thread has its own document that is used for this.
> This is shown in the sample thread local.
> 
> 2.)
> 
> java.lang.NullPointerException
>     at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
>     at org.apache.xerces.dom.ParentNode.item(Unknown Source)
>     at
> org.apache.muse.test.case1.TestEndpointReference.<init>(TestEndpointReference.java:187)
>     at org.apache.muse.test.case1.TestThread.run(TestThread.java:60)
> 
> The second exception has a similar background. In the test case the 
> initial EPRs are created in the class Test via the
>     
>         TestEndpointReference(URI address, QName typeName)
>          
> constructor. This constructor creates the internal XML representation 
> by using EMPTY_DOC. The original EPR is copied by the following code:
> 
>     NodeList children = copy.toXML().getChildNodes();
>     int length = children.getLength();
>     
>     for (int n = 0; n < length; ++n)
>     {
>         Node next = doc.importNode(children.item(n), true);
>         _xml.appendChild(next);
>     }
> 
> The copy.toXML() method simply returns the XML element created with 
> EMPTY_DOC. Here again multiple thread try to traverse the same node
> concurrently which leads to the exception.        
>         
> Therefore, in my point of view we have two different problems:
> 
>      
>  1.) Multiple threads must not use the same target document to import 
> nodes.
>   
>      This means XML elements, that are only used within a request/response
>      cycle (e.g. for de-/serialization) can use ThreadLocal mechnisms.
>      It is not necessary to create new documents all the way for this 
> purpose.
>      
>  2.) XML elements, that have a lifetime longer than a request/response
>      cycle, must be created by using new (not shared) documents. This 
> I already
>      pointed out in my first mail. This is especially true for EPRs, 
> that are
>      
> 
> I attached a number of test cases that illustrate my point.
> 
> To Rich:
> 
>   >>The default implementation creates a thread for processing the
>   >>notify request, and then immediately releases the thread that 
> called it.
>   
>   You are right: in this special case the objects need to be copied
>   by using a separate document, since here the message objects leave the
>   request/response cycle. This has to be taken into account.
>   
> Finally, the usage of createDocument() compared to the solution I 
> presented adds a performance overhead of more than 50%. This can 
> simply be verified with the tests I attached. However, I admit that my 
> solution requires more attention by the programmers than simply replacing EMPTY_DOC, but I think
> it is worth.   
> 
> Interestingly, I still found that sometimes exceptions will occur, 
> even when using new documents for every XML operation. This is really 
> rare, but in my opinion this results from the “still” shared usage 
> of DOM objects within the EPR (the _xml element). Therefore, 
> concurrent read access to this instance variable can still occour. To 
> solve the problem with the EPR completely, this instance variable 
> should perhaps be removed from the EPR implementation.
>   
> Regards,
> Oliver
> 
> p.s. unfortunatelly I could not attach the test files, since this is 
> rejected by the mailer deamon. However, I could provide the test in the JIRA.
> --
> Psssst! Schon vom neuen GMX MultiMessenger gehört?
> Der kanns mit allen: http://www.gmx.net/de/go/multimessenger
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
> For additional commands, e-mail: muse-dev-help@ws.apache.org
> 

--
Psssst! Schon vom neuen GMX MultiMessenger gehört?
Der kanns mit allen: http://www.gmx.net/de/go/multimessenger

---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org

RE: EMPTY_DOC thread stability issues

Posted by Oliver Waeldrich <ol...@gmx.de>.
Hi all,

> point 1) affects this directly, the problem is that both the 
> getChildNodes optimisation and the importing (as Vinh just showed) 
> aren't thread safe.

I agree. The point is that using ThreadLocal is just an optimized version 
compared to the approach of creating new documents. It can be used for 
several cases were the XML fragments are only used temporally. Remember 
the initial idea why EMPTY_DOC was introduced was to minimize document 
creation. It should be used as a factory for creating XML fragments. 
In a lot of cases you create XML fragments (e.g. SOAP envelope, header, body), use them (e.g. de-/serialize) and discard them. 

Therefore, these fragments only exist for a short time within one 
worker thread of a servlet container, or more concrete they only exist 
within a reqest/response cycle. These fragments are also only accessed 
by the current thread. However, since document is not thread safe we 
can not share EMPTY_DOC over multiple worker threads.

MUSE would profit from using ThreadLocal and not creating new XML 
documents all the time, since every worker thread has its own factory 
document. Hence we do not have the threading issues.

If we need to use objects over multiple requests, we need to create them 
in thread safe representations, which potentially do not include DOM
elements. But this is only the case for a minority of the XML related 
objects created in muse.

Again, the TreadLocal approach does not solve our problem alone. It is 
just an optimization. Since some refactoring has to be done, I would 
suggest not to simply search and replace EMPTY_DOC by createDocument(), 
since the basic idea of reusing the document object was totally fine.

> point 2) is exactly the same problem just pushed further down the line. 
> The resource EPR with its seperate document is then read simultaneously
> by seperate threads wishing to publish events leading to same issue. 

I think this is really the root of the matter (at least for the EPRs). 
Potentially we must really use raw strings as internal representations
of EPRs instead of DOM elements. 

Oliver


-------- Original-Nachricht --------
> Datum: Wed, 5 Sep 2007 13:20:21 +0200
> Von: Chris.Twiner@swisscom.com
> An: muse-dev@ws.apache.org
> Betreff: RE: RE: EMPTY_DOC thread stability issues

> Hi All,
> 
> Any piece of code that uses nodelists will be thread unsafe as soon as a
> document is shared across threads.  It doesn't matter if the original EPR
> has a seperate document or not.  The problem with TLSing the Document itself
> is that the standard says its not thread safe either, all that happens is
> push the problem further down the line until two threads try to nodelist the
> same document.  
> 
> point 1) affects this directly, the problem is that both the getChildNodes
> optimisation and the importing (as Vinh just showed) aren't thread safe.
> 
> For EPRs at least the only way to be truly thread safe is to store the
> actual information in raw xml strings (or objects) and re-parse it every time
> you use an epr.  Its expensive to then reparse but at least safe.
> 
> point 2) is exactly the same problem just pushed further down the line. 
> The resource EPR with its seperate document is then read simultaneously by
> seperate threads wishing to publish events leading to same issue. 
> getFirstChild/getNextSibling help with this but only as another hack tied to a
> specific version of xerces.
> 
> Please upload the test case, the more test cases we can use the better.
> 
> cheers,
> Chris
> 
> -----Original Message-----
> From: Oliver Waeldrich [mailto:oliver.waeldrich@gmx.de] 
> Sent: Wednesday, September 05, 2007 12:52 PM
> To: muse-dev@ws.apache.org
> Subject: Re: RE: EMPTY_DOC thread stability issues
> 
> 
> Hi all,
> 
> I did some additional testing related to the EMPTY_DOC problem. Therefore
> I set up a test where multiple threads created new EPRs based on a set of
> existing (shared) EPRs. Such shared EPRs are resource EPRs (e.g.
> notification producer EPR). In this test two different kinds of exceptions are thrown:
> 
> 1.)
> java.lang.NullPointerException
>     at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
>     at org.apache.xerces.dom.ParentNode.item(Unknown Source)
>     at org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:932)
>     at org.apache.muse.util.xml.XmlUtils.getElement(XmlUtils.java:1144)
>     at org.apache.muse.util.xml.XmlUtils.getElement(XmlUtils.java:1119)
>     at
> org.apache.muse.test.case1.TestEndpointReference.initializeFromXML(TestEndpointReference.java:560)
>     at
> org.apache.muse.test.case1.TestEndpointReference.<init>(TestEndpointReference.java:193)
>     at org.apache.muse.test.case1.TestThread.run(TestThread.java:60)
> 
> This exception relates to the following code parts in the EPR
> implementation:
>    
>         ...
>         Document doc = XmlUtils.EMPTY_DOC;
>         _xml = XmlUtils.createElement(doc, typeName);
> 
>         NodeList children = copy.toXML().getChildNodes();
>         int length = children.getLength();
>         
>         for (int n = 0; n < length; ++n)
>         {
>             Node next = doc.importNode(children.item(n), true);
>             _xml.appendChild(next);
>         }
> 
>         try
>         {
>             initializeFromXML();
>         }
>         ...
>         
> The reason for the exception is that multiple threads import nodes into
> the SHARED document. Later on in initializeFromXML() the different element
> nodes within the SHARED document are traversed simulations, which leads to
> the threading issue. The usage of ThreadLocal WILL eliminate such errors,
> since no SHARED document is used between the different threads to create XML
> elements. In fact each thread has its own document that is used for this.
> This is shown in the sample thread local.
> 
> 2.)
> 
> java.lang.NullPointerException
>     at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
>     at org.apache.xerces.dom.ParentNode.item(Unknown Source)
>     at
> org.apache.muse.test.case1.TestEndpointReference.<init>(TestEndpointReference.java:187)
>     at org.apache.muse.test.case1.TestThread.run(TestThread.java:60)
> 
> The second exception has a similar background. In the test case the
> initial EPRs are created in the class Test via the
>     
>         TestEndpointReference(URI address, QName typeName)
>          
> constructor. This constructor creates the internal XML representation by
> using EMPTY_DOC. The original EPR is copied by the following code:
> 
>     NodeList children = copy.toXML().getChildNodes();
>     int length = children.getLength();
>     
>     for (int n = 0; n < length; ++n)
>     {
>         Node next = doc.importNode(children.item(n), true);
>         _xml.appendChild(next);
>     }
> 
> The copy.toXML() method simply returns the XML element created with
> EMPTY_DOC. Here again multiple thread try to traverse the same node
> concurrently which leads to the exception.        
>         
> Therefore, in my point of view we have two different problems:
> 
>      
>  1.) Multiple threads must not use the same target document to import
> nodes.
>   
>      This means XML elements, that are only used within a request/response
>      cycle (e.g. for de-/serialization) can use ThreadLocal mechnisms.
>      It is not necessary to create new documents all the way for this
> purpose.
>      
>  2.) XML elements, that have a lifetime longer than a request/response
>      cycle, must be created by using new (not shared) documents. This I
> already
>      pointed out in my first mail. This is especially true for EPRs, that
> are
>      
> 
> I attached a number of test cases that illustrate my point.
> 
> To Rich:
> 
>   >>The default implementation creates a thread for processing the
>   >>notify request, and then immediately releases the thread that called
> it.
>   
>   You are right: in this special case the objects need to be copied
>   by using a separate document, since here the message objects leave the
>   request/response cycle. This has to be taken into account.
>   
> Finally, the usage of createDocument() compared to the solution I
> presented adds a performance overhead of more than 50%. This can simply be verified
> with the tests I attached. However, I admit that my solution requires more
> attention by the programmers than simply replacing EMPTY_DOC, but I think
> it is worth.   
> 
> Interestingly, I still found that sometimes exceptions will occur, even
> when using new documents for every XML operation. This is really rare, but in
> my opinion this results from the “still” shared usage of DOM
> objects within the EPR (the _xml element). Therefore, concurrent read access
> to this instance variable can still occour. To solve the problem with the
> EPR completely, this instance variable should perhaps be removed from the
> EPR implementation.
>   
> Regards,
> Oliver
> 
> p.s. unfortunatelly I could not attach the test files, since this is
> rejected by the mailer deamon. However, I could provide the test in the JIRA.
> --
> Psssst! Schon vom neuen GMX MultiMessenger gehört?
> Der kanns mit allen: http://www.gmx.net/de/go/multimessenger
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
> For additional commands, e-mail: muse-dev-help@ws.apache.org
> 

-- 
Psssst! Schon vom neuen GMX MultiMessenger gehört?
Der kanns mit allen: http://www.gmx.net/de/go/multimessenger

---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org


RE: RE: EMPTY_DOC thread stability issues

Posted by Ch...@swisscom.com.
Hi All,

Any piece of code that uses nodelists will be thread unsafe as soon as a document is shared across threads.  It doesn't matter if the original EPR has a seperate document or not.  The problem with TLSing the Document itself is that the standard says its not thread safe either, all that happens is push the problem further down the line until two threads try to nodelist the same document.  

point 1) affects this directly, the problem is that both the getChildNodes optimisation and the importing (as Vinh just showed) aren't thread safe.

For EPRs at least the only way to be truly thread safe is to store the actual information in raw xml strings (or objects) and re-parse it every time you use an epr.  Its expensive to then reparse but at least safe.

point 2) is exactly the same problem just pushed further down the line.  The resource EPR with its seperate document is then read simultaneously by seperate threads wishing to publish events leading to same issue.  getFirstChild/getNextSibling help with this but only as another hack tied to a specific version of xerces.

Please upload the test case, the more test cases we can use the better.

cheers,
Chris

-----Original Message-----
From: Oliver Waeldrich [mailto:oliver.waeldrich@gmx.de] 
Sent: Wednesday, September 05, 2007 12:52 PM
To: muse-dev@ws.apache.org
Subject: Re: RE: EMPTY_DOC thread stability issues


Hi all,

I did some additional testing related to the EMPTY_DOC problem. Therefore I set up a test where multiple threads created new EPRs based on a set of existing (shared) EPRs. Such shared EPRs are resource EPRs (e.g. notification producer EPR). In this test two different kinds of exceptions are thrown:

1.)
java.lang.NullPointerException
    at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
    at org.apache.xerces.dom.ParentNode.item(Unknown Source)
    at org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:932)
    at org.apache.muse.util.xml.XmlUtils.getElement(XmlUtils.java:1144)
    at org.apache.muse.util.xml.XmlUtils.getElement(XmlUtils.java:1119)
    at org.apache.muse.test.case1.TestEndpointReference.initializeFromXML(TestEndpointReference.java:560)
    at org.apache.muse.test.case1.TestEndpointReference.<init>(TestEndpointReference.java:193)
    at org.apache.muse.test.case1.TestThread.run(TestThread.java:60)

This exception relates to the following code parts in the EPR implementation:
   
        ...
        Document doc = XmlUtils.EMPTY_DOC;
        _xml = XmlUtils.createElement(doc, typeName);

        NodeList children = copy.toXML().getChildNodes();
        int length = children.getLength();
        
        for (int n = 0; n < length; ++n)
        {
            Node next = doc.importNode(children.item(n), true);
            _xml.appendChild(next);
        }

        try
        {
            initializeFromXML();
        }
        ...
        
The reason for the exception is that multiple threads import nodes into the SHARED document. Later on in initializeFromXML() the different element nodes within the SHARED document are traversed simulations, which leads to the threading issue. The usage of ThreadLocal WILL eliminate such errors, since no SHARED document is used between the different threads to create XML elements. In fact each thread has its own document that is used for this. This is shown in the sample thread local.

2.)

java.lang.NullPointerException
    at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
    at org.apache.xerces.dom.ParentNode.item(Unknown Source)
    at org.apache.muse.test.case1.TestEndpointReference.<init>(TestEndpointReference.java:187)
    at org.apache.muse.test.case1.TestThread.run(TestThread.java:60)

The second exception has a similar background. In the test case the initial EPRs are created in the class Test via the
    
        TestEndpointReference(URI address, QName typeName)
         
constructor. This constructor creates the internal XML representation by using EMPTY_DOC. The original EPR is copied by the following code:

    NodeList children = copy.toXML().getChildNodes();
    int length = children.getLength();
    
    for (int n = 0; n < length; ++n)
    {
        Node next = doc.importNode(children.item(n), true);
        _xml.appendChild(next);
    }

The copy.toXML() method simply returns the XML element created with EMPTY_DOC. Here again multiple thread try to traverse the same node
concurrently which leads to the exception.        
        
Therefore, in my point of view we have two different problems:

     
 1.) Multiple threads must not use the same target document to import nodes.
  
     This means XML elements, that are only used within a request/response
     cycle (e.g. for de-/serialization) can use ThreadLocal mechnisms.
     It is not necessary to create new documents all the way for this purpose.
     
 2.) XML elements, that have a lifetime longer than a request/response
     cycle, must be created by using new (not shared) documents. This I already
     pointed out in my first mail. This is especially true for EPRs, that are
     

I attached a number of test cases that illustrate my point.

To Rich:

  >>The default implementation creates a thread for processing the
  >>notify request, and then immediately releases the thread that called it.
  
  You are right: in this special case the objects need to be copied
  by using a separate document, since here the message objects leave the
  request/response cycle. This has to be taken into account.
  
Finally, the usage of createDocument() compared to the solution I presented adds a performance overhead of more than 50%. This can simply be verified with the tests I attached. However, I admit that my solution requires more attention by the programmers than simply replacing EMPTY_DOC, but I think
it is worth.   

Interestingly, I still found that sometimes exceptions will occur, even when using new documents for every XML operation. This is really rare, but in my opinion this results from the “still” shared usage of DOM objects within the EPR (the _xml element). Therefore, concurrent read access to this instance variable can still occour. To solve the problem with the EPR completely, this instance variable should perhaps be removed from the EPR implementation.
  
Regards,
Oliver

p.s. unfortunatelly I could not attach the test files, since this is rejected by the mailer deamon. However, I could provide the test in the JIRA.
--
Psssst! Schon vom neuen GMX MultiMessenger gehört?
Der kanns mit allen: http://www.gmx.net/de/go/multimessenger

---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org


Re: RE: EMPTY_DOC thread stability issues

Posted by Oliver Waeldrich <ol...@gmx.de>.
Hi all,

I did some additional testing related to the EMPTY_DOC problem. Therefore I set up a test where multiple threads created new EPRs based on a set of existing (shared) EPRs. Such shared EPRs are resource EPRs (e.g. notification producer EPR). In this test two different kinds of exceptions are thrown:

1.)  
java.lang.NullPointerException
    at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
    at org.apache.xerces.dom.ParentNode.item(Unknown Source)
    at org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:932)
    at org.apache.muse.util.xml.XmlUtils.getElement(XmlUtils.java:1144)
    at org.apache.muse.util.xml.XmlUtils.getElement(XmlUtils.java:1119)
    at org.apache.muse.test.case1.TestEndpointReference.initializeFromXML(TestEndpointReference.java:560)
    at org.apache.muse.test.case1.TestEndpointReference.<init>(TestEndpointReference.java:193)
    at org.apache.muse.test.case1.TestThread.run(TestThread.java:60)

This exception relates to the following code parts in the EPR implementation:
   
        ...
        Document doc = XmlUtils.EMPTY_DOC;
        _xml = XmlUtils.createElement(doc, typeName);

        NodeList children = copy.toXML().getChildNodes();
        int length = children.getLength();
        
        for (int n = 0; n < length; ++n)
        {
            Node next = doc.importNode(children.item(n), true);
            _xml.appendChild(next);
        }

        try
        {
            initializeFromXML();
        }
        ...
        
The reason for the exception is that multiple threads import
nodes into the SHARED document. Later on in initializeFromXML()
the different element nodes within the SHARED document are traversed
simulations, which leads to the threading issue. The usage of
ThreadLocal WILL eliminate such errors, since no SHARED document is
used between the different threads to create XML elements. In fact
each thread has its own document that is used for this. This is shown
in the sample thread local.

2.)

java.lang.NullPointerException
    at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
    at org.apache.xerces.dom.ParentNode.item(Unknown Source)
    at org.apache.muse.test.case1.TestEndpointReference.<init>(TestEndpointReference.java:187)
    at org.apache.muse.test.case1.TestThread.run(TestThread.java:60)

The second exception has a similar background. In the test case
the initial EPRs are created in the class Test via the
    
        TestEndpointReference(URI address, QName typeName)
         
constructor. This constructor creates the internal XML representation
by using EMPTY_DOC. The original EPR is copied by the following code:

    NodeList children = copy.toXML().getChildNodes();
    int length = children.getLength();
    
    for (int n = 0; n < length; ++n)
    {
        Node next = doc.importNode(children.item(n), true);
        _xml.appendChild(next);
    }

The copy.toXML() method simply returns the XML element created with
EMPTY_DOC. Here again multiple thread try to traverse the same node
concurrently which leads to the exception.        
        
Therefore, in my point of view we have two different problems:

     
 1.) Multiple threads must not use the same target document to import nodes.
  
     This means XML elements, that are only used within a request/response
     cycle (e.g. for de-/serialization) can use ThreadLocal mechnisms.
     It is not necessary to create new documents all the way for this purpose.
     
 2.) XML elements, that have a lifetime longer than a request/response
     cycle, must be created by using new (not shared) documents. This I already
     pointed out in my first mail. This is especially true for EPRs, that are
     

I attached a number of test cases that illustrate my point.

To Rich:

  >>The default implementation creates a thread for processing the
  >>notify request, and then immediately releases the thread that called it.
  
  You are right: in this special case the objects need to be copied
  by using a separate document, since here the message objects leave the
  request/response cycle. This has to be taken into account.
  
Finally, the usage of createDocument() compared to the solution I presented
adds a performance overhead of more than 50%. This can simply be verified
with the tests I attached. However, I admit that my solution requires more
attention by the programmers than simply replacing EMPTY_DOC, but I think
it is worth.   

Interestingly, I still found that sometimes exceptions will occur, even when using new documents for every XML operation. This is really rare, but in my opinion this results from the “still” shared usage of DOM objects within the EPR (the _xml element). Therefore, concurrent read access to this instance variable can still occour. To solve the problem with the EPR completely, this instance variable should perhaps be removed from the EPR implementation.
  
Regards,
Oliver

p.s. unfortunatelly I could not attach the test files, since this is rejected by the mailer deamon. However, I could provide the test in the JIRA.
-- 
Psssst! Schon vom neuen GMX MultiMessenger gehört?
Der kanns mit allen: http://www.gmx.net/de/go/multimessenger

---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org


RE: EMPTY_DOC thread stability issues

Posted by "Vinh Nguyen (vinguye2)" <vi...@cisco.com>.
Hi all,
After doing more testing, the issue is correctly because of the Xerces
limitation.
Both a Document and a Node/Element are *NOT* thread-safe.

The following shows that Document.importNode() is not thread-safe:

17:26:49,730 ERROR [STDERR] java.lang.NullPointerException
17:26:49,730 ERROR [STDERR]     at
org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
17:26:49,730 ERROR [STDERR]     at
org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
17:26:49,730 ERROR [STDERR]     at
org.apache.muse.ws.addressing.EndpointReference.<init>(EndpointReference
.java:186)
17:26:49,730 ERROR [STDERR]     at
org.apache.muse.ws.notification.impl.SimpleNotificationMessage.setProduc
erReference(SimpleNotificationMessage.java:209)
17:26:49,730 ERROR [STDERR]     at
org.apache.muse.ws.notification.impl.SimpleSubscriptionManager.publish(S
impleSubscriptionManager.java:256)

The following shows that reading the children of a Node/Element is not
thread-safe:

00:16:53,400 ERROR [STDERR] java.lang.NullPointerException
00:16:53,400 ERROR [STDERR] 	at
org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
00:16:53,400 ERROR [STDERR] 	at
org.apache.xerces.dom.ParentNode.item(Unknown Source)
00:16:53,400 ERROR [STDERR] 	at
org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:883)
00:16:53,400 ERROR [STDERR] 	at
org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:815)
00:16:53,400 ERROR [STDERR] 	at
org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:791)
00:16:53,400 ERROR [STDERR] 	at
org.apache.muse.util.xml.XmlUtils.getAllNamespaces(XmlUtils.java:974)
00:16:53,400 ERROR [STDERR] 	at
org.apache.muse.util.xml.XmlUtils.getAllNamespaces(XmlUtils.java:977)
00:16:53,400 ERROR [STDERR] 	at
org.apache.muse.util.xml.XmlUtils.getAllNamespaces(XmlUtils.java:940)
00:16:53,400 ERROR [STDERR] 	at
org.apache.muse.ws.notification.impl.SimpleNotificationMessage.toXML(Sim
pleNotificationMessage.java:291)
00:16:53,415 ERROR [STDERR] 	at
org.apache.muse.ws.notification.impl.SimpleNotificationMessage.toXML(Sim
pleNotificationMessage.java:239)
00:16:53,415 ERROR [STDERR] 	at
org.apache.muse.ws.notification.remote.NotificationConsumerClient.notify
(NotificationConsumerClient.java:97)
00:16:53,415 ERROR [STDERR] 	at
org.apache.muse.ws.notification.impl.SimpleSubscriptionManager.publish(S
impleSubscriptionManager.java:267)

To work around these problem, we should follow these rules:

1) Do not use XmlUtils.EMPTY_DOC in multi-thread processes.
2) Do not access the children of a Node/Element from multiple threads.
3) Do not create a Node/Element with its owner as XmlUtils.EMPTY_DOC, if
the node's children will be accessed.
4) Do not create a Node/Element with its owner as XmlUtils.EMPTY_DOC, if
the node itself can be accessed from multiple threads.

This most likely will affect a lot of Muse code.  As Chris pointed out,
Oliver's ThreadLocal solution for the EMPTY_DOC won't work.  EMPTY_DOC
can still be used, but any code which uses it must follow the rules
above.  Also, the DocumentBuilderFactory can't be a singleton either
since it's also not thread-safe.

NOTIFICATION FIXES:
The following updates finally fix the multi-thread notifications issue
for me:

1) Updated EndpointReference.java.  Changed all XmlUtils.EMPTY_DOC
references to XmlUtil.createDocument().  This is because a resource's
EPR and its Element representation can be accessed from multiple
threads, so all rules above apply.

2) Updated SimpleNotificationMessage.toXML().  Changed XmlUtil.EMPTY_DOC
reference to XmlUtil.createDocument().  This is because a root Element
is created and XmlUtils.getAllNamespaces(root) is called on it, so rule
#3 apply.

3) Updated NotificationConsumerClient.notify(NotificationMessage[]
messages) to not assume that messages[x].toXML() will return an Element
whose owner is EMPTY_DOC.
Changed these lines:
        Element notify =
XmlUtils.createElement(WsnConstants.NOTIFY_QNAME);        
        for (int n = 0; n < messages.length; ++n)
            notify.appendChild(messages[n].toXML(), true);
To these lines:
        Document doc = XmlUtils.createDocument();
        Element notify = XmlUtils.createElement(doc,
WsnConstants.NOTIFY_QNAME);
        for (int n = 0; n < messages.length; ++n)
            notify.appendChild(doc.importNode(messages[n].toXML(),
true));

So now, I no longer get any exceptions when generating notifications
simultaneously from multiple producers.

BUT, now I have another problem: notifications are still being lost
somehow, yet no errors appear at all!  I have to investigate this
further to see what's causing this problem.  If anyone encounters this
problem, please post to the group:)




-----Original Message-----
From: Chris.Twiner@swisscom.com [mailto:Chris.Twiner@swisscom.com] 
Sent: Tuesday, September 04, 2007 2:50 PM
To: muse-dev@ws.apache.org
Subject: RE: EMPTY_DOC thread stability issues 

Hi All,

Joining this thread on Vinh's recommendation.

This issue I believe stems from both

https://issues.apache.org/jira/browse/XERCESJ-911

where large amounts of Muse use this functionality.  Regarding the
ThreadLocal solution it doesn't work to cache documents themeselves.
The sdk is very clear that only DocumentFactory.getInstance is thread
safe.  You can use ThreadLocal with DocumentBuilders however. (new
DocumentBuilders is quite expensive, but new documents are much cheaper)

The solution below (combined with TLS DocumentBuilders) should resolve
the issue with both WRONG_DOCUMENTs and the 911 issue from xerces.  I
read in the rest of the thread that Vinh had a simple test case with
multiple threads.  I'm happy to do a quick refactor in the code to put
these two approaches into the code base, if you could send me the test
case Vinh, I'll get started on it tomorrow.

NB whilst this "should work" it is still against the spirit of xerces
which is share nothing.  I'd also note that due to the nodecache
optimisation within Xerces, using NodeLists its just not possible to be
thread safe even for reads.

copied from muse-user:

Hi,

>From what I could work out, from within the list comments and the code,
the state is stored in the Document itself, and as cloneNode uses
Object.clone and then sets the doc it won't work.  Using importNode
helps a little (as it uses getFirstChild()/getNextSibling()), but it
just puts the problem to a later stage.

getAllElements just does the same, calls getChildNodes and then forces
the cache to be used.  Deleting the cache just stops the null for the
parent, it doesn't stop incorrect nodes being returned or race
conditions with other nulls.

The simple thing is to stop using getChildNodes, from what I can see in
the code there isn't a need for it.  The only place I've seen that
doesn't require all of the nodes anyway is in EndpointReference's
getNumberOfParameters, but that behaviour can be safely cached (its not
used directly in the project anyway).

Looking further at the use cases in Muse only the IsolationLayer
(because of the DeferredImpl) needs to call hasChildNodes() on the
document node, for it to force that synchronizeChildren be called (its
cached from then on in each node).  Then every other piece of code can
simply pointer chase with the getFirstChild()/getNextSibling() approach.
No synchronization required.

re using other jaxp's, the DOM itself makes no statement about even read
thread safety.  All of the jaxp impls suffer some form of threading
problem.  Considering all of the problems with fighting against
namespace problems (much worse IMO) it makes sense to stick with the
devil you know :-<.

Again for most of the xerces releases using the
getFirstChild()/getNextSibling() is a seamless dropin for the
getChildNodes problem.  Its a shame that the xerces guys are very much
against any form of thread safety (except application enforced).  Going
with the standard approach the only safe thing is to always serialize to
objects / keep the strings around, which would overly complicate the
code.

I'm willing to give it a try and send you patched libs to try out (I
don't have a test case for this yet) if its quick to reproduce, just let
me know.  If it works out I can raise a jira with the patches.

cheers,
Chris 


---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: muse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: muse-dev-help@ws.apache.org