Posted to users@cxf.apache.org by "mr.andersen" <xm...@bec.dk> on 2008/10/01 08:55:14 UTC

Re: Schema DOM memory problem

Hi Daniel

Any chance that you have found time to look into this problem and make some
changes?
If so, I'd like to try them out, since I'm having the same problem as Charles had.

Morten



dkulp wrote:
> 
> 
> Charles,
> 
> One of the primary reasons (right now) for keeping the DOM tree around is 
> to work around some severe bugs in XmlSchema.   The XmlSchema serializer 
> in 1.3.2 loses a bunch of things, so the resulting schemas that you get 
> would not be correct.    I think all the bugs have been fixed in 
> XmlSchema and I've been asking for a new release.  See:
> http://mail-archives.apache.org/mod_mbox/ws-commons-dev/200802.mbox/<200802071000.14543.dkulp%40apache.org>
> but so far, no luck.   I'd appreciate it if you could also start bugging 
> them.   :-)   If we can get a version that can actually round-trip 
> schema properly, I'm OK with dropping the DOM. 
> 
> That all said, I've also thought about creating a "SchemaManager" to go
> along with the current WSDLManager to cache a lot of this.    Just
> haven't gotten around to doing it.   I'd definitely welcome any patches
> that would help us head in that direction.   :-)
> 
> Dan
> 
> 
> 
> 
> 
> On Tuesday 12 February 2008, Charles O'Farrell wrote:
>> G'day all,
>>
>> I have been given the task of generating WSDL from my company's large
>> collection of application models, as well as handling the invoking of
>> corresponding services which are already deployed. The possible
>> services number in the hundreds, with a handful of large (2MB) shared
>> schemas.
>>
>> When trying to run a small Jetty server with more than one of these
>> generated WSDLs I quickly ran out of memory (the default setting - 64M
>> I think). While it wouldn't be hard to bump up the memory allocation,
>> I feared the final scenario of hundreds of WSDLs would be problematic
>> even for large amounts of memory.
>>
>> To cut a long story short this is what I found:
>>
>> 1. For each WSDL, every imported schema is loaded into memory,
>> regardless of whether it is shared among other WSDLs.
>> 2. Every Schema DOM tree is stored in memory after parsing.
>>
>> Given that the Schema is parsed to the more useful XmlSchema object
>> tree, I'm not sure what benefits are gained from keeping it in DOM. I
>> fixed the memory bloat by some minor changes in SchemaUtil, which I
>> will explain briefly here. Note that reflection was unfortunately
>> required in dealing with the XmlSchema library.
>>
>> 1. Used a static map to update the XmlSchemaCollection parameter with
>> any cached Schemas before calling schemaCol.read(schemaElem,
>> systemId); in extractSchema
>>
>> 2. Nulled out cached DOM elements in the following:
>>
>>    - extractSchema() -> xmlSchema.setElement() (well, actually I
>>    stopped it being set)
>>    - addSchema() -> schema.setElement() after targetNamespace is
>>    retrieved
>>    - At the end of getSchemas(), iterate over any new schemas, get
>>    each one's NodeNamespaceContext, and call getDeclaredPrefixes()
>>    before setting its node field to null.
>>
>> 3. Ignored schemaList from the constructor and instead just relied on
>> an internal set to avoid recursion. (I think this map is only needed
>> for WSDL2Java?)
>> 4. Fixed WSDLQueryHandler to output full WSDL due to missing schema
>> node (I loaded it from the file system instead of serialising the
>> Definition object)
>>
>> I guess my biggest qualm in all this is that it was extremely
>> difficult to subclass SchemaUtil and wire it in through Spring to make
>> the required changes. In particular I had to reproduce the following
>> chain of classes and invocations to fix the problem.
>>
>> JaxWsServiceFactoryBean -> buildServiceFromWSDL() ->
>> WSDLServiceFactory -> create() -> WSDLServiceBuilder -> getSchemas()
>> -> SchemaUtil
>>
>> Because SchemaUtil isn't a Spring-managed object, nor are any of the
>> other classes, and because most of the methods/fields are private, I
>> ended up literally copy+pasting each class.
>>
>> Forgive me if this all sounds like criticism, because I am very
>> impressed and happy with CXF. This is just as much a documenting of my
>> findings as anything else.
>>
>> Anyway. I'm not too worried about what happens now but I am curious
>> what you guys think of all this.
>>
>> Cheers,
>>
>> Charles O'Farrell
> 
> 
> 
> -- 
> J. Daniel Kulp
> Principal Engineer, IONA
> dkulp@apache.org
> http://www.dankulp.com/blog
> 
> 
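
The static map from Charles's step 1 and the schema manager idea Dan
mentions both come down to loading each shared schema once per JVM
rather than once per WSDL. A minimal sketch of that idea, using only
JDK classes, might look like the following. SharedSchemaCache, its
loader function and the file names are hypothetical and are not part of
CXF or XmlSchema; a real version would hold XmlSchema objects and would
still have to cope with the serializer bugs discussed above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: a JVM-wide cache so that a schema shared by many
// WSDLs is loaded once. This is not CXF or XmlSchema API.
final class SharedSchemaCache<S> {

    // Parsed schemas keyed by systemId (the schema's resolved location).
    private final Map<String, S> bySystemId = new ConcurrentHashMap<>();

    // Does the expensive work (read + parse) for a systemId not seen yet.
    // It must not call back into this cache: ConcurrentHashMap does not
    // allow the map to be updated from inside computeIfAbsent.
    private final Function<String, S> loader;

    SharedSchemaCache(Function<String, S> loader) {
        this.loader = loader;
    }

    // Returns the cached schema, invoking the loader on first request only.
    S get(String systemId) {
        return bySystemId.computeIfAbsent(systemId, loader);
    }

    // Number of distinct schemas currently held.
    int size() {
        return bySystemId.size();
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        // Stand-in loader: a real one would parse the schema document.
        SharedSchemaCache<String> cache = new SharedSchemaCache<>(id -> {
            loads.incrementAndGet();
            return "parsed:" + id;
        });

        // Three WSDLs import the same shared schema; one also has its own.
        cache.get("common-types.xsd");
        cache.get("common-types.xsd");
        cache.get("common-types.xsd");
        cache.get("orders.xsd");

        System.out.println("loader calls: " + loads.get()); // prints "loader calls: 2"
    }
}
```

Keying by systemId assumes the same schema always resolves to the same
location; keying by target namespace would be another option. Entries
are never evicted in this sketch, which is the price of sharing them
across services.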



Re: Schema DOM memory problem

Posted by Daniel Kulp <dk...@apache.org>.
On Wednesday 01 October 2008, mr.andersen wrote:
> Hi Daniel
>
> Any chance that you have found time to look into this problem and make
> some changes?
> If so, I'd like to try them out, since I'm having the same problem as
> Charles had.

I wish I had better news for you.  :-(

Every time I turn around, I log some more issues with XmlSchema.
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&pid=12310250&sorter/order=DESC&sorter/field=priority&resolution=-1&component=12310702

Particularly:
https://issues.apache.org/jira/browse/WSCOMMONS-363

A start of a patch is attached to the related issue, but the web services
team is pretty much ignoring XmlSchema again.   Bugging them would be a
good thing.

Dan



-- 
J. Daniel Kulp
Principal Engineer, IONA
dkulp@apache.org
http://www.dankulp.com/blog