Posted to p-dev@xerces.apache.org by "Jason E. Stewart" <ja...@openinformatics.com> on 2003/11/12 09:46:22 UTC

Re: subscribed! (was Re: XML::Simple in Xerces (was Re: Xerces-perl for Win32))

Brian Faull <bf...@mitre.org> writes:

> I am stress-testing my first Xerces application 

good, I haven't stress tested it in a looooong time.

> and seem to be running into a HUGE memory leak. 

not so good.

> Not sure if it's on my end, or in Xerces-p or in Xerces-C... need to
> do a bit more investigation before I draw conclusions. I'm streaming
> (as fast as possible) 500-byte (or so) XML strings... within 10
> minutes, X has locked up and the hard drive is thrashing... so this
> is pretty serious. :) Have you run into anything like this? Or, do
> you know if there's any Xerces call I need to make to be sure that
> objects are freed? I don't see any related posts...

Depends on what you are doing - if it is DOM related, then yes, you
must tell the parser to release the memory, otherwise it grows. From
the API docs:

   void AbstractDOMParser::resetDocumentPool()
    	  
  Reset the documents vector pool and release all the associated memory
  back to the system.
  
  When parsing a document using a DOM parser, all memory allocated for a
  DOM tree is associated to the DOM document.
  
  If you do multiple parse using the same DOM parser instance, then
  multiple DOM documents will be generated and saved in a vector
  pool. All these documents (and thus all the allocated memory) won't be
  deleted until the parser instance is destroyed.
  
  If you don't need these DOM documents anymore and don't want to
  destroy the DOM parser instance at this moment, then you can call this
  method to reset the document vector pool and release all the allocated
  memory back to the system.
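
A minimal sketch of the reuse pattern the API docs describe: one DOM parser
instance parses many documents and calls resetDocumentPool() after each one.
It assumes XML::Xerces is installed and that its XercesDOMParser wrapper
exposes resetDocumentPool() the way the Xerces-C class does. The parse_many
helper and the guarded module load are illustrative only.

```perl
use strict;
use warnings;

# Assumption: XML::Xerces may or may not be installed; load it lazily so the
# sketch still runs (and simply skips the work) where it is missing.
my $have_xerces = eval { require XML::Xerces; 1 };

# Hypothetical helper: parse $count small documents with ONE parser instance,
# releasing the pooled DOM documents after each parse.
sub parse_many {
    my ($count) = @_;
    my $parser = XML::Xerces::XercesDOMParser->new();
    for (1 .. $count) {
        $parser->parse(XML::Xerces::MemBufInputSource->new('<test/>'));
        # Assumed to be exposed by the Perl wrapper, as in Xerces-C.
        $parser->resetDocumentPool();
    }
    return $count;
}

if ($have_xerces) {
    parse_many(1000);
    print "parsed 1000 documents with one parser\n";
}
else {
    print "XML::Xerces not available; nothing parsed\n";
}
```

If memory still grows with the reset call in place, the pool is probably not
the cause and the growth is worth reporting to the list.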


Xerces-C leaks are unlikely, since that code has been worked on for a
long time, but XML-Xerces leaks are possible - I haven't really tested
this in some time.

Post any trouble to the list,
jas.

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-p-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-p-dev-help@xml.apache.org


Re: stress testing Xerces

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.

Re: stress testing Xerces

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
Steve Mathias <sm...@unm.edu> writes:

> Sorry Jason, but I was probably not clear enough here.  From the
> perspective of my code "validation" includes parsing, i.e. if validation
> is turned off parser(s) are never even created.  The only reason I'm
> parsing in this context is for the side effect of validation.  

Ok, yes that makes sense, thanks.

Cheers,
jas.


Re: stress testing Xerces

Posted by Steve Mathias <sm...@unm.edu>.
>>>>> "Jason" == Jason E Stewart <ja...@openinformatics.com> writes:

>> When the XML validation is turned on, the script gradually eats
>> memory until it crashes.  If validation is off, the script runs fine.

Jason> Well, that is good news (depending on your POV) - this is
Jason> possibly a Xerces-C memory leak then, and not something stupid
Jason> that I've done. I've written the list, and I'll work on a C++
Jason> test to see if I can reproduce it outside XML-Xerces.

<snip>

Jason> Do you really need to validate internally? You could wrap the
Jason> script to run an external validator like nsgmls if you really
Jason> need it. I hope to have this fixed soon.

Sorry Jason, but I was probably not clear enough here.  From the
perspective of my code "validation" includes parsing, i.e. if validation
is turned off parser(s) are never even created.  The only reason I'm
parsing in this context is for the side effect of validation.  I will
look into nsgmls.

Jason> More as it happens, jas.

Thanks.

Steve
-- 
(    Stephen L. Mathias, Ph.D.                     (                    (
 )   Office of Biocomputing                         )  s m a t h i a s   )
(    University of New Mexico School of Medicine   (   @ p o b l a n o  (
 )   MSC08 4560                                     )  . h e a l t h .   )
(    1 University of New Mexico                    (   u n m . e d u    (
 )   Albuquerque, NM 87131-0001                     )                    )


Re: stress testing Xerces

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
Steve Mathias <sm...@unm.edu> writes:

> I've been meaning to compose an e-mail about this for a few days now,
> but just haven't gotten around to it.  You might not like to hear this,
> but I think there is a *big* problem.

<sigh>
Okey dokey
</sigh>

> When the XML validation is turned on, the script gradually eats memory
> until it crashes.  If validation is off, the script runs fine.  

Well, that is good news (depending on your POV) - this is possibly a
Xerces-C memory leak then, and not something stupid that I've
done. I've written the list, and I'll work on a C++ test to see if I
can reproduce it outside XML-Xerces.

> I have tried everything I can think of to get the memory to be
> released, but with no success.
> 
> I am *definitely* creating a new parser every time.  

Yeah, there isn't anything that you can do about this - if it's in the
validation bit, that's deep in the internals of Xerces-C, and it's
nothing that XML-Xerces could possibly affect.

> Here's the sub that does the validation:
> 
> sub validateXML {
>   my $xml = shift ;
> 
>   # Just to make sure there is only one, $Parser is global but it's not used anywhere else:
>   $Parser = XML::Xerces::XMLReaderFactory::createXMLReader() ;
>   $Parser->setFeature("http://xml.org/sax/features/namespaces", 1) ;
>   $Parser->setFeature("http://apache.org/xml/features/validation/schema", 1) ;
>   $Parser->setFeature("http://apache.org/xml/features/validation/schema-full-checking", 1) ;
>   $Parser->setFeature("http://apache.org/xml/features/validation-error-as-fatal", 1) ;
>   $Parser->setFeature("http://xml.org/sax/features/validation", 1) ;
>   $Parser->setFeature("http://apache.org/xml/features/validation/dynamic", 0) ;

See the new example in samples/SAX2Count.pl for how to use the Unicode
constants defined in Xerces-C; that will keep you from having to
hard-code these strings in your app. All the Unicode constants are
enumerated in docs/XMLUni.txt.
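
A rough sketch of what that could look like for the features set in the
quoted sub. The constant names are the Xerces-C XMLUni ones; that the Perl
wrapper exposes each of them as $XML::Xerces::XMLUni::<name> is an assumption
here, so check docs/XMLUni.txt for your version. The make_validating_reader
helper is hypothetical.

```perl
use strict;
use warnings;

# Assumption: XML::Xerces may be absent; load it lazily so the sketch runs.
my $have_xerces = eval { require XML::Xerces; 1 };

# Hypothetical helper: build a SAX2 reader with the same features as the
# quoted validateXML sub, naming them through XMLUni constants instead of
# hard-coded URI strings.
sub make_validating_reader {
    no warnings 'once';    # each constant is mentioned only once in this file
    my $reader = XML::Xerces::XMLReaderFactory::createXMLReader();
    $reader->setFeature("$XML::Xerces::XMLUni::fgSAX2CoreNameSpaces",           1);
    $reader->setFeature("$XML::Xerces::XMLUni::fgXercesSchema",                 1);
    $reader->setFeature("$XML::Xerces::XMLUni::fgXercesSchemaFullChecking",     1);
    $reader->setFeature("$XML::Xerces::XMLUni::fgXercesValidationErrorAsFatal", 1);
    $reader->setFeature("$XML::Xerces::XMLUni::fgSAX2CoreValidation",           1);
    $reader->setFeature("$XML::Xerces::XMLUni::fgXercesDynamic",                0);
    return $reader;
}

if ($have_xerces) {
    my $reader = make_validating_reader();
    print "reader configured\n";
}
else {
    print "XML::Xerces not available; nothing configured\n";
}
```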

>   my $errorHandler = new XML::Xerces::PerlErrorHandler() ;
>   $Parser->setErrorHandler($errorHandler) ;
>   my $contentHandler = new XML::Xerces::PerlContentHandler() ;
>   $Parser->setContentHandler($contentHandler) ;
>
>   eval {
>     $Parser->parse( XML::Xerces::MemBufInputSource->new($xml) ) ;
>   } ;
>   undef $Parser ; # reclaim resources??

It should. In the meantime, I would run without validation. 

Do you really need to validate internally? You could wrap the script
to run an external validator like nsgmls if you really need it. I hope
to have this fixed soon.
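
One way the external-validator idea might be wrapped, as a sketch only: write
each chunk to a temporary file, run a validator command on it, and return the
same (ok, message) pair that validateXML returns. The validate_externally
name is hypothetical, and the validator command line is left to the caller
because the right tool and flags depend on the setup. The demo below uses the
running perl as a stand-in command.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: validate $xml by handing it to an external command.
# @validator_cmd is the command and its arguments; the temp file path is
# appended as the last argument. Returns (1, '') on success, (0, $message)
# otherwise, mirroring validateXML.
sub validate_externally {
    my ($xml, @validator_cmd) = @_;

    my ($fh, $path) = tempfile(SUFFIX => '.xml');
    print {$fh} $xml;
    close $fh or return (0, "cannot write $path: $!");

    # List form of system() bypasses the shell, so no quoting issues.
    my $status = system(@validator_cmd, $path);
    unlink $path;    # remove the chunk right away; there may be many of them

    return (0, "could not run validator: $!") if $status == -1;
    return (0, "validator failed (wait status $status)") if $status != 0;
    return (1, '');
}

# Stand-in validator: the current perl, exiting 0 without reading the file.
my ($ok, $msg) = validate_externally('<test/>', $^X, '-e', 'exit 0');
print $ok ? "valid\n" : "invalid: $msg\n";
```

Because the validator runs in its own process, any memory it leaks goes back
to the system when it exits, at the cost of one process start per chunk.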

More as it happens,
jas.


Re: stress testing Xerces

Posted by Steve Mathias <sm...@unm.edu>.
>>>>> "Jason" == Jason E Stewart <ja...@openinformatics.com> writes:

Jason> jason@openinformatics.com (Jason E. Stewart) writes:
>> Brian Faull <bf...@mitre.org> writes:
>> 
>> Depends on what you are doing - if it is DOM related, then yes, you
>> must tell the parser to release the memory, otherwise it grows. From
>> the API docs:
>> 
>> void AbstractDOMParser::resetDocumentPool()
>> 
>> Reset the documents vector pool and release all the associated memory
>> back to the system.
>> 
>> When parsing a document using a DOM parser, all memory allocated for
>> a DOM tree is associated to the DOM document.
>> 
>> If you do multiple parse using the same DOM parser instance, then
>> multiple DOM documents will be generated and saved in a vector
>> pool. All these documents (and thus all the allocated memory) won't
>> be deleted until the parser instance is destroyed.
>> 
>> If you don't need these DOM documents anymore and don't want to
>> destroy the DOM parser instance at this moment, then you can call
>> this method to reset the document vector pool and release all the
>> allocated memory back to the system.

Jason> As a note - if you create a new parser each time, this should
Jason> *not* cause a leak:

Jason>    while (1) {
Jason>      my $parser = XML::Xerces::XercesDOMParser->new();
Jason>      $parser->parse(XML::Xerces::MemBufInputSource->new('<test/>'));
Jason>    }

Jason> if it does, that's a *big* problem, and I'd like to know about
Jason> it.

Hi Jason,

I've been meaning to compose an e-mail about this for a few days now,
but just haven't gotten around to it.  You might not like to hear this,
but I think there is a *big* problem.

I have a script which pulls data out of a database and formats it as
XML.  There is ~2.4 GB of XML once it is done.  The code pulls the data
out in chunks of reasonable size (~15 KB each as XML), formats each chunk
as an individual XML document, optionally validates the document against
a schema, and then prints it out.

When the XML validation is turned on, the script gradually eats memory
until it crashes.  If validation is off, the script runs fine.  I have
tried everything I can think of to get the memory to be released, but
with no success.

I am *definitely* creating a new parser every time.  Here's the sub that
does the validation:

sub validateXML {
  my $xml = shift ;

  # Just to make sure there is only one, $Parser is global but it's not used anywhere else:
  $Parser = XML::Xerces::XMLReaderFactory::createXMLReader() ;
  $Parser->setFeature("http://xml.org/sax/features/namespaces", 1) ;
  $Parser->setFeature("http://apache.org/xml/features/validation/schema", 1) ;
  $Parser->setFeature("http://apache.org/xml/features/validation/schema-full-checking", 1) ;
  $Parser->setFeature("http://apache.org/xml/features/validation-error-as-fatal", 1) ;
  $Parser->setFeature("http://xml.org/sax/features/validation", 1) ;
  $Parser->setFeature("http://apache.org/xml/features/validation/dynamic", 0) ;
  my $errorHandler = new XML::Xerces::PerlErrorHandler() ;
  $Parser->setErrorHandler($errorHandler) ;
  my $contentHandler = new XML::Xerces::PerlContentHandler() ;
  $Parser->setContentHandler($contentHandler) ;

  eval {
    $Parser->parse( XML::Xerces::MemBufInputSource->new($xml) ) ;
  } ;
  undef $Parser ; # reclaim resources??
  if ($@) {
    return (0, $@) ;
  } else {
    return (1, '') ;
  }
}

Thoughts, ideas...?

Steve
-- 
(    Stephen L. Mathias, Ph.D.                     (                    (
 )   Office of Biocomputing                         )  s m a t h i a s   )
(    University of New Mexico School of Medicine   (   @ p o b l a n o  (
 )   MSC08 4560                                     )  . h e a l t h .   )
(    1 University of New Mexico                    (   u n m . e d u    (
 )   Albuquerque, NM 87131-0001                     )                    )


Re: stress testing Xerces

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
jason@openinformatics.com (Jason E. Stewart) writes:

> Brian Faull <bf...@mitre.org> writes:
> 
> Depends on what you are doing - if it is DOM related, then yes, you
> must tell the parser to release the memory, otherwise it grows. From
> the API docs:
> 
>    void AbstractDOMParser::resetDocumentPool()
>     	  
>   Reset the documents vector pool and release all the associated memory
>   back to the system.
>   
>   When parsing a document using a DOM parser, all memory allocated for a
>   DOM tree is associated to the DOM document.
>   
>   If you do multiple parse using the same DOM parser instance, then
>   multiple DOM documents will be generated and saved in a vector
>   pool. All these documents (and thus all the allocated memory) won't be
>   deleted until the parser instance is destroyed.
>   
>   If you don't need these DOM documents anymore and don't want to
>   destroy the DOM parser instance at this moment, then you can call this
>   method to reset the document vector pool and release all the allocated
>   memory back to the system.

As a note - if you create a new parser each time, this should *not*
cause a leak:

   while (1) {
     my $parser = XML::Xerces::XercesDOMParser->new();
     $parser->parse(XML::Xerces::MemBufInputSource->new('<test/>'));
   }

if it does, that's a *big* problem, and I'd like to know about it.
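
To check that claim with numbers rather than by watching the machine thrash,
a bounded version of the loop can be run while comparing the process's
resident set size before and after. This is only a sketch: rss_kb and
rss_growth are hypothetical helpers, reading /proc/<pid>/status is
Linux-specific, and a stand-in loop body is used where XML::Xerces is not
installed.

```perl
use strict;
use warnings;

# Resident set size of this process in kB, or undef where /proc is missing.
sub rss_kb {
    open my $fh, '<', "/proc/$$/status" or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

# Run $code $iterations times; return RSS growth in kB (undef if unmeasurable).
sub rss_growth {
    my ($code, $iterations) = @_;
    my $before = rss_kb();
    $code->() for 1 .. $iterations;
    my $after = rss_kb();
    return unless defined $before && defined $after;
    return $after - $before;
}

# Assumption: XML::Xerces may be absent; fall back to a trivial stand-in body.
my $have_xerces = eval { require XML::Xerces; 1 };
my $body = $have_xerces
    ? sub {
        # New parser per iteration, as in the loop above.
        my $parser = XML::Xerces::XercesDOMParser->new();
        $parser->parse(XML::Xerces::MemBufInputSource->new('<test/>'));
    }
    : sub { my @scratch = (1 .. 100) };

my $growth = rss_growth($body, 10_000);
print defined $growth
    ? "RSS changed by $growth kB over 10000 iterations\n"
    : "RSS not measurable on this platform\n";
```

Growth that keeps scaling with the iteration count, rather than levelling
off after a warm-up, is the pattern that would point to a leak.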

Cheers,
jas.
