Posted to p-dev@xerces.apache.org by "Jason E. Stewart" <ja...@openinformatics.com> on 2003/11/12 10:27:13 UTC

Re: stress testing Xerces

jason@openinformatics.com (Jason E. Stewart) writes:

> Brian Faull <bf...@mitre.org> writes:
> 
> Depends on what you are doing - if it is DOM related, then yes, you
> must tell the parser to release the memory, otherwise it grows. From
> the API docs:
> 
>    void AbstractDOMParser::resetDocumentPool()
>     	  
>   Reset the documents vector pool and release all the associated memory
>   back to the system.
>   
>   When parsing a document using a DOM parser, all memory allocated for a
>   DOM tree is associated to the DOM document.
>   
>   If you do multiple parse using the same DOM parser instance, then
>   multiple DOM documents will be generated and saved in a vector
>   pool. All these documents (and thus all the allocated memory) won't be
>   deleted until the parser instance is destroyed.
>   
>   If you don't need these DOM documents anymore and don't want to
>   destroy the DOM parser instance at this moment, then you can call this
>   method to reset the document vector pool and release all the allocated
>   memory back to the system.

As a note - if you create a new parser each time, this should *not*
cause a leak:

   while (1) {
     my $parser = XML::Xerces::XercesDOMParser->new();
     $parser->parse(XML::Xerces::MemBufInputSource->new('<test/>'));
   }

if it does, that's a *big* problem, and I'd like to know about it.
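
One way to check whether a loop like the one above leaks is to watch the
process's resident set size before and after a fixed number of iterations.
What follows is only a minimal sketch, assuming Linux (it reads
/proc/self/status). The helper names rss_kb and rss_growth are made up for
illustration and are not part of XML-Xerces.

```perl
use strict;
use warnings;

# Return this process's resident set size in kB, or undef if
# /proc/self/status is not available (assumption: Linux only).
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

# Call $code->() $iterations times and return the change in RSS (kB).
# Returns undef when RSS cannot be measured on this platform.
sub rss_growth {
    my ($code, $iterations) = @_;
    my $before = rss_kb();
    $code->() for 1 .. $iterations;
    my $after = rss_kb();
    return unless defined $before && defined $after;
    return $after - $before;
}
```

To try it on the parser loop, pass a sub that creates the
XML::Xerces::XercesDOMParser and parses '<test/>' as the first argument, and
compare the result for, say, 1,000 and 10,000 iterations. Growth that scales
with the iteration count would point to a leak; a roughly constant figure
would not.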

Cheers,
jas.

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-p-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-p-dev-help@xml.apache.org


Re: stress testing Xerces

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
Steve Mathias <sm...@unm.edu> writes:

> Sorry Jason, but I was probably not clear enough here.  From the
> perspective of my code, "validation" includes parsing, i.e. if validation
> is turned off parser(s) are never even created.  The only reason I'm
> parsing in this context is for the side effect of validation.  

Ok, yes that makes sense, thanks.

Cheers,
jas.



Re: stress testing Xerces

Posted by Steve Mathias <sm...@unm.edu>.
>>>>> "Jason" == Jason E Stewart <ja...@openinformatics.com> writes:

>> When the XML validation is turned on, the script gradually eats
>> memory until it crashes.  If validation is off, the script runs fine.

Jason> Well, that is good news (depending on your POV) - this is
Jason> possibly a Xerces-C memory leak then, and not something stupid
Jason> that I've done. I've written to the list, and I'll work on a C++
Jason> test to see if I can reproduce it outside XML-Xerces.

<snip>

Jason> Do you really need to validate internally? You could wrap the
Jason> script to run an external validator like nsgmls if you really
Jason> need it. I hope to have this fixed soon.

Sorry Jason, but I was probably not clear enough here.  From the
perspective of my code, "validation" includes parsing, i.e. if validation
is turned off parser(s) are never even created.  The only reason I'm
parsing in this context is for the side effect of validation.  I will
look into nsgmls.

Jason> More as it happens, jas.

Thanks.

Steve
-- 
(    Stephen L. Mathias, Ph.D.                     (                    (
 )   Office of Biocomputing                         )  s m a t h i a s   )
(    University of New Mexico School of Medicine   (   @ p o b l a n o  (
 )   MSC08 4560                                     )  . h e a l t h .   )
(    1 University of New Mexico                    (   u n m . e d u    (
 )   Albuquerque, NM 87131-0001                     )                    )



Re: stress testing Xerces

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
Steve Mathias <sm...@unm.edu> writes:

> I've been meaning to compose an e-mail about this for a few days now,
> but just haven't gotten around to it.  You might not like to hear this,
> but I think there is a *big* problem.

<sigh>
Okey dokey
</sigh>

> When the XML validation is turned on, the script gradually eats memory
> until it crashes.  If validation is off, the script runs fine.  

Well, that is good news (depending on your POV) - this is possibly a
Xerces-C memory leak then, and not something stupid that I've
done. I've written to the list, and I'll work on a C++ test to see if I
can reproduce it outside XML-Xerces.

> I have tried everything I can think of to get the memory to be
> released, but with no success.
> 
> I am *definitely* creating a new parser every time.  

Yeah, there isn't anything that you can do about this - if it's in the
validation bit, that's deep in the internals of Xerces-C, and it's
nothing that XML-Xerces could possibly affect.

> Here's the sub that does the validation:
> 
> sub validateXML {
>   my $xml = shift ;
> 
>   # Just to make sure there is only one, $Parser is global but it's not used anywhere else:
>   $Parser = XML::Xerces::XMLReaderFactory::createXMLReader() ;
>   $Parser->setFeature("http://xml.org/sax/features/namespaces", 1) ;
>   $Parser->setFeature("http://apache.org/xml/features/validation/schema", 1) ;
>   $Parser->setFeature("http://apache.org/xml/features/validation/schema-full-checking", 1) ;
>   $Parser->setFeature("http://apache.org/xml/features/validation-error-as-fatal", 1) ;
>   $Parser->setFeature("http://xml.org/sax/features/validation", 1) ;
>   $Parser->setFeature("http://apache.org/xml/features/validation/dynamic", 0) ;

See the new example in samples/SAX2Count.pl for how to use the Unicode
constants defined in Xerces-C; that will keep you from having to
hard-code these strings in your app. All the Unicode constants are
enumerated in docs/XMLUni.txt.

>   my $errorHandler = new XML::Xerces::PerlErrorHandler() ;
>   $Parser->setErrorHandler($errorHandler) ;
>   my $contentHandler = new XML::Xerces::PerlContentHandler() ;
>   $Parser->setContentHandler($contentHandler) ;
>
>   eval {
>     $Parser->parse( XML::Xerces::MemBufInputSource->new($xml) ) ;
>   } ;
>   undef $Parser ; # reclaim resources??

It should. In the meantime, I would run without validation. 

Do you really need to validate internally? You could wrap the script
to run an external validator like nsgmls if you really need it. I hope
to have this fixed soon.
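
If validating in-process is still needed in the meantime, a related
workaround is to run each validation in a short-lived child process, so
that whatever the validator leaks is returned to the OS when the child
exits. This is a general process-isolation technique swapped in for
illustration, not something XML-Xerces provides. A minimal sketch, assuming
a platform with a real fork(); validate_in_child and the $check callback
are hypothetical names, and the return value mirrors the (ok, message) pair
that validateXML returns in Steve's original post:

```perl
use strict;
use warnings;
use POSIX ();

# Run $check->($xml) in a forked child. $check is expected to die on
# failure. Returns (1, '') on success or (0, $message) on failure.
sub validate_in_child {
    my ($check, $xml) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: run the check and send any error text to the parent.
        close $reader;
        my $ok = eval { $check->($xml); 1 };
        print {$writer} $@ unless $ok;
        close $writer;
        # _exit skips END blocks and destructors inherited from the parent.
        POSIX::_exit($ok ? 0 : 1);
    }

    # Parent: read the error text (if any), then reap the child.
    close $writer;
    my $message = do { local $/; <$reader> };
    close $reader;
    waitpid($pid, 0);
    return $? == 0 ? (1, '') : (0, defined $message ? $message : '');
}
```

The callback would wrap the existing parser setup and parse() call without
the eval, so that a fatal error propagates as a die. Forking once per ~15Kb
chunk has a cost of its own, so validating several chunks per child may be
a better trade-off.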

More as it happens,
jas.



Re: stress testing Xerces

Posted by Steve Mathias <sm...@unm.edu>.
>>>>> "Jason" == Jason E Stewart <ja...@openinformatics.com> writes:

Jason> jason@openinformatics.com (Jason E. Stewart) writes:
>> Brian Faull <bf...@mitre.org> writes:
>> 
>> Depends on what you are doing - if it is DOM related, then yes, you
>> must tell the parser to release the memory, otherwise it grows. From
>> the API docs:
>> 
>> void AbstractDOMParser::resetDocumentPool()
>> 
>> Reset the documents vector pool and release all the associated memory
>> back to the system.
>> 
>> When parsing a document using a DOM parser, all memory allocated for
>> a DOM tree is associated to the DOM document.
>> 
>> If you do multiple parse using the same DOM parser instance, then
>> multiple DOM documents will be generated and saved in a vector
>> pool. All these documents (and thus all the allocated memory) won't
>> be deleted until the parser instance is destroyed.
>> 
>> If you don't need these DOM documents anymore and don't want to
>> destroy the DOM parser instance at this moment, then you can call
>> this method to reset the document vector pool and release all the
>> allocated memory back to the system.

Jason> As a note - if you create a new parser each time, this should
Jason> *not* cause a leak:

Jason>    while (1) {
Jason>      my $parser = XML::Xerces::XercesDOMParser->new();
Jason>      $parser->parse(XML::Xerces::MemBufInputSource->new('<test/>'));
Jason>    }

Jason> if it does, that's a *big* problem, and I'd like to know about
Jason> it.

Hi Jason,

I've been meaning to compose an e-mail about this for a few days now,
but just haven't gotten around to it.  You might not like to hear this,
but I think there is a *big* problem.

I have a script which pulls data out of a database and formats it as
XML.  There is ~2.4 GB of XML once it is done.  The code pulls the data
out in chunks of reasonable size (~15 KB each as XML), formats each chunk
as an individual XML document, optionally validates the document against
a schema, and then prints it out.

When the XML validation is turned on, the script gradually eats memory
until it crashes.  If validation is off, the script runs fine.  I have
tried everything I can think of to get the memory to be released, but
with no success.

I am *definitely* creating a new parser every time.  Here's the sub that
does the validation:

sub validateXML {
  my $xml = shift ;

  # Just to make sure there is only one, $Parser is global but it's not used anywhere else:
  $Parser = XML::Xerces::XMLReaderFactory::createXMLReader() ;
  $Parser->setFeature("http://xml.org/sax/features/namespaces", 1) ;
  $Parser->setFeature("http://apache.org/xml/features/validation/schema", 1) ;
  $Parser->setFeature("http://apache.org/xml/features/validation/schema-full-checking", 1) ;
  $Parser->setFeature("http://apache.org/xml/features/validation-error-as-fatal", 1) ;
  $Parser->setFeature("http://xml.org/sax/features/validation", 1) ;
  $Parser->setFeature("http://apache.org/xml/features/validation/dynamic", 0) ;
  my $errorHandler = new XML::Xerces::PerlErrorHandler() ;
  $Parser->setErrorHandler($errorHandler) ;
  my $contentHandler = new XML::Xerces::PerlContentHandler() ;
  $Parser->setContentHandler($contentHandler) ;

  eval {
    $Parser->parse( XML::Xerces::MemBufInputSource->new($xml) ) ;
  } ;
  undef $Parser ; # reclaim resources??
  if ($@) {
    return (0, $@) ;
  } else {
    return (1, '') ;
  }
}

Thoughts, ideas...?

Steve
-- 
(    Stephen L. Mathias, Ph.D.                     (                    (
 )   Office of Biocomputing                         )  s m a t h i a s   )
(    University of New Mexico School of Medicine   (   @ p o b l a n o  (
 )   MSC08 4560                                     )  . h e a l t h .   )
(    1 University of New Mexico                    (   u n m . e d u    (
 )   Albuquerque, NM 87131-0001                     )                    )
