Posted to p-dev@xerces.apache.org by "Jason E. Stewart" <ja...@openinformatics.com> on 2003/11/08 05:45:59 UTC

Re: Xerces-perl for Win32

Brian Faull <bf...@mitre.org> writes:

> Thank you so much for the quick response! I really appreciate it.

Welcome.

> I'm trying to do precisely those two things: an ActivePerl module and
> generate some example code, so I'll be happy to contribute as soon as I
> have my head around the intended usage of the API!

Excellent, that will be very helpful. Email the list if you have any
API questions - I like to have the answers archived so that other
users can locate them in the list archives or in the official
Distributed Tech Library(TM) ... (a.k.a. Google).

> I'm trying to do one simple thing... grab the entire contents of the XML
> file as a big hash... much like XML::Simple. 

Ok. That would be a useful application to have.

> I've got the tags that have attributes using getAttributes -- very
> easy. However, stepping through an element that has a number of
> children, then (recursively) constructing a hash is utterly
> painful. Since this is DOM, I presume that this structure already
> exists in memory anyway... is there any way to access this with
> Perl? DomDocument->to_hash()!?

I would use the DOMNodeIterator API to automatically walk the tree
(you could also use the DOMTreeWalker API, but I think NodeIterator is
the officially supported one). Check out t/DOMNodeIterator.t to see
how to do it. The trick is to create a filter that only accepts
Element nodes, and then keep a stack of the current DOMElements; you
can use the equality operator to test which one is the parent of the
current node.
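
The stack bookkeeping described above can be sketched in plain Perl
without touching the Xerces API. In this sketch each element is a
hypothetical stand-in record (a hash with "name" and "parent" keys),
not a real DOMElement, and elements_to_hash() is an illustrative name
that is not part of XML-Xerces. With real nodes, the parent test would
compare the node's parent (for example the result of getParentNode())
against the top of the stack instead.

```perl
use strict;
use warnings;

# Build a nested hash from element records visited in document order.
# Each record is a hypothetical stand-in for a DOMElement:
#   { name => 'tag', parent => $parent_record_or_undef }
sub elements_to_hash {
    my @elements = @_;
    my %root;
    my @stack;    # entries are [ $record, $hash_holding_its_children ]

    for my $el (@elements) {
        # Pop until the top of the stack is this element's parent.
        pop @stack
            while @stack
            && !( defined $el->{parent} && $stack[-1][0] == $el->{parent} );

        my $container = @stack ? $stack[-1][1] : \%root;
        my $children  = {};
        # Repeated tag names are collected in an array, one hash per element.
        push @{ $container->{ $el->{name} } }, $children;
        push @stack, [ $el, $children ];
    }
    return \%root;
}

# Usage: <doc><a><b/></a><a/></doc>, listed in document order.
my $doc    = { name => 'doc', parent => undef };
my $first  = { name => 'a',   parent => $doc };
my $inner  = { name => 'b',   parent => $first };
my $second = { name => 'a',   parent => $doc };
my $tree   = elements_to_hash( $doc, $first, $inner, $second );
```

Attributes and text content are left out to keep the sketch short, so
the result is not the same structure that XML::Simple produces.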

Let me see if I can whip something up.

> I'm sure you get plenty of emails like this -- I'll email the list too. If
> you've got a quick solution, though, I'd be happy to entertain it!

It's better to email the list - that way it gets archived.

Cheers,
jas.

> Thanks again for your help,
> -brian
> 
> 
> "Jason E. Stewart" wrote:
> > 
> > Brian Faull <bf...@mitre.org> writes:
> > 
> > > I'm working on a project in Perl that involves a lot of XML parsing,
> > > XML::Simple isn't cutting it anymore; we're looking into Xerces; we need
> > > to be portable so Win32 support is a must. Xerces looks great except for
> > > the no-windows support.
> > 
> > Hi Brian,
> > 
> > My apologies, but the Xerces-p web page is *really* out of date. We
> > are at release 2.3.0-3, and yes, there is Win32 support.
> > 
> > Check out the archives of the dev list for Oct and Nov at:
> > 
> >   http://nagoya.apache.org/eyebrowse/SummarizeList?listId=86
> > 
> > Martin Raspe has been working on this, and has gotten good results.
> > 
> > >  - Is there an archive tarball of the last (1.3.3-try3) Win32-supported
> > > release? (can't find it anywhere...)
> > 
> > You should be able to build the existing code with VC++ 6.
> > 
> > >  - Need help with the Windows port? Might be able to throw some time
> > >    at it...
> > 
> > Yes! Martin has been doing great, but I would love to get a packaged
> > binary (a PPM module for ActivePerl), plus debugging some of the
> > exception handling problems is always appreciated.
> > 
> > Other things would be example code files to include with the project
> > or just short tutorials on how to get started.
> > 
> > Cheers,
> > jas.
> 
> -- 
> Brian Faull
> Senior Integrated Electronics Engineer
> D620 - Communications and Networking
> The MITRE Corporation
> 202 Burlington Road, MS E015
> Bedford, MA 01730-1420
> V:781.271.5736  F:781.271.8875
> mailto:bfaull@mitre.org

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-p-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-p-dev-help@xml.apache.org


Re: stress testing Xerces

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
Steve Mathias <sm...@unm.edu> writes:

> Sorry Jason, but I was probably not clear enough here.  From the
> perspective of my code "validation" includes parsing, i.e., if validation
> is turned off, parser(s) are never even created.  The only reason I'm
> parsing in this context is for the side effect of validation.  

Ok, yes that makes sense, thanks.

Cheers,
jas.

---------------------------------------------------------------------


Re: stress testing Xerces

Posted by Steve Mathias <sm...@unm.edu>.
>>>>> "Jason" == Jason E Stewart <ja...@openinformatics.com> writes:

>> When the XML validation is turned on, the script gradually eats
>> memory until it crashes.  If validation is off, the script runs fine.

Jason> Well, that is good news (depending on your POV) - this is
Jason> possibly a Xerces-C memory leak then, and not something stupid
Jason> that I've done. I've written the list, and I'll work on a C++
Jason> test to see if I can reproduce it outside XML-Xerces.

<snip>

Jason> Do you really need to validate internally? You could wrap the
Jason> script to run an external validator like nsgmls if you really
Jason> need it. I hope to have this fixed soon.

Sorry Jason, but I was probably not clear enough here.  From the
perspective of my code "validation" includes parsing, i.e., if validation
is turned off, parser(s) are never even created.  The only reason I'm
parsing in this context is for the side effect of validation.  I will
look into nsgmls.

Jason> More as it happens, jas.

Thanks.

Steve
-- 
(    Stephen L. Mathias, Ph.D.                     (                    (
 )   Office of Biocomputing                         )  s m a t h i a s   )
(    University of New Mexico School of Medicine   (   @ p o b l a n o  (
 )   MSC08 4560                                     )  . h e a l t h .   )
(    1 University of New Mexico                    (   u n m . e d u    (
 )   Albuquerque, NM 87131-0001                     )                    )

---------------------------------------------------------------------


Re: stress testing Xerces

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
Steve Mathias <sm...@unm.edu> writes:

> I've been meaning to compose an e-mail about this for a few days now,
> but just haven't gotten around to it.  You might not like to hear this,
> but I think there is a *big* problem.

<sigh>
Okey dokey
</sigh>

> When the XML validation is turned on, the script gradually eats memory
> until it crashes.  If validation is off, the script runs fine.  

Well, that is good news (depending on your POV) - this is possibly a
Xerces-C memory leak then, and not something stupid that I've
done. I've written the list, and I'll work on a C++ test to see if I
can reproduce it outside XML-Xerces.

> I have tried everything I can think of to get the memory to be
> released, but with no success.
> 
> I am *definitely* creating a new parser every time.  

Yeah, there isn't anything that you can do about this - if it's in the
validation bit, that's deep in the internals of Xerces-C, and it's
nothing that XML-Xerces could possibly affect.

> Here's the sub that does the validation:
> 
> sub validateXML {
>   my $xml = shift ;
> 
>   # Just to make sure there is only one, $Parser is global but it's not used anywhere else:
>   $Parser = XML::Xerces::XMLReaderFactory::createXMLReader() ;
>   $Parser->setFeature("http://xml.org/sax/features/namespaces", 1) ;
>   $Parser->setFeature("http://apache.org/xml/features/validation/schema", 1) ;
>   $Parser->setFeature("http://apache.org/xml/features/validation/schema-full-checking", 1) ;
>   $Parser->setFeature("http://apache.org/xml/features/validation-error-as-fatal", 1) ;
>   $Parser->setFeature("http://xml.org/sax/features/validation", 1) ;
>   $Parser->setFeature("http://apache.org/xml/features/validation/dynamic", 0) ;

See the new samples/SAX2Count.pl example for how to use the Unicode
constants defined in Xerces-C; that will keep you from having to
hard-code these strings in your app. All the Unicode constants are
enumerated in docs/XMLUni.txt.

>   my $errorHandler = new XML::Xerces::PerlErrorHandler() ;
>   $Parser->setErrorHandler($errorHandler) ;
>   my $contentHandler = new XML::Xerces::PerlContentHandler() ;
>   $Parser->setContentHandler($contentHandler) ;
>
>   eval {
>     $Parser->parse( XML::Xerces::MemBufInputSource->new($xml) ) ;
>   } ;
>   undef $Parser ; # reclaim resources??

It should. In the meantime, I would run without validation. 

Do you really need to validate internally? You could wrap the script
to run an external validator like nsgmls if you really need it. I hope
to have this fixed soon.
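
As a rough sketch of that wrapping approach, the helper below runs an
external validator on a file and reports the result in the same
(status, message) form that validateXML() returns. The name
validate_external() and the command line passed to it are hypothetical.
Note also that nsgmls validates against a DTD rather than a W3C XML
Schema, so it is only a substitute if a DTD is available.

```perl
use strict;
use warnings;

# Run an external validator command on an XML file.
# Returns (1, '') on success and (0, $message) on failure, mirroring
# the return convention of validateXML() in this thread.
sub validate_external {
    my ($file, @cmd) = @_;

    # List-form system() bypasses the shell, so $file needs no quoting.
    my $status = system(@cmd, $file);

    return (0, "could not run $cmd[0]: $!") if $status == -1;
    return (0, "$cmd[0] killed by signal " . ($status & 127))
        if $status & 127;

    my $exit = $status >> 8;
    return $exit == 0
        ? (1, '')
        : (0, "$cmd[0] exited with status $exit");
}
```

A call might look like the line below; the arguments are an assumption
and depend on how the validator is installed and configured:

   my ($ok, $msg) = validate_external('chunk.xml', 'nsgmls', '-s');

Because the validator runs in its own process, any memory it uses is
returned to the system when it exits, whatever the parser does
internally.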

More as it happens,
jas.

---------------------------------------------------------------------


Re: stress testing Xerces

Posted by Steve Mathias <sm...@unm.edu>.
>>>>> "Jason" == Jason E Stewart <ja...@openinformatics.com> writes:

Jason> jason@openinformatics.com (Jason E. Stewart) writes:
>> Brian Faull <bf...@mitre.org> writes:
>> 
>> Depends on what you are doing - if it is DOM related, then yes, you
>> must tell the parser to release the memory, otherwise it grows. From
>> the API docs:
>> 
>> void AbstractDOMParser::resetDocumentPool()
>> 
>> Reset the documents vector pool and release all the associated memory
>> back to the system.
>> 
>> When parsing a document using a DOM parser, all memory allocated for
>> a DOM tree is associated to the DOM document.
>> 
>> If you do multiple parse using the same DOM parser instance, then
>> multiple DOM documents will be generated and saved in a vector
>> pool. All these documents (and thus all the allocated memory) won't
>> be deleted until the parser instance is destroyed.
>> 
>> If you don't need these DOM documents anymore and don't want to
>> destroy the DOM parser instance at this moment, then you can call
>> this method to reset the document vector pool and release all the
>> allocated memory back to the system.

Jason> As a note - if you create a new parser each time, this should
Jason> *not* cause a leak:

Jason>    while (1) {
Jason>      my $parser = XML::Xerces::XercesDOMParser->new();
Jason>      $parser->parse(XML::Xerces::MemBufInputSource->new('<test/>'));
Jason>    }

Jason> if it does, that's a *big* problem, and I'd like to know about
Jason> it.

Hi Jason,

I've been meaning to compose an e-mail about this for a few days now,
but just haven't gotten around to it.  You might not like to hear this,
but I think there is a *big* problem.

I have a script which pulls data out of a database and formats it as
XML.  There is ~2.4 GB of XML once it is done.  The code pulls the data
out in chunks of reasonable size (~15 KB each as XML), formats each chunk
as an individual XML document, optionally validates the document against
a schema, and then prints it out.

When the XML validation is turned on, the script gradually eats memory
until it crashes.  If validation is off, the script runs fine.  I have
tried everything I can think of to get the memory to be released, but
with no success.

I am *definitely* creating a new parser every time.  Here's the sub that
does the validation:

sub validateXML {
  my $xml = shift ;

  # Just to make sure there is only one, $Parser is global but it's not used anywhere else:
  $Parser = XML::Xerces::XMLReaderFactory::createXMLReader() ;
  $Parser->setFeature("http://xml.org/sax/features/namespaces", 1) ;
  $Parser->setFeature("http://apache.org/xml/features/validation/schema", 1) ;
  $Parser->setFeature("http://apache.org/xml/features/validation/schema-full-checking", 1) ;
  $Parser->setFeature("http://apache.org/xml/features/validation-error-as-fatal", 1) ;
  $Parser->setFeature("http://xml.org/sax/features/validation", 1) ;
  $Parser->setFeature("http://apache.org/xml/features/validation/dynamic", 0) ;
  my $errorHandler = new XML::Xerces::PerlErrorHandler() ;
  $Parser->setErrorHandler($errorHandler) ;
  my $contentHandler = new XML::Xerces::PerlContentHandler() ;
  $Parser->setContentHandler($contentHandler) ;

  eval {
    $Parser->parse( XML::Xerces::MemBufInputSource->new($xml) ) ;
  } ;
  undef $Parser ; # reclaim resources??
  if ($@) {
    return (0, $@) ;
  } else {
    return (1, '') ;
  }
}

Thoughts, ideas...?

Steve
-- 
(    Stephen L. Mathias, Ph.D.                     (                    (
 )   Office of Biocomputing                         )  s m a t h i a s   )
(    University of New Mexico School of Medicine   (   @ p o b l a n o  (
 )   MSC08 4560                                     )  . h e a l t h .   )
(    1 University of New Mexico                    (   u n m . e d u    (
 )   Albuquerque, NM 87131-0001                     )                    )

---------------------------------------------------------------------


Re: stress testing Xerces

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
jason@openinformatics.com (Jason E. Stewart) writes:

> Brian Faull <bf...@mitre.org> writes:
> 
> Depends on what you are doing - if it is DOM related, then yes, you
> must tell the parser to release the memory, otherwise it grows. From
> the API docs:
> 
>    void AbstractDOMParser::resetDocumentPool()
>     	  
>   Reset the documents vector pool and release all the associated memory
>   back to the system.
>   
>   When parsing a document using a DOM parser, all memory allocated for a
>   DOM tree is associated to the DOM document.
>   
>   If you do multiple parse using the same DOM parser instance, then
>   multiple DOM documents will be generated and saved in a vector
>   pool. All these documents (and thus all the allocated memory) won't be
>   deleted until the parser instance is destroyed.
>   
>   If you don't need these DOM documents anymore and don't want to
>   destroy the DOM parser instance at this moment, then you can call this
>   method to reset the document vector pool and release all the allocated
>   memory back to the system.

As a note - if you create a new parser each time, this should *not*
cause a leak:

   while (1) {
     my $parser = XML::Xerces::XercesDOMParser->new();
     $parser->parse(XML::Xerces::MemBufInputSource->new('<test/>'));
   }

if it does, that's a *big* problem, and I'd like to know about it.

Cheers,
jas.

---------------------------------------------------------------------


Re: subscribed! (was Re: XML::Simple in Xerces (was Re: Xerces-perl for Win32))

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
Brian Faull <bf...@mitre.org> writes:

> I am stress-testing my first Xerces application 

good, I haven't stress tested it in a looooong time.

> and seem to be running into a HUGE memory leak. 

not so good.

> Not sure if it's on my end, or in Xerces-p or in Xerces-C... need to
> do a bit more investigation before I draw conclusions. I'm streaming
> (as fast as possible) 500-byte (or so) XML strings... within 10
> minutes, X has locked up and the hard drive is thrashing... so this
> is pretty serious. :) Have you run into anything like this? Or, do
> you know if there's any Xerces call I need to make to be sure that
> objects are freed? I don't see any related posts...

Depends on what you are doing - if it is DOM related, then yes, you
must tell the parser to release the memory, otherwise it grows. From
the API docs:

   void AbstractDOMParser::resetDocumentPool()
    	  
  Reset the documents vector pool and release all the associated memory
  back to the system.
  
  When parsing a document using a DOM parser, all memory allocated for a
  DOM tree is associated to the DOM document.
  
  If you do multiple parse using the same DOM parser instance, then
  multiple DOM documents will be generated and saved in a vector
  pool. All these documents (and thus all the allocated memory) won't be
  deleted until the parser instance is destroyed.
  
  If you don't need these DOM documents anymore and don't want to
  destroy the DOM parser instance at this moment, then you can call this
  method to reset the document vector pool and release all the allocated
  memory back to the system.
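
A minimal sketch of how that might look from Perl when one DOM parser
is reused for many documents. It assumes the Perl binding exposes
resetDocumentPool() and getDocument() under the same names as the C++
API, which has not been checked here; parse_all() is an illustrative
name only.

```perl
use strict;
use warnings;
use XML::Xerces;

# Parse many small documents with ONE DOM parser, releasing each DOM
# tree once it is no longer needed.
sub parse_all {
    my @chunks = @_;
    my $parser = XML::Xerces::XercesDOMParser->new();
    my $parsed = 0;

    for my $xml (@chunks) {
        eval { $parser->parse(XML::Xerces::MemBufInputSource->new($xml)) };
        if ($@) {
            warn "parse failed: $@";
            next;
        }

        my $doc = $parser->getDocument();
        # ... work with $doc here ...
        $parsed++;

        # Drop the Perl reference first, then release the pooled documents
        # (assumed to be available in the binding, see above).
        undef $doc;
        $parser->resetDocumentPool();
    }
    return $parsed;
}
```

Nothing taken from $doc should be kept past the resetDocumentPool()
call, since the memory behind it is handed back to the system.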


Xerces-C leaks are unlikely, since that code has been worked on for a
long time, but XML-Xerces leaks are possible - I haven't really tested
this in some time.

Post any trouble to the list,
jas.

---------------------------------------------------------------------


Re: XML::Simple in Xerces (was Re: Xerces-perl for Win32)

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
Brian Faull <bf...@mitre.org> writes:

> "Jason E. Stewart" wrote:
> 
> > > Wow, thanks!
> > 
> > Welcome. New users always inspire me to add features to Xerces.
> 
> Glad I could help! :)

BTW, you are not a member of the xerces-p-dev list yet, so I have to
authorize each of your posts.

If you're going to be using xerces on a regular basis (which I hope
you will) I suggest subscribing. It's actually the user list *and* the
dev list all rolled into one - not enough messages to warrant two
separate lists.

From the ezmlm docs:

  To subscribe to the list, send a message to:
     <xe...@xml.apache.org>
  
  You can start a subscription for an alternate address,
  for example "john@host.domain", just add a hyphen and your
  address (with '=' instead of '@') after the command word:
  <xe...@xml.apache.org>

Cheers (and thanks again),
jas.

---------------------------------------------------------------------


Re: XML::Simple in Xerces (was Re: Xerces-perl for Win32)

Posted by Brian Faull <bf...@mitre.org>.
"Jason E. Stewart" wrote:

> > Wow, thanks!
> 
> Welcome. New users always inspire me to add features to Xerces.

Glad I could help! :)

> looks like you hit the 'send' key before the 'attach' key, we didn't
> get anything.

Drat... I do that far more often than I'd like to admit... I'm typing this
*after* I attached the file this time. Sorry for the spam.

> > p.s. sorry that the subject in this thread has strayed so far from the
> > original 'subject'...
> 
> time for a subject change, I think...

Good subject! :)
-brian



"Jason E. Stewart" wrote:
> 
> Brian Faull <bf...@mitre.org> writes:
> 
> > Wow, thanks!
> 
> Welcome. New users always inspire me to add features to Xerces.
> 
> > Attached, if anyone is interested, is a modification of node2hash that
> > creates a data structure that is identical to that which XML::Simple
> > creates in XMLin(). It's not pretty, but neither is the data structure
> > used by XML::Simple. :)
> 
> looks like you hit the 'send' key before the 'attach' key, we didn't
> get anything.
> 
> > I've written one the other way, too, but it's still flaky; I'll post both
> > again when I've cleaned them up.
> 
> Ok, sounds good.
> 
> > Thanks again,
> > -brian
> 
> you're welcome.
> 
> > p.s. sorry that the subject in this thread has strayed so far from the
> > original 'subject'...
> 
> time for a subject change, I think...

XML::Simple in Xerces (was Re: Xerces-perl for Win32)

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
Brian Faull <bf...@mitre.org> writes:

> Wow, thanks!

Welcome. New users always inspire me to add features to Xerces.

> Attached, if anyone is interested, is a modification of node2hash that
> creates a data structure that is identical to that which XML::Simple
> creates in XMLin(). It's not pretty, but neither is the data structure
> used by XML::Simple. :)

looks like you hit the 'send' key before the 'attach' key, we didn't
get anything.

> I've written one the other way, too, but it's still flaky; I'll post both
> again when I've cleaned them up.

Ok, sounds good. 

> Thanks again,
> -brian

you're welcome.

> p.s. sorry that the subject in this thread has strayed so far from the
> original 'subject'...

time for a subject change, I think...

---------------------------------------------------------------------


Re: Xerces-perl for Win32

Posted by Brian Faull <bf...@mitre.org>.
Wow, thanks!

Attached, if anyone is interested, is a modification of node2hash that
creates a data structure that is identical to that which XML::Simple
creates in XMLin(). It's not pretty, but neither is the data structure
used by XML::Simple. :)

I've written one the other way, too, but it's still flaky; I'll post both
again when I've cleaned them up.

Thanks again,
-brian

p.s. sorry that the subject in this thread has strayed so far from the
original 'subject'...


"Jason E. Stewart" wrote:
> 
> jason@openinformatics.com (Jason E. Stewart) writes:
> 
> > > I've got the tags that have attributes using getAttributes -- very
> > > easy. However, stepping through an element that has a number of
> > > children, then (recursively) constructing a hash is utterly
> > > painful. Since this is DOM, I presume that this structure already
> > > exists in memory anyway... is there any way to access this with
> > > Perl? DomDocument->to_hash()!?
> >
> > Let me see if I can whip something up.
> 
> Ok, here it is. I actually just used plain old DOM. You should be able
> to modify the node2hash() subroutine to get the output you need.
> 
> Cheers,
> jas.
> --
> 
>   --------------------------------------------------------------------------
>                      Name: DOM2hash.pl
>    DOM2hash.pl       Type: Perl Program (application/x-perl)
>               Description: DOM2hash script

