Posted to p-dev@xerces.apache.org by Chris Cheung <ch...@clc.cuhk.edu.hk> on 2003/12/19 13:13:24 UTC

Memory access bug in XMLString2Perl()

Dear all,

  I am currently using Xerces-C 2.3.0 and Xerces-Perl 2.3.0-4.

  When I use valgrind (a popular memory debugger, http://valgrind.kde.org/)
to check the following simple program:

------------------------------------------
#!/usr/bin/perl -w

use strict;

use XML::Xerces;

my $xmlString = '<?xml version="1.0"?><A><B>Hello</B></A>';

my $parser = XML::Xerces::XercesDOMParser->new();
$parser->parse(XML::Xerces::MemBufInputSource->new($xmlString));

my $doc = $parser->getDocument();

my $root = $doc->getDocumentElement;
print $root->getAttribute("notExist");
-----------------------------------------

$ valgrind ./parse.pl

valgrind reports an invalid memory access:

==24771== Invalid write of size 1
==24771==    at 0x42F06DEA: XMLString2Perl(unsigned short const*) (Xerces.cpp:1004)
==24771==    by 0x4306E53B: _wrap_DOMElement_getAttribute (Xerces.cpp:59286)
==24771==    by 0x402ACCD5: Perl_pp_entersub (in /usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/libperl.so)
==24771==    by 0x402A62E8: Perl_runops_standard (in /usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/libperl.so)
==24771==    Address 0x418CD8FC is 0 bytes after a block of size 0 alloc'd
==24771==    at 0x40026268: __builtin_vec_new (in /usr/lib/valgrind/vgskin_memcheck.so)
==24771==    by 0x400262C0: operator new[](unsigned) (in /usr/lib/valgrind/vgskin_memcheck.so)
==24771==    by 0x42F06DAE: XMLString2Perl(unsigned short const*) (Xerces.cpp:995)
==24771==    by 0x4306E53B: _wrap_DOMElement_getAttribute (Xerces.cpp:59286)

I built Xerces-Perl with -ggdb3, so line numbers are shown in the error
message. The problem appears to be the allocation at line 995 of
Xerces.cpp (the invalid write itself is at line 1004):

SV*
XMLString2Perl(const XMLCh* input) {
  SV *output;
  unsigned int charsEaten = 0;
  int length = XMLString::stringLen(input);        // string length

  // output string: no byte is reserved for the terminating '\0'
  XMLByte* res = new XMLByte[length * UTF8_MAXLEN];

  unsigned int total_chars =
    UTF8_TRANSCODER->transcodeTo((const XMLCh*) input,
                   (unsigned int) length,
                   (XMLByte*) res,
                   (unsigned int) length*UTF8_MAXLEN,
                   charsEaten,
                   XMLTranscoder::UnRep_Throw
                   );
  // out of bounds whenever total_chars == length * UTF8_MAXLEN (e.g. length 0)
  res[total_chars] = '\0';
  // ... (rest of function not shown)

The buffer should be allocated with (length * UTF8_MAXLEN + 1) bytes:

  XMLByte* res = new XMLByte[length * UTF8_MAXLEN + 1];      // output string

so that there is room for the terminating '\0'.
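The off-by-one can be illustrated outside Xerces with a minimal, self-contained C++ sketch. It does not use the Xerces-C API: `encodeUtf8()` is a hypothetical stand-in for `UTF8_TRANSCODER->transcodeTo()` (it handles only BMP characters and does not combine surrogate pairs), and `kUtf8MaxLen` is a hypothetical stand-in for `UTF8_MAXLEN` whose value here is an assumption. Like the original, it lets the encoder use at most `length * kUtf8MaxLen` bytes and then writes a terminator at the returned length; unlike the original, the buffer reserves one extra byte for that terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for UTF8_MAXLEN: bytes reserved per UTF-16 code unit.
// The value 4 is an assumption; this encoder never writes more than 3 per unit.
constexpr std::size_t kUtf8MaxLen = 4;

// Hypothetical stand-in for UTF8_TRANSCODER->transcodeTo(): encodes len UTF-16
// units as UTF-8 into out, writing at most maxBytes, and returns the byte count.
// BMP-only sketch: surrogate pairs are not combined.
std::size_t encodeUtf8(const char16_t* in, std::size_t len,
                       unsigned char* out, std::size_t maxBytes) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < len; ++i) {
        const char32_t c = in[i];
        if (c < 0x80) {
            if (n + 1 > maxBytes) break;  // stop rather than overflow
            out[n++] = static_cast<unsigned char>(c);
        } else if (c < 0x800) {
            if (n + 2 > maxBytes) break;
            out[n++] = static_cast<unsigned char>(0xC0 | (c >> 6));
            out[n++] = static_cast<unsigned char>(0x80 | (c & 0x3F));
        } else {
            if (n + 3 > maxBytes) break;
            out[n++] = static_cast<unsigned char>(0xE0 | (c >> 12));
            out[n++] = static_cast<unsigned char>(0x80 | ((c >> 6) & 0x3F));
            out[n++] = static_cast<unsigned char>(0x80 | (c & 0x3F));
        }
    }
    return n;
}

// Corrected allocation pattern: reserve one extra byte for the terminator.
std::string toUtf8String(const char16_t* input) {
    const std::size_t length = std::char_traits<char16_t>::length(input);
    const std::size_t maxBytes = length * kUtf8MaxLen;

    // maxBytes + 1 bytes; with the original sizing (maxBytes) an empty input
    // would get a zero-size buffer and the terminator write would overflow it.
    std::vector<unsigned char> res(maxBytes + 1);

    const std::size_t total = encodeUtf8(input, length, res.data(), maxBytes);
    assert(total <= maxBytes);  // the encoder never exceeds the limit it was given
    res[total] = '\0';          // mirrors res[total_chars] = '\0'; now always in bounds

    return std::string(reinterpret_cast<const char*>(res.data()), total);
}
```

Using `std::vector` is a choice made for the sketch, not what Xerces-Perl does: the buffer is released automatically when the function returns, whereas the original code frees `res` explicitly later in the function.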

Thank you for your attention.


-- 
Best Regards,

Chris Cheung
Center for Large-Scale Computation

Have a nice day!


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-p-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-p-dev-help@xml.apache.org


Re: Memory access bug in XMLString2Perl()

Posted by Chris Cheung <ch...@clc.cuhk.edu.hk>.
Dear Jason,

On 22 Dec 2003, Jason E. Stewart wrote:

> > Will this issue be fixed in the next release?
> 
> Sure, I'll add the fix. 

Thank you very much!

> Did the change affect the torrential memory leaks
> that XML-Xerces has? 

No. I turned on valgrind's --leak-check=yes, and it reports no memory leak
due to XMLString2Perl(). In fact, the buffer is freed within the function
(line 1015 of Xerces.cpp). The change does not affect the memory leak caused
by Xerces's parsing; the leak exists both before and after the change.

> Has valgrind identified any other sources where
> the memory leaks can be coming from? It is *ultra-critical* to get
> those fixed as soon as possible.

Yes. I used the following script, "mleak5.pl":

----------------------------------
#!/usr/bin/perl -w

use strict;
use XML::Xerces;

# Get a DOM implementation that supports Load/Save ('LS')
my $impl =
  XML::Xerces::DOMImplementationRegistry::getDOMImplementation('LS');

# Create a synchronous DOMBuilder
my $parser =
  $impl->createDOMBuilder($XML::Xerces::DOMImplementationLS::MODE_SYNCHRONOUS, '');

# Namespaces on; schema processing and validation off
$parser->setFeature("$XML::Xerces::XMLUni::fgDOMNamespaces", 1);
$parser->setFeature("$XML::Xerces::XMLUni::fgXercesSchema", 0);
$parser->setFeature("$XML::Xerces::XMLUni::fgXercesSchemaFullChecking", 0);
$parser->setFeature("$XML::Xerces::XMLUni::fgDOMValidation", 0);

my $doc = $parser->parseURI("example.xml");

# Release the documents owned by the builder
$parser->resetDocumentPool();
------------------------------------

and checked it with valgrind:

$ valgrind --leak-check=yes --num-callers=100 perl ./mleak5.pl

One error message relevant to the Xerces-C/Perl memory leak is:

==28213== 6480 bytes in 236 blocks are possibly lost in loss record 22 of 30
==28213==    at 0x40026164: __builtin_new (in /usr/lib/valgrind/vgskin_memcheck.so)
==28213==    by 0x400261BC: operator new(unsigned) (in /usr/lib/valgrind/vgskin_memcheck.so)
==28213==    by 0x43A2930E: xercesc_2_3::MemoryManagerImpl::allocate(unsigned) (MemoryManagerImpl.cpp:75)
==28213==    by 0x43A9C37F: xercesc_2_3::XMemory::operator new(unsigned) (XMemory.cpp:77)
==28213==    by 0x439FC2DF: xercesc_2_3::DTDGrammar::resetEntityDeclPool() (DTDGrammar.cpp:217)
==28213==    by 0x439FC2A4: xercesc_2_3::DTDGrammar::reset() (DTDGrammar.cpp:201)
==28213==    by 0x439FBF22: xercesc_2_3::DTDGrammar::DTDGrammar(xercesc_2_3::MemoryManager*) (DTDGrammar.cpp:128)
==28213==    by 0x43A19AEE: xercesc_2_3::IGXMLScanner::scanReset(xercesc_2_3::InputSource const&) (IGXMLScanner2.cpp:867)
==28213==    by 0x43A1E7DB: xercesc_2_3::IGXMLScanner::scanDocument(xercesc_2_3::InputSource const&) (IGXMLScanner.cpp:190)
==28213==    by 0x43AAB581: xercesc_2_3::XMLScanner::scanDocument(unsigned short const*) (XMLScanner.cpp:419)
==28213==    by 0x43983508: xercesc_2_3::AbstractDOMParser::parse(unsigned short const*) (AbstractDOMParser.cpp:457)
==28213==    by 0x439C3F15: xercesc_2_3::DOMBuilderImpl::parseURI(unsigned short const*) (DOMBuilderImpl.cpp:447)
==28213==    by 0x43119080: _wrap_DOMBuilder_parseURI__SWIG_0 (Xerces.cpp:62581)
==28213==    by 0x4311A0CC: _wrap_DOMBuilder_parseURI (Xerces.cpp:62679)
==28213==    by 0x402ACCD5: Perl_pp_entersub (in /usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/libperl.so)
==28213==    by 0x402A62E8: Perl_runops_standard (in /usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/libperl.so)
==28213==    by 0x40240D2A: (within /usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/libperl.so)
==28213==    by 0x40240AC4: perl_run (in /usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/libperl.so)
==28213==    by 0x8049182: main (in /usr/bin/perl5.8.0)
==28213==    by 0x403C7159: __libc_start_main (in /lib/libc-2.2.5.so)
==28213==    by 0x8049010: (within /usr/bin/perl5.8.0)

Hope this helps.


-- 
Best Regards,

Chris Cheung
Center for Large-Scale Computation

Have a nice day!






Re: Memory access bug in XMLString2Perl()

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
Chris Cheung <ch...@clc.cuhk.edu.hk> writes:

> On 20 Dec 2003, Jason E. Stewart wrote:
> 
> > > The memory to malloc should be (length * UTF8_MAXLEN + 1):
> > > 
> > >   XMLByte* res = new XMLByte[length * UTF8_MAXLEN + 1];          // output 
> > > 
> > > so that the memory for the ending '\0' is not missed.
> > 
> > Does changing this make a difference? I would be surprised: the call
> > to transcodeTo() passes the *maximum possible length* for the buffer,
> > and returns the *actual* length used. Does changing this make valgrind
> > happy? I've got no objection to adding this to the code; I'm just
> > curious.
> 
> Yes. After I changed the length, recompiled and ran the same test, the
> valgrind error message disappeared.

Ok, thanks Chris. 

> Will this issue be fixed in the next release?

Sure, I'll add the fix. Did the change affect the torrential memory leaks
that XML-Xerces has? Has valgrind identified any other sources where
the memory leaks can be coming from? It is *ultra-critical* to get
those fixed as soon as possible.

Cheers,
jas.



Re: Memory access bug in XMLString2Perl()

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
Chris Cheung <ch...@clc.cuhk.edu.hk> writes:

> On 20 Dec 2003, Jason E. Stewart wrote:
> 
> > > The memory to malloc should be (length * UTF8_MAXLEN + 1):
> > > 
> > >   XMLByte* res = new XMLByte[length * UTF8_MAXLEN + 1];          // output 
> > > 
> > > so that the memory for the ending '\0' is not missed.
> > 
> > Does changing this make a difference? I would be surprised: the call
> > to transcodeTo() passes the *maximum possible length* for the buffer,
> > and returns the *actual* length used. Does changing this make valgrind
> > happy? I've got no objection to adding this to the code; I'm just
> > curious.
> 
> Yes. After I changed the length, recompiled and ran the same test, the
> valgrind error message disappeared. Will this issue be fixed in the
> next release?

Chris,

I forgot to add: could you please file a bug report for this one? That
way it won't get lost in the shuffle. You don't need a separate patch
file or anything; you can just paste your original email into the report:

  http://nagoya.apache.org/bugzilla/enter_bug.cgi?product=Xerces-P

Cheers,
jas.



Re: Memory access bug in XMLString2Perl()

Posted by Chris Cheung <ch...@clc.cuhk.edu.hk>.
On 20 Dec 2003, Jason E. Stewart wrote:

> > The memory to malloc should be (length * UTF8_MAXLEN + 1):
> > 
> >   XMLByte* res = new XMLByte[length * UTF8_MAXLEN + 1];          // output 
> > 
> > so that the memory for the ending '\0' is not missed.
> 
> Does changing this make a difference? I would be surprised: the call
> to transcodeTo() passes the *maximum possible length* for the buffer,
> and returns the *actual* length used. Does changing this make valgrind
> happy? I've got no objection to adding this to the code; I'm just
> curious.

Yes. After I changed the length, recompiled and ran the same test, the
valgrind error message disappeared. Will this issue be fixed in the
next release?


-- 
Best Regards,

Chris Cheung
Center for Large-Scale Computation

Have a nice day!




Re: Memory access bug in XMLString2Perl()

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
Hi Chris,

Chris Cheung <ch...@clc.cuhk.edu.hk> writes:

>   I am currently using Xerces-C 2.3.0 and Xerces-Perl 2.3.0-4.
> 
>   When I use valgrind (a popular memory debugger, http://valgrind.kde.org/)
> to check the following simple program:

Hah! You beat me to it. I found valgrind the other week, but
unfortunately it is an i386-specific tool (I do all my development on
a Power Mac).

> SV*
> XMLString2Perl(const XMLCh* input) {
>     SV *output;
>   unsigned int charsEaten = 0;
>   int length  = XMLString::stringLen(input);      // string length
> 
>   XMLByte* res = new XMLByte[length * UTF8_MAXLEN];          
>      // output string
> 
>   unsigned int total_chars =
>     UTF8_TRANSCODER->transcodeTo((const XMLCh*) input,
>                    (unsigned int) length,
>                    (XMLByte*) res,
>                    (unsigned int) length*UTF8_MAXLEN,
>                    charsEaten,
>                    XMLTranscoder::UnRep_Throw
>                    );
>   res[total_chars] = '\0';
> 
> The memory to malloc should be (length * UTF8_MAXLEN + 1):
> 
>   XMLByte* res = new XMLByte[length * UTF8_MAXLEN + 1];          // output 
> 
> so that the memory for the ending '\0' is not missed.

Does changing this make a difference? I would be surprised: the call
to transcodeTo() passes the *maximum possible length* for the buffer,
and returns the *actual* length used. Does changing this make valgrind
happy? I've got no objection to adding this to the code; I'm just
curious.

Cheers,
jas.
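Jason's question comes down to one comparison: transcodeTo() never returns more than the buffer size it was given, but the terminator write `res[total_chars]` needs `total_chars` to be strictly smaller than the allocation. The compile-time sketch below spells that check out; `kUtf8MaxLen` is a hypothetical stand-in for `UTF8_MAXLEN` (its value is an assumption), and the byte counts are worked examples rather than measurements. The empty case matches Chris's test: the attribute does not exist, so getAttribute() returns an empty string, `length` is 0, and the original code allocates a zero-size block, which valgrind reports as "0 bytes after a block of size 0".

```cpp
#include <cstddef>

// Hypothetical stand-in for UTF8_MAXLEN; the value 4 is an assumption.
constexpr std::size_t kUtf8MaxLen = 4;

// Bytes allocated by the original code: length * UTF8_MAXLEN.
constexpr std::size_t originalAlloc(std::size_t length) {
    return length * kUtf8MaxLen;
}

// Bytes allocated by the proposed fix: one extra byte for '\0'.
constexpr std::size_t fixedAlloc(std::size_t length) {
    return length * kUtf8MaxLen + 1;
}

// res[written] = '\0' is in bounds only if written < allocated.
constexpr bool terminatorFits(std::size_t written, std::size_t allocated) {
    return written < allocated;
}

// Empty string: 0 bytes allocated and 0 written, so res[0] is out of bounds.
static_assert(!terminatorFits(0, originalAlloc(0)), "original overflows on empty input");
static_assert(terminatorFits(0, fixedAlloc(0)), "fix reserves the terminator byte");

// Whenever the transcoder fills the buffer exactly, the original overflows too.
static_assert(!terminatorFits(originalAlloc(2), originalAlloc(2)), "full buffer overflows");
static_assert(terminatorFits(originalAlloc(2), fixedAlloc(2)), "fix covers a full buffer");

// "Hello" in ASCII: 5 bytes written into 5 * kUtf8MaxLen, fine either way.
static_assert(terminatorFits(5, originalAlloc(5)), "short output fits");
```

Because every check is a `static_assert`, the file compiles only if all of them hold; the point is that "actual length used" can equal the maximum, and the empty string is the case where it always does.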
