Posted to c-dev@xerces.apache.org by Victor Broto <vb...@gmail.com> on 2005/10/31 18:30:52 UTC

Reducing the size of Xerces, static build

We are releasing a new application that uses Xerces, but our executable
size is still too big.

We are trying to ship a final version that is as small as possible (we'd
like to get a compressed size <5 MB) and, among other dependencies, we
have one on xerces-c_2_5_0.dll, which is about 4.5 MB.

Can somebody suggest a simple way to reduce the dll size?

We are also considering a static build, which would remove our
dependency on the .dll, although it would increase the exe size. However,
after browsing the documentation and the lists, I haven't been able to
find out how to build Xerces statically, so we cannot test whether
removing the .dll is worthwhile in that sense.

Is there any way to get a static build of Xerces?

Thanks,

Victor

Elisha Berns wrote:

>Thanks for clearing this up for me, and thanks also for the other item
>you responded to a few weeks ago.
>
>Elisha
>
>>-----Original Message-----
>>From: Alberto Massari [mailto:amassari@datadirect.com]
>>Sent: Tuesday, November 01, 2005 12:21 AM
>>To: c-dev@xerces.apache.org
>>Subject: RE: Xerces issues handling recursive schema includes
>>
>>Hi Elisha,
>>
>>At 19.39 31/10/2005 -0800, Elisha Berns wrote:
>>
>>>Neil,
>>>
>>>I made a naïve implementation of an EntityResolver that only uses the
>>>absolute paths of the SystemIds it receives, but this doesn't work for
>>>the following reasons:
>>>
>>>The main schema file includes ~20 other schema files which are located
>>>in other directories using relative paths and each one of those 20 files
>>>includes ~20 files (which can be included multiple times) also using
>>>relative paths.  So if I use XMLPlatformUtils::weavePaths() using the
>>>base path from the main schema file being parsed with all of those
>>>relative paths in the other included schema files, the results are
>>>invalid paths.
>>>
>>>The issue is that the EntityResolver needs to know what base path to use
>>>when it gets a SystemId in order to correctly resolve it to an absolute
>>>path. And the base path keeps changing as the SAX2XMLReader parses
>>>through the paths it finds in schemaLocation attributes.
>>>
>>>Is there any way to get this information (the correct base path to use
>>>per relative path) without having to pre-parse all the schema files for
>>>their schemaLocation attributes?  Surely there must be some simpler way
>>>to prevent the parser from mistaking two or more relative SystemIds as
>>>different SystemIds?
>>
>>To overcome this limitation there is an XMLEntityResolver interface
>>that you should register using SAX2XMLReaderImpl::setXMLEntityResolver
>>(you may have to cast your SAX2XMLReader to the implementation class).
>>In your XMLEntityResolver-derived class you should implement
>>resolveEntity(XMLResourceIdentifier*), resolving the entity using the
>>getSystemId() and getBaseURI() accessors.
>>
>>Hope this helps,
>>Alberto
>>
>>>Thanks,
>>>
>>>Elisha
>>>
>>>>Hi Elisha,
>>>>
>>>>Recursive, or circular, includes are supposed to be handled properly by a
>>>>schema parser.  While I'm not really active anymore on the code base, this
>>>>question does come up periodically, usually in the context of a set of
>>>>schemas that get loaded purely via schemaLocation hints, or via a user's
>>>>EntityResolver which doesn't set system identifiers on the InputSources it
>>>>returns to the parser.  The usual way to get around this is to register a
>>>>custom EntityResolver instance, and take good care that system identifier
>>>>fields are always set to the same value when an InputSource is returned.
>>>>It's best if this is absolute, but I think a relative URI should work too.
>>>>The reason this is important is that the parser uses system identifiers
>>>>internally to figure out whether it's processed a schema document before.
>>>>
>>>>Cheers,
>>>>Neil
>>>>Neil Graham
>>>>Manager, C++ Compiler Front-End and Runtime Development
>>>>IBM Toronto Lab
>>>>Phone:  905-413-3519, T/L 969-3519
>>>>E-mail:  neilg@ca.ibm.com
>>>>
>>>>"Elisha Berns" <e....@computer.org>
>>>>10/30/2005 10:30 PM
>>>>Please respond to
>>>>c-dev
>>>>
>>>>To
>>>>"Xerces C++ Development" <c-...@xerces.apache.org>
>>>>cc
>>>>
>>>>Subject
>>>>Xerces issues handling recursive schema includes
>>>>
>>>>Hi,
>>>>
>>>>I'm trying to determine both what Xerces does when it encounters
>>>>recursive schema includes and what to do about it because it causes some
>>>>problems.
>>>>
>>>>It appears that the XercesC schema parser creates multiple XSxxx type
>>>>objects for the same type if the schema files are included recursively.
>>>>In addition it would appear that the load time for a schema is much,
>>>>much slower in the presence of recursive includes.
>>>>
>>>>I get one 'proper' globally defined type object but multiple duplicates
>>>>when the type appears as a contained type (in a complexType definition).
>>>>The only way I know this now is because I get different pointer values
>>>>for the XSxxx object when this situation arises, even though they end up
>>>>pointing to the same type.
>>>>
>>>>Does anybody know firsthand whether there is any internal mechanism to
>>>>prevent this from happening (apparently not), and what can be done, at
>>>>present, to prevent this duplication from occurring?
>>>>
>>>>It has occurred to me that it might be a good idea to create a new type
>>>>of parser warning specifically regarding the issue of 'recursive
>>>>includes'.  This of course only makes sense if there is a strong
>>>>consensus that this is a classic anti-pattern of XML Schema development
>>>>and should be avoided at all costs.  I can see more or less how to
>>>>implement it outside of Xerces by constructing a dependency graph of the
>>>>schema files and testing for back-edges.  So my question about this side
>>>>of things is whether there is any desire to make this test a built in
>>>>part of the parser to make the parser smarter about these things?
>>>>
>>>>Thanks for some feedback here.
>>>>
>>>>Elisha Berns
>>>>e.berns@computer.org
>>>>tel. (310) 556 - 8332
>>>>fax (310) 556 - 2839

---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: c-dev-help@xerces.apache.org
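
Editorial sketch for the quoted schema-include thread: the advice there is to resolve each system ID against the base URI of the document that referenced it, and to hand the parser one consistent identifier per schema file. The code below only illustrates that resolution logic with the standard library; it is not Xerces code. The names canonicalSystemId and SeenSchemas are hypothetical, and it assumes plain absolute '/'-separated paths rather than full URIs. In an actual XMLEntityResolver the two arguments would presumably come from the getSystemId() and getBaseURI() accessors mentioned in the thread.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not a Xerces API): resolve systemId against the
// document that referenced it and normalize the result.
// Assumption: baseURI is an absolute '/'-separated path with no URI scheme.
std::string canonicalSystemId(const std::string& baseURI,
                              const std::string& systemId)
{
    assert(!systemId.empty());

    std::string joined;
    if (systemId[0] == '/') {
        joined = systemId;  // already absolute, the base is not needed
    } else {
        // Relative references start from the directory of the base document.
        joined = baseURI.substr(0, baseURI.rfind('/') + 1) + systemId;
    }

    // Collapse "." and ".." so that equivalent spellings compare equal.
    std::vector<std::string> parts;
    std::istringstream in(joined);
    std::string segment;
    while (std::getline(in, segment, '/')) {
        if (segment.empty() || segment == ".") {
            continue;
        }
        if (segment == "..") {
            if (!parts.empty()) {
                parts.pop_back();
            }
            continue;
        }
        parts.push_back(segment);
    }

    std::string result;
    for (const std::string& part : parts) {
        result += "/" + part;
    }
    return result.empty() ? "/" : result;
}

// Hypothetical bookkeeping: remembers which schema documents were seen,
// keyed by their canonical identifier.
class SeenSchemas
{
public:
    // Returns true only the first time a given document is reported.
    bool markSeen(const std::string& baseURI, const std::string& systemId)
    {
        return seen_.insert(canonicalSystemId(baseURI, systemId)).second;
    }

private:
    std::set<std::string> seen_;
};
```

With this, "shared/base.xsd" referenced from /schemas/main.xsd and "../shared/base.xsd" referenced from /schemas/types/common.xsd map to the same key, which is the property the thread says the parser relies on to recognize a schema document it has already processed.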

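Editorial sketch for the 'recursive includes' check proposed in the quoted thread (build a dependency graph of the schema files and test for back-edges): one common way to do this is a depth-first search with three node states. Everything here is an assumption for illustration. The IncludeGraph type and hasRecursiveInclude are hypothetical names, and the graph is taken as already built from the schemaLocation attributes, which this sketch does not parse.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical representation: each schema file maps to the files it includes.
typedef std::map<std::string, std::vector<std::string> > IncludeGraph;

// White = not visited, Grey = on the current DFS path, Black = finished.
enum Color { White = 0, Grey, Black };

// Depth-first visit; returns true as soon as an edge leads back to a node
// that is still on the current path (a back-edge, i.e. a recursive include).
static bool visit(const IncludeGraph& graph, const std::string& node,
                  std::map<std::string, Color>& color)
{
    color[node] = Grey;
    IncludeGraph::const_iterator it = graph.find(node);
    if (it != graph.end()) {
        for (const std::string& next : it->second) {
            const Color state = color[next];  // unseen nodes default to White
            if (state == Grey) {
                return true;
            }
            if (state == White && visit(graph, next, color)) {
                return true;
            }
        }
    }
    color[node] = Black;
    return false;
}

// Returns true if any chain of includes leads back to a file already on it.
bool hasRecursiveInclude(const IncludeGraph& graph)
{
    std::map<std::string, Color> color;
    for (const auto& entry : graph) {
        if (color[entry.first] == White && visit(graph, entry.first, color)) {
            return true;
        }
    }
    return false;
}
```

A file that is merely included from several places is finished (Black) by the time it is reached again, so it is not reported; only a genuine cycle is.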

Re: Reducing the size of Xerces, static build

Posted by Alberto Massari <am...@datadirect.com>.
Hi Victor,
if you are using Visual C++ .NET 2003, the latest Xerces 2.7
distribution has a 'Static build' configuration; if you are using MinGW,
you should be able to get a static library simply by running ar over the
*.o files.

Alberto
