Posted to c-users@xerces.apache.org by Davide Capodaglio <da...@axelsw.it> on 2022/07/20 13:44:24 UTC

Xerces-C++ and XQilla bad performance

Hi all,

I am porting a large project that makes heavy use of XML from Windows (where it uses MSXML) to Linux, and I chose Xerces-C++ and XQilla (I need full XPath 1.0 support) as the underlying XML engine.
However, their performance is not satisfactory.
For comparison purposes, I also modified the same application to use Xerces-C++ instead of MSXML on Windows.

These are some benchmarks I ran:

Loading a fairly large XML file (10 MB), using the DOMCount utility, so no XQilla involved:
                - MSXML: 210 ms
                - Xerces on Windows: 470 ms
                - Xerces on Linux: 610 ms
So Xerces on Windows is roughly 2x slower than MSXML, and on Linux roughly 3x slower...


But the worst performance I get is with XPath queries, that is, when XQilla is involved:

Execution of 100 selectNodes XPath queries (my own benchmark):
                - MSXML: 5 ms
                - Xerces on Windows: 78 ms
                - Xerces on Linux: 76 ms
So here the slowdown is terrifying (about 15x).
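
For reference, a minimal sketch of how a benchmark like the one above could be timed. The helper name time_queries_ms and the callable passed to it are hypothetical; the actual benchmark code was not posted, so this only shows the general shape of such a measurement.

```cpp
#include <chrono>
#include <functional>

// Runs `query` `iterations` times and returns the total wall-clock time in
// milliseconds. `query` is a placeholder for one XPath evaluation.
double time_queries_ms(const std::function<void()>& query, int iterations) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        query();
    }
    const std::chrono::duration<double, std::milli> elapsed =
        clock::now() - start;
    return elapsed.count();
}
```

Called as time_queries_ms([&] { /* evaluate one XPath expression */ }, 100), this yields a figure comparable to the ones listed, provided document loading and expression compilation are kept outside the timed callable unless they are meant to be measured too.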

I built Xerces and XQilla on Windows with VS2017 in the Release configuration, and on Linux with
./configure --host=i686-linux-gnu --enable-transcoder-gnuiconv --disable-network CFLAGS="-m32 -O3 -DNDEBUG" CXXFLAGS="-m32 -O3 -DNDEBUG" LDFLAGS=-m32
(I need to build for a 32-bit architecture.)


So my question is:
Is there any known way to optimize the performance of Xerces-C++ and XQilla in both loading and XPath execution?
Maybe some compilation flag, or some API for "tuning" the two engines? (I only need XPath 1.0.)

Any advice or hints from previous experience would be appreciated...
I know this mailing list is not about XQilla, but that project seems dead, and maybe someone here has some experience with it.

Maybe someone has had more luck with Xalan?
Otherwise I will have to consider switching to a different XML engine altogether...

Thanks
Davide


Re: Xerces-C++ and XQilla bad performance

Posted by Phil Baer <ph...@gmail.com>.
Is there some way I can get off this mailing list, which has nothing at all
to do with me?

On Thu, Jul 21, 2022 at 4:34 AM Davide Capodaglio <
davide.capodaglio@axelsw.it> wrote:

> Hi Roger,
> I already tried compiling Xerces with ICU, but it makes no difference at
> all.
> However, on Windows MSXML also converts everything to UTF-16 internally
> (because everything on Windows nowadays uses Unicode with wchar_t = UTF-16),
> and Xerces uses the Windows API internally, yet I still see more than a 2x
> slowdown compared with MSXML just for loading a document.
>
> But, as I said, my big performance problem is with XQilla queries.
>
> Also, in my selectNodes() benchmarks I am already using u"" (UTF-16)
> string literals.
>
> I will try valgrind, but I do not have much hope at the moment...
>
> Thanks
> Davide

Re: Xerces-C++ and XQilla bad performance

Posted by Davide Capodaglio <da...@axelsw.it>.
Hello Vitaly,
I tried Xerces 3.1.4, and I get a big improvement in document loading, somewhere between 3x and 4x, as described in https://issues.apache.org/jira/browse/XERCESC-2211.
However, I cannot get XQilla to build against it.

In any case, since this is an important new project, it does NOT seem a good idea to me to choose a six-year-old dead branch and stick with it for years.
I am very surprised that the authors are not considering reworking the code to optimize it, given the tremendous slowdown between 3.1.x and 3.2.x.



> -----Original Message-----
> From: Vitaly Prapirny <vi...@gmail.com>
> Sent: Thursday, 21 July 2022 11:32
> To: c-users@xerces.apache.org
> Subject: Re: Xerces-C++ and XQilla bad performance
> 
> So you can try version 3.1.4 and see if it makes things better in your case.
> 
> Good luck!
>    Vitaly

Re: Xerces-C++ and XQilla bad performance

Posted by Vitaly Prapirny <vi...@gmail.com>.
So you can try version 3.1.4 and see if it makes things better in your case.

Good luck!
   Vitaly


On Thu, Jul 21, 2022 at 12:14 PM Davide Capodaglio <
davide.capodaglio@axelsw.it> wrote:

> I am using the latest versions, Xerces-C++ 3.2.3 and XQilla 2.3.4.
> So I am probably seeing the dynamic_cast impact (though obviously I cannot
> do anything about it).
>
> I am using no schema, no validation, no namespaces.
> Thanks
> Davide

Re: Xerces-C++ and XQilla bad performance

Posted by Davide Capodaglio <da...@axelsw.it>.
I am using the latest versions, Xerces-C++ 3.2.3 and XQilla 2.3.4.
So I am probably seeing the dynamic_cast impact (though obviously I cannot do anything about it).
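
For context, a minimal sketch of the kind of cost being discussed: a checked dynamic_cast on every node access versus an unchecked static_cast. Node and Element here are hypothetical stand-ins, not Xerces-C++ classes, and whether this is really the dominant cost in a given workload is something only a profiler can confirm.

```cpp
#include <vector>

// Hypothetical stand-ins for a DOM class hierarchy; not Xerces-C++ types.
struct Node {
    virtual ~Node() = default;
};
struct Element : Node {
    int childCount = 1;
};

// Checked downcast: performs a run-time type lookup on every call.
long sum_children_dynamic(const std::vector<Node*>& nodes) {
    long total = 0;
    for (Node* n : nodes) {
        if (const Element* e = dynamic_cast<const Element*>(n)) {
            total += e->childCount;
        }
    }
    return total;
}

// Unchecked downcast: resolved at compile time, valid only if every node
// really is an Element.
long sum_children_static(const std::vector<Node*>& nodes) {
    long total = 0;
    for (Node* n : nodes) {
        total += static_cast<const Element*>(n)->childCount;
    }
    return total;
}
```

Both functions return the same result when every node is an Element; the difference is that the first pays for a type check on each iteration, which can add up when it sits on a hot path such as tree traversal.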

I am using no schema, no validation, no namespaces.
Thanks
Davide


> -----Original Message-----
> From: Vitaly Prapirny <vi...@gmail.com>
> Sent: Thursday, 21 July 2022 10:46
> To: c-users@xerces.apache.org
> Subject: Re: Xerces-C++ and XQilla bad performance
> 
> Hi Davide,
> 
> Other factors affecting Xerces performance are the Xerces version (
> https://issues.apache.org/jira/browse/XERCESC-2211) and the use of
> validation (https://xerces.apache.org/xerces-c/schema-3.html).
> 
> Good luck!
>    Vitaly

Re: Xerces-C++ and XQilla bad performance

Posted by Vitaly Prapirny <vi...@gmail.com>.
Hi Davide,

Other factors affecting Xerces performance are the Xerces version (
https://issues.apache.org/jira/browse/XERCESC-2211) and the use of validation
(https://xerces.apache.org/xerces-c/schema-3.html).

Good luck!
   Vitaly


On Thu, Jul 21, 2022 at 11:34 AM Davide Capodaglio <
davide.capodaglio@axelsw.it> wrote:

> Hi Roger,
> I already tried compiling Xerces with ICU, but it makes no difference at
> all.
> However, on Windows MSXML also converts everything to UTF-16 internally
> (because everything on Windows nowadays uses Unicode with wchar_t = UTF-16),
> and Xerces uses the Windows API internally, yet I still see more than a 2x
> slowdown compared with MSXML just for loading a document.
>
> But, as I said, my big performance problem is with XQilla queries.
>
> Also, in my selectNodes() benchmarks I am already using u"" (UTF-16)
> string literals.
>
> I will try valgrind, but I do not have much hope at the moment...
>
> Thanks
> Davide

Re: Xerces-C++ and XQilla bad performance

Posted by Davide Capodaglio <da...@axelsw.it>.
Hi Roger,
I already tried compiling Xerces with ICU, but it makes no difference at all.
However, on Windows MSXML also converts everything to UTF-16 internally (because everything on Windows nowadays uses Unicode with wchar_t = UTF-16), and Xerces uses the Windows API internally, yet I still see more than a 2x slowdown compared with MSXML just for loading a document.

But, as I said, my big performance problem is with XQilla queries.

Also, in my selectNodes() benchmarks I am already using u"" (UTF-16) string literals.
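
As an illustration of what a UTF-16 literal buys, here is a small sketch. It assumes XMLCh is a 16-bit code unit type that matches char16_t, which depends on how the library was built; the alias, the function expression_length and the query text are all hypothetical and only stand in for a selectNodes()-style call.

```cpp
#include <cstddef>
#include <string>

// Assumption: XMLCh matches char16_t. This alias is only a stand-in for
// illustration and is not taken from the library headers.
using XMLCh = char16_t;

// Hypothetical query entry point taking a UTF-16 expression, standing in
// for a selectNodes()-style call.
std::size_t expression_length(const XMLCh* expr) {
    return std::char_traits<XMLCh>::length(expr);
}

// The u"" prefix makes the compiler emit UTF-16 code units directly, so no
// run-time transcoding from a narrow string is needed.
static const XMLCh kItemQuery[] = u"/root/item[@id='42']";
```

With this, expression_length(kItemQuery) works on the literal as is. If the library's XMLCh is a different 16-bit type (for example wchar_t on Windows), the literal prefix would have to change accordingly.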

I will try valgrind, but I do not have much hope at the moment...

Thanks
Davide


> -----Original Message-----
> From: Roger Leigh <rl...@codelibre.net>
> Sent: Thursday, 21 July 2022 01:49
> To: c-users@xerces.apache.org
> Subject: RE: Xerces-C++ and XQilla bad performance
> 
> Hi Davide,
> 
> I would suggest running your program under valgrind's callgrind tool and
> viewing the profile in KCachegrind.  The results should show you where
> the hotspots are.
>
> Based on my previous profiling experiments, I suspect you'll find that over
> 50% of the runtime is spent in the transcoder doing UTF-8 to UTF-16
> conversions (and vice versa).
>
> If this is the case, I would suggest trying a different transcoder and
> reprofiling.  This may be where the Windows performance difference lies.
> An additional optimisation would be to reduce the number of conversions.
> For example, if you are repeatedly transcoding a UTF-8 string for use with a
> function requiring UTF-16, you can cache the conversion.  Alternatively, use
> char16_t with u"" string literals, at the expense of portability to older
> compilers.
> 
> Kind regards,
> Roger


RE: Xerces-C++ and XQilla bad performance

Posted by Roger Leigh <rl...@codelibre.net>.
Hi Davide,

I would suggest running your program under valgrind's callgrind tool and viewing the profile in KCachegrind.  The results should show you where the hotspots are.

Based on my previous profiling experiments, I suspect you'll find that over 50% of the runtime is spent in the transcoder doing UTF-8 to UTF-16 conversions (and vice versa).

If this is the case, I would suggest trying a different transcoder and reprofiling.  This may be where the Windows performance difference lies.  An additional optimisation would be to reduce the number of conversions.  For example, if you are repeatedly transcoding a UTF-8 string for use with a function requiring UTF-16, you can cache the conversion.  Alternatively, use char16_t with u"" string literals, at the expense of portability to older compilers.
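
The caching idea could look roughly like the sketch below. It is self-contained on purpose: to_utf16_ascii is a stand-in converter that only handles ASCII, and in a real program the library's transcoder would be called at that point instead. The class name TranscodeCache and its members are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Stand-in converter so the sketch is self-contained: it widens ASCII only.
// In a real program this is where the library transcoder would be called.
std::u16string to_utf16_ascii(const std::string& s) {
    return std::u16string(s.begin(), s.end());
}

class TranscodeCache {
public:
    // Returns the cached UTF-16 form of `utf8`, converting only on first use.
    const std::u16string& get(const std::string& utf8) {
        auto it = cache_.find(utf8);
        if (it == cache_.end()) {
            ++conversions_;
            it = cache_.emplace(utf8, to_utf16_ascii(utf8)).first;
        }
        return it->second;
    }

    // Number of conversions actually performed (cache misses).
    std::size_t conversions() const { return conversions_; }

private:
    std::unordered_map<std::string, std::u16string> cache_;
    std::size_t conversions_ = 0;
};
```

Looking up the same expression twice then costs one conversion and one hash lookup rather than two conversions. This only pays off for strings that are reused; for one-off strings the map adds overhead, so it is worth checking the profile before and after.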

Kind regards,
Roger

> -----Original Message-----
> From: Davide Capodaglio <da...@axelsw.it>
> Sent: 20 July 2022 14:44
> To: c-users@xerces.apache.org
> Subject: Xerces-C++ and XQilla bad performance
> 
> Is there any known way to optimize the performance of Xerces-C++ and
> XQilla in both loading and XPath execution?