Posted to c-dev@xerces.apache.org by matjaz ostroversnik <ma...@ostri.org> on 2006/03/18 00:32:48 UTC

newbie questions

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all ! :-)

I am looking for a library to support C++ XML file parsing and
generation. I have gone through the Xerces online manuals, but I still
have some open questions. Here is my problem.

We have files with the following structure:

<?xml version="1.0" encoding="utf-8" ?>
<file>
  <head>
    several subtags
  </head>
  <body>
    <msg_0_>
      several subtags
    </msg_0_>
    <msg_1_>
      several subtags
    </msg_1_>
  </body>
  <tail>
    several subtags
  </tail>
</file>

Each file therefore decomposes into:
- head
- body
- tail
The body is further decomposed into a set of messages, approximately
1M per file. The contents of head, tail, and each msg are further
divided into tags, which can vary from file to file. Some of the tags
are vital for the application and must be present; others are
optional. Optional tags must be preserved across an import-export
operation. Messages are independent of each other and can be processed
one by one.

The main problem is that there can be 1M messages in a file, and the
approximate file size is 1.2 GB. A full DOM is therefore out of the
question. On the other hand, since we do not know all the tags in
advance, a pure SAX solution is also problematic: we would be forced
to develop a functional duplicate of DOM, which is not practical.

It would be perfect if we could process the file down to the head,
message, and tail level with SAX, and process the contents of head,
tail, and each individual msg with DOM.
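
To make the hybrid idea above concrete, here is a minimal sketch that
does not depend on any parser library. It assumes SAX-style callbacks
(startElement, characters, endElement) and builds a small in-memory
tree only for the current head, tail, or direct child of body, hands it
to a callback, and then frees it. Node, FragmentBuilder, demo, and the
subtag names (created, id, extra) are hypothetical names for
illustration; wiring this into the xerces-c SAX handler and using real
DOM nodes is not shown.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM element. Text is accumulated per
// element, so the order of mixed text and children is not preserved.
struct Node {
    std::string name;
    std::string text;
    std::vector<std::unique_ptr<Node>> children;
};

// Receives SAX-style events and keeps at most one fragment in memory.
class FragmentBuilder {
public:
    using Callback = std::function<void(const Node&)>;
    explicit FragmentBuilder(Callback cb) : onFragment_(std::move(cb)) {}

    void startElement(const std::string& name) {
        path_.push_back(name);
        if (stack_.empty()) {
            if (!isFragmentRoot()) return;  // still in the streamed part
            root_ = std::make_unique<Node>();
            root_->name = name;
            stack_.push_back(root_.get());
            return;
        }
        // Inside a fragment: unknown tags are kept, not interpreted.
        auto child = std::make_unique<Node>();
        child->name = name;
        Node* raw = child.get();
        stack_.back()->children.push_back(std::move(child));
        stack_.push_back(raw);
    }

    void characters(const std::string& chars) {
        if (!stack_.empty()) stack_.back()->text += chars;
    }

    void endElement(const std::string&) {
        assert(!path_.empty());
        path_.pop_back();
        if (stack_.empty()) return;
        stack_.pop_back();
        if (stack_.empty()) {  // fragment complete
            onFragment_(*root_);
            root_.reset();     // free it before the next one starts
        }
    }

private:
    // path_ already contains the element that is being opened.
    bool isFragmentRoot() const {
        if (path_.size() == 2)
            return path_[1] == "head" || path_[1] == "tail";
        return path_.size() == 3 && path_[1] == "body";
    }

    Callback onFragment_;
    std::vector<std::string> path_;  // open elements from the root
    std::vector<Node*> stack_;       // open elements in the fragment
    std::unique_ptr<Node> root_;
};

// Feeds hand-written events shaped like the file above and records
// "name:number-of-children" for each completed fragment.
std::vector<std::string> demo() {
    std::vector<std::string> seen;
    FragmentBuilder b([&seen](const Node& n) {
        seen.push_back(n.name + ":" + std::to_string(n.children.size()));
    });
    b.startElement("file");
    b.startElement("head");
    b.startElement("created"); b.characters("2006-03-18"); b.endElement("created");
    b.endElement("head");
    b.startElement("body");
    b.startElement("msg_0_");
    b.startElement("id"); b.characters("1"); b.endElement("id");
    b.startElement("extra"); b.endElement("extra");
    b.endElement("msg_0_");
    b.startElement("msg_1_");
    b.startElement("id"); b.characters("2"); b.endElement("id");
    b.endElement("msg_1_");
    b.endElement("body");
    b.startElement("tail"); b.endElement("tail");
    b.endElement("file");
    return seen;
}
```

With this shape, memory use is bounded by the largest single head,
msg, or tail rather than by the whole file, and optional tags survive
because the fragment keeps every child it sees.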

   1. Is it possible with xerces-c to partially process a file with
      SAX, then switch to DOM, and so on?
   2. If yes, how is it done? A code snippet or a pointer to
      instructions would be nice.


Thanks in advance

Regards

Matjaž
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEG0cfXBZjK9bIz8sRApDxAJ47vvp1KYwkYtzbweQcOvXxoXTC2ACeJ1gD
+PZjAox2CFtv+w2P4j2j6FQ=
=mmo9
-----END PGP SIGNATURE-----
