Posted to c-dev@xerces.apache.org by "Kaustubh S. Deorukhkar" <ka...@rediffmail.com> on 2007/02/09 07:54:58 UTC

writing large xml files using DOM

Hello,

Need your valuable inputs for this scenario. Any pointers would be helpful.

I have a large data set (a few hundred MB, sometimes even up to a GB). I populate a DOM tree from this data set and then write the DOM tree to an XML file.
How should one handle such a large data set in Xerces, given that we cannot hold such a large DOM tree in memory? I have the following questions:

1> Can Xerces handle such a large amount of data in a DOM tree? If yes, how? Does it cache to a temporary file on disk?

2> Can I populate the DOM tree and flush it partially, in small chunks? If yes, how?

I think there could be alternatives to using DOM, but I want to know whether the Xerces DOM can handle this.


Thanks in advance,
Kaustubh

Re: writing large xml files using DOM

Posted by Boris Kolpackov <bo...@codesynthesis.com>.
Hi,

"Kaustubh S. Deorukhkar" <ka...@rediffmail.com> writes:

> Hello, Need your valuable inputs for this scenario. Any pointers would
> be helpful. I have large data set (few 100s of MB or sometimes may even
> go up to GB). I populate a DOM tree using this data set and then write
> this DOM tree to xml file. How should one handle such large data set in
> xerces, as we cannot have such a large DOM tree in memory.

A Xerces-C++ DOM document is an in-memory data structure, so unless you
have enough memory you won't be able to create the complete document and
write it to disk. One way to approach this is to create a document
for a small chunk of your data and serialize the chunks one at a time.
This works well if your document has a repetitive structure, e.g.,

<data>
  <record>aaa</record>
  <record>bbb</record>
  <record>ccc</record>
  ...
</data>

which most large documents do at a certain level.

The way to do this with DOM is to create the top-level XML structure
by hand (i.e., just write it to the file as text) and then write the
data chunks inside it one at a time.
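
To make the pattern concrete, here is a minimal sketch that uses only the
C++ standard library and is not Xerces-C++ code. The names write_chunked,
serialize_record and escape_text are hypothetical. serialize_record is a
stand-in for the step where you would build a small DOM document for one
chunk, serialize it without an XML declaration, and then release it. The
root element name and the <record> layout are taken from the example
above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>

// Escape the characters that must not appear raw in XML character data.
std::string escape_text(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;
        }
    }
    return out;
}

// Hypothetical stand-in for "create a DOM document for one chunk and
// serialize it". Only this one chunk is held in memory at a time.
std::string serialize_record(const std::string& value) {
    std::ostringstream chunk;
    chunk << "  <record>" << escape_text(value) << "</record>\n";
    return chunk.str();
}

// Write the top-level structure by hand, then emit the chunks one at a
// time. `next` fills in the next value and returns false when the data
// set is exhausted. Returns the number of records written.
std::size_t write_chunked(std::ostream& out,
                          const std::string& root,
                          const std::function<bool(std::string&)>& next) {
    assert(!root.empty());
    out << "<" << root << ">\n";      // opening tag written as plain text
    std::size_t count = 0;
    std::string value;
    while (next(value)) {
        out << serialize_record(value);  // one small chunk at a time
        ++count;
    }
    out << "</" << root << ">\n";     // closing tag written as plain text
    return count;
}
```

In real use, `out` would be a std::ofstream opened on the target file and
`next` would pull the next item from your data source, so memory use is
bounded by the size of a single chunk rather than the whole document.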

While I don't have a DOM example that shows how to do this, there is
an example in CodeSynthesis XSD[1] (examples/cxx/tree/streaming) that
shows how to do exactly that but with data-bound C++ classes instead
of DOM. It uses DOM underneath so it proves that it's possible to do
the same with DOM.


[1] http://www.codesynthesis.com/products/xsd/


hth,
-boris

-- 
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding

