Posted to c-dev@xerces.apache.org by Lihong Pei <Li...@xilinx.com> on 2003/03/07 23:27:21 UTC

Questions on reading large xml file using DOM parser

Hi,
I'm trying to parse a large XML file (7MB) using the DOM parser (I have to
use the DOM parser). The XML file looks like the following:
<root>
<child1> ---------------------------</child1>
<child2> ---------------------------</child2>
                         .
                         .
                         .
<childn> ---------------------------</childn>
</root>
After the parser has read through child1, it does not need it anymore. The
same goes for child2 ... through childn. I know this is a perfect case for
a SAX parser. However, I can't use it; I have to use the DOM parser. Is there
any way to remove child1 once the parser has finished reading it, so that the
memory held by the parser does not grow too large?

A similar question may have been asked before, but I can't find it in
the archive. Could you give me some hints?

Thanks a lot!

Lihong


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


Re: Questions on reading large xml file using DOM parser

Posted by Graham Mann <gm...@adobe.com>.
Lihong,

Just an idea: try using XSLT to split the document into n smaller documents.
Then, if you need to relate adjacent children, you can keep children x-1, x,
and x+1 in memory and roll through them all, though it sounds like you don't
even require this.
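
The "keep x-1, x, x+1 in memory and roll through" idea can be sketched as below. This is only an illustration under assumptions: it uses Python's standard xml.dom.pulldom module rather than Xerces-C or XSLT, and the names iter_children and sliding_windows are hypothetical helpers, not part of any library. Each direct child of the root is expanded into its own small DOM subtree, and only a window of three adjacent children is referenced at any time.

```python
import io
from collections import deque
from xml.dom import pulldom


def iter_children(stream):
    """Yield each direct child of the root element as a small DOM subtree."""
    events = pulldom.parse(stream)
    root_seen = False
    for event, node in events:
        if event != pulldom.START_ELEMENT:
            continue
        if not root_seen:
            root_seen = True  # the first element is the root; do not expand it
            continue
        events.expandNode(node)  # builds the DOM for this one child only
        yield node


def sliding_windows(stream, size=3):
    """Yield tuples of `size` adjacent children (x-1, x, x+1 for size=3)."""
    window = deque(maxlen=size)  # older children fall out automatically
    for child in iter_children(stream):
        window.append(child)
        if len(window) == size:
            yield tuple(window)


# Hypothetical sample document, shaped like the one in the question.
sample = io.BytesIO(b"<root><child1>a</child1><child2>b</child2>"
                    b"<child3>c</child3><child4>d</child4></root>")
for window in sliding_windows(sample):
    print([node.tagName for node in window])
```

Because the expanded children are never attached to the root, dropping the last reference to a child should let its subtree be reclaimed, so memory use would depend on the window size rather than on the size of the whole file.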



Graham Mann
Adobe Systems Europe Ltd.





Re: Questions on reading large xml file using DOM parser

Posted by Lihong Pei <Li...@xilinx.com>.
David,
It did look ridiculous. However, there's a reason for it. We have a wrapper subsystem around Xerces-C. Recently we
upgraded the source code from version 1.3 (a very old one) to 2.2. 1.3 has a very different memory management
system from 2.2. Since we only expose the DOM parser interface, we could get around the problem in version 1.3
the "SAX" way: in the endElement() call, we removed the specific node that we didn't need. I don't know whether
there is any easy way of wrapping the SAX parser instead. The wrapper code has been used for a while by other
subsystems, and at present we don't want to disturb the wrapper interface. That's why I asked the question above.
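
To make the "SAX way" concrete, here is a minimal sketch of the idea: build DOM nodes from SAX-style callbacks and, in endElement(), hand each finished child of the root to the caller and then detach it. It is written with Python's standard xml.sax and xml.dom.minidom modules purely for illustration; it is not the Xerces-C API and not our wrapper code, and PruningDomBuilder and on_child are hypothetical names.

```python
import xml.sax
from xml.dom.minidom import Document


class PruningDomBuilder(xml.sax.ContentHandler):
    """Builds a DOM tree but drops each child of the root once it is complete."""

    def __init__(self, on_child):
        super().__init__()
        self.document = Document()
        self._stack = [self.document]  # open nodes, innermost last
        self._on_child = on_child      # hypothetical per-child callback

    def startElement(self, name, attrs):
        node = self.document.createElement(name)
        for key, value in attrs.items():
            node.setAttribute(key, value)
        self._stack[-1].appendChild(node)
        self._stack.append(node)

    def characters(self, content):
        if len(self._stack) > 1:  # ignore anything outside the root element
            self._stack[-1].appendChild(self.document.createTextNode(content))

    def endElement(self, name):
        node = self._stack.pop()
        # Stack is now [document, root] exactly when `node` is a direct
        # child of the root element.
        if len(self._stack) == 2:
            self._on_child(node)                # caller uses the subtree here
            node.parentNode.removeChild(node)   # detach it from the root
            node.unlink()                       # break internal references


seen = []
handler = PruningDomBuilder(lambda node: seen.append(node.tagName))
xml.sax.parseString(b"<root><child1>a</child1><child2>b</child2></root>", handler)
print(seen, len(handler.document.documentElement.childNodes))
```

The callback has to finish with the subtree before it returns, since the node is unlinked immediately afterwards. The caller still ends up with a document, but one whose root no longer holds the children that were already processed.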

Thanks,

Lihong

David N Bertoni/Cambridge/IBM wrote:

> Hi,
>
> This makes no sense.  You cannot get the document until the parser is
> finished parsing it, so there's no way to use the DOMParser for this.
> Also, how can you say the DOM parser no longer needs a particular child?
> It's building a DOM representation of the document and that means
> everything must be there when it's finished.  Aren't you really saying
> _you_ don't need the child anymore?
>
> Why do you insist you need to use the DOM parser?  Not only is this a
> perfect case for using a SAX parser, but what you're looking for makes no
> sense with the DOM model.
>
> Dave


Re: Questions on reading large xml file using DOM parser

Posted by David N Bertoni/Cambridge/IBM <da...@us.ibm.com>.



Hi,

This makes no sense.  You cannot get the document until the parser is
finished parsing it, so there's no way to use the DOMParser for this.
Also, how can you say the DOM parser no longer needs a particular child?
It's building a DOM representation of the document and that means
everything must be there when it's finished.  Aren't you really saying
_you_ don't need the child anymore?

Why do you insist you need to use the DOM parser?  Not only is this a
perfect case for using a SAX parser, but what you're looking for makes no
sense with the DOM model.

Dave