Posted to c-users@xerces.apache.org by Florent Philippe <ph...@yahoo.fr> on 2006/09/08 22:04:40 UTC

How to parse an XML string in memory? and get a DOM tree out of it

Hi,
 
Is it possible to parse the output of a DOM-created XML document that is in a memory buffer,
so that I can navigate it with functions like getchild, getroot or something?
 
I found out about using a handler to retrieve data during parsing, but I thought it would be easier to deal with the solution above
 
thx

Re: How to parse an XML string in memory? and get a DOM tree out of it

Posted by David Bertoni <db...@apache.org>.

Andrew Patterson wrote:
>> Sure you can, you just can't use DOMBuilderImpl::parseWithContext().  
>> Just use the regular parse() member function:
>>
>> DOMDocument* DOMBuilderImpl::parse(const DOMInputSource& source)
> 
> Ah, okay, got it working. I tried parse() first and couldn't get it 
> working right. That's when I moved on to parseWithContext() (it seemed 
> more appropriate for what I was doing). I've figured out what I was 
> doing wrong with parse() -- all is good! Many, many thanks!
> 
> One last question, purely out of curiosity. I've been dumping unused 
> nodes into a document fragment and then writing it out to a string for 
> storage. What I've been trying to do is pull it back out later on (the 
> XML parsers may be terminated in the interim which is why I have to save 
> it as a string temporarily instead of just leaving the DOM fragments 
> intact).
> 
> If the document fragment had one node in it, all was fine. Got the first 
> child of the parsed string and there it was. But if there was more than 
> one node, they seem to evaporate when parsed (maybe I'm just traversing 
> the resultant DOM wrong though). I.e. when I ask the first child for 
> its sibling, I get zero.

I doubt they evaporated.  Are you sure you installed a DOMErrorHandler 
instance in the DOMBuilder instance?  If you did, then you would have seen 
an error, since what you describe is not a well-formed XML document, 
although it is a well-formed external general parsed entity:

http://www.w3.org/TR/REC-xml/#wf-entities

> 
> I solved this by adding a 'container' element to the fragment first when 
> saving, and dumping extra nodes into that -- and, of course, moving down 
> an additional level when parsing it back out. Still, it seems like a 
> strangely unnecessary step. Is this an unavoidable result of not having 
> a root node in my fragment? Or am I just not traversing the parsed out 
> document fragment correctly?

You are using the canonical method to turn an external general parsed 
entity into a well-formed document -- wrapping it in a root element:

http://www.w3.org/TR/xslt#section-XML-Output-Method

Dave
> 
> ..............................
> Andrew Patterson
> Software Engineer
> Avenza Systems Inc.
> 
> email: andrew@avenza.com
> phone: 416.487.5116
>
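
The wrapping step discussed above can be sketched with nothing but std::string, which keeps it independent of any particular Xerces version. This is only an illustration under stated assumptions: wrapFragment and the element name "container" are hypothetical names chosen here, not part of Xerces, and the fragment is assumed to carry no XML declaration of its own. The wrapped string is what would then be handed to MemBufInputSource and parse().

```cpp
#include <string>

// Hypothetical helper (not a Xerces API): wrap serialized sibling nodes in
// a single root element so the result is a well-formed XML document.
// Assumes rootName is a valid XML name and that the fragment does not
// start with an XML declaration.
std::string wrapFragment(const std::string& fragment,
                         const std::string& rootName = "container")
{
    return "<" + rootName + ">" + fragment + "</" + rootName + ">";
}
```

For example, wrapFragment("<a>1</a><b>2</b>") yields "<container><a>1</a><b>2</b></container>". After parsing, the saved nodes are the children of the document element, which is the extra level Andrew describes stepping down.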

Re: How to parse an XML string in memory? and get a DOM tree out of it

Posted by Andrew Patterson <an...@avenza.com>.

> Sure you can, you just can't use DOMBuilderImpl::parseWithContext().  
> Just use the regular parse() member function:
> 
> DOMDocument* DOMBuilderImpl::parse(const DOMInputSource& source)

Ah, okay, got it working. I tried parse() first and couldn't get it 
working right. That's when I moved on to parseWithContext() (it seemed 
more appropriate for what I was doing). I've figured out what I was 
doing wrong with parse() -- all is good! Many, many thanks!

One last question, purely out of curiosity. I've been dumping unused 
nodes into a document fragment and then writing it out to a string for 
storage. What I've been trying to do is pull it back out later on (the 
XML parsers may be terminated in the interim which is why I have to save 
it as a string temporarily instead of just leaving the DOM fragments 
intact).

If the document fragment had one node in it, all was fine. Got the first 
child of the parsed string and there it was. But if there was more than 
one node, they seem to evaporate when parsed (maybe I'm just traversing 
the resultant DOM wrong though). I.e. when I ask the first child for 
its sibling, I get zero.

I solved this by adding a 'container' element to the fragment first when 
saving, and dumping extra nodes into that -- and, of course, moving down 
an additional level when parsing it back out. Still, it seems like a 
strangely unnecessary step. Is this an unavoidable result of not having 
a root node in my fragment? Or am I just not traversing the parsed out 
document fragment correctly?

..............................
Andrew Patterson
Software Engineer
Avenza Systems Inc.

email: andrew@avenza.com
phone: 416.487.5116

Re: How to parse an XML string in memory? and get a DOM tree out of it

Posted by David Bertoni <db...@apache.org>.

Andrew Patterson wrote:
>> You have the source code available, so please consider using it. ;-)
> 
> Doh! Sorry, I just downloaded the source too to get the sample -- should 
> have looked in there :P So how then can I turn a string into a DOM tree? 
> Is it simply not possible to parse a string into a DOM tree (yet?) or is 
> there some other way I should be doing this?
> 

Sure you can, you just can't use DOMBuilderImpl::parseWithContext().  Just 
use the regular parse() member function:

DOMDocument* DOMBuilderImpl::parse(const DOMInputSource& source)

Dave

Re: How to parse an XML string in memory? and get a DOM tree out of it

Posted by Andrew Patterson <an...@avenza.com>.

> You have the source code available, so please consider using it. ;-)

Doh! Sorry, I just downloaded the source too to get the sample -- should 
have looked in there :P So how then can I turn a string into a DOM tree? 
Is it simply not possible to parse a string into a DOM tree (yet?) or is 
there some other way I should be doing this?

..............................
Andrew Patterson
Software Engineer
Avenza Systems Inc.

email: andrew@avenza.com
phone: 416.487.5116

Re: How to parse an XML string in memory? and get a DOM tree out of it

Posted by David Bertoni <db...@apache.org>.

Andrew Patterson wrote:
>> I don't know what the problem is with the DOMBuilder, but it's clear 
>> why the exception message is not printed properly. If you check the 
>> return type from DOMException::getMessage(), you'll see it's "const 
>> XMLCh*" which means it's a UTF-16 string.  Unfortunately, you've told 
>> printf that the parameter is "const char*" so you won't get anything 
>> interesting as a result.
> 
> Ah, okay -- thanks! Took about 15 seconds to add and the result is now 
> 'The implementation does not support the requested type of object or 
> operation'. Any idea which object or operation it's talking about? 
> Perhaps it's not happy about the DOMBuilder::ACTION_APPEND_AS_CHILDREN 
> action I've requested?

You have the source code available, so please consider using it. ;-)

Here's the relevant code from DOMBuilderImpl.cpp:

void DOMBuilderImpl::parseWithContext(const DOMInputSource&,
                                       DOMNode* const,
                                       const short)
{
     throw DOMException(DOMException::NOT_SUPPORTED_ERR, 0,
                        getMemoryManager());
}

Dave

Re: How to parse an XML string in memory? and get a DOM tree out of it

Posted by Andrew Patterson <an...@avenza.com>.

> I don't know what the problem is with the DOMBuilder, but it's clear why 
> the exception message is not printed properly. If you check the return 
> type from DOMException::getMessage(), you'll see it's "const XMLCh*" 
> which means it's a UTF-16 string.  Unfortunately, you've told printf 
> that the parameter is "const char*" so you won't get anything 
> interesting as a result.

Ah, okay -- thanks! Took about 15 seconds to add and the result is now 
'The implementation does not support the requested type of object or 
operation'. Any idea which object or operation it's talking about? 
Perhaps it's not happy about the DOMBuilder::ACTION_APPEND_AS_CHILDREN 
action I've requested?

..............................
Andrew Patterson
Software Engineer
Avenza Systems Inc.

email: andrew@avenza.com
phone: 416.487.5116

Re: How to parse an XML string in memory? and get a DOM tree out of it

Posted by David Bertoni <db...@apache.org>.

Andrew Patterson wrote:
>> I am not sure I understood what you asked, but I think you need to use 
>> MemBufInputSource; see the MemParse sample for an example of its usage.
> 
> I'm working on something similar -- I posted here a month or so back 
> about it -- and I can't get it to work either. I looked at MemParse.cpp 
> but it's using a SAX parser and I need DOM -- so the sample has some 
> usefulness, but is obviously different.
> 
> Here's what I'm trying:
> 
> --------------------
> const XMLByte foo[256] = "<foo>Testing</foo>";
> int size = strlen((char*)foo);
> 
> MemBufInputSource* stringSource = new MemBufInputSource(foo, size,
>           "ignored", false);
> assert(stringSource);
> 
> try {
>   Wrapper4InputSource source(stringSource);
>   m_parser->parseWithContext(source, &node,
>         DOMBuilder::ACTION_APPEND_AS_CHILDREN);
> } catch (DOMException& e) {
>   printf("EXCEPTION: '%s'\n", e.getMessage());       
> }
> --------------------
> 
> node is a DOMNode& that's been passed in as the place to attach the 
> resultant DOM fragment & m_parser is a DOMBuilder*. I'm obviously doing 
> *something* wrong, but the exception I get is extremely unhelpful. The 
> output I get is "EXCEPTION: 'T'" -- not the most informative message ^_^
> 

I don't know what the problem is with the DOMBuilder, but it's clear why 
the exception message is not printed properly. If you check the return type 
from DOMException::getMessage(), you'll see it's "const XMLCh*" which means 
it's a UTF-16 string.  Unfortunately, you've told printf that the 
parameter is "const char*" so you won't get anything interesting as a result.

Look in the documentation for XMLString::transcode() to see how you can 
transcode the UTF-16 string to the local code page.

Dave
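
To make the "EXCEPTION: 'T'" output concrete, here is a small standalone sketch that does not use Xerces at all. It assumes char16_t as a stand-in for XMLCh (the actual XMLCh type depends on the Xerces build), and both function names are hypothetical. The first function shows what a "%s" conversion effectively sees when given UTF-16 data; the second is a crude ASCII-only substitute for transcoding. In real code the call to use is XMLString::transcode(), with the returned buffer freed via XMLString::release().

```cpp
#include <string>

// What "%s" effectively sees when handed UTF-16 data: it walks the memory
// one byte at a time and stops at the first zero byte. For ASCII text every
// second byte is zero, so at most one character survives.
std::string bytesSeenByPercentS(const char16_t* utf16)
{
    return std::string(reinterpret_cast<const char*>(utf16));
}

// Hypothetical stand-in for transcoding: copies ASCII code units as-is and
// replaces anything else with '?'. It is not a real code-page conversion.
std::string narrowAscii(const char16_t* utf16)
{
    std::string out;
    for (; *utf16 != 0; ++utf16)
        out += (*utf16 < 0x80) ? static_cast<char>(*utf16) : '?';
    return out;
}
```

On a little-endian machine, bytesSeenByPercentS(u"The implementation ...") returns just "T", which matches the output Andrew reported; on a big-endian machine it would be empty. narrowAscii() on the same input returns the full text.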

Re: How to parse an XML string in memory? and get a DOM tree out of it

Posted by Andrew Patterson <an...@avenza.com>.

> I am not sure I understood what you asked, but I think you need to use 
> MemBufInputSource; see the MemParse sample for an example of its usage.

I'm working on something similar -- I posted here a month or so back 
about it -- and I can't get it to work either. I looked at MemParse.cpp 
but it's using a SAX parser and I need DOM -- so the sample has some 
usefulness, but is obviously different.

Here's what I'm trying:

--------------------
const XMLByte foo[256] = "<foo>Testing</foo>";
int size = strlen((char*)foo);

MemBufInputSource* stringSource = new MemBufInputSource(foo, size,
           "ignored", false);
assert(stringSource);

try {
   Wrapper4InputSource source(stringSource);
   m_parser->parseWithContext(source, &node,
         DOMBuilder::ACTION_APPEND_AS_CHILDREN);
} catch (DOMException& e) {
   printf("EXCEPTION: '%s'\n", e.getMessage());		
}
--------------------

node is a DOMNode& that's been passed in as the place to attach the 
resultant DOM fragment & m_parser is a DOMBuilder*. I'm obviously doing 
*something* wrong, but the exception I get is extremely unhelpful. The 
output I get is "EXCEPTION: 'T'" -- not the most informative message ^_^

Any suggestions on what I can do to alleviate this? I think I'm on the 
right track, but clearly something isn't right.

Any help appreciated!

..............................
Andrew Patterson
Software Engineer
Avenza Systems Inc.

email: andrew@avenza.com
phone: 416.487.5116

Re: How to parse an XML string in memory? and get a DOM tree out of it

Posted by Alberto Massari <am...@datadirect.com>.

Hi,
I am not sure I understood what you asked, but I think you need to 
use MemBufInputSource; see the MemParse sample for an example of its usage.

Alberto

At 20.04 08/09/2006 +0000, Florent Philippe wrote:
>Hi,
>
>Is it possible to parse the output of a DOM-created XML document that is in a 
>memory buffer,
>so that I can navigate it with functions like getchild, getroot or something?
>
>I found out about using a handler to retrieve data during parsing, but I 
>thought it would be easier to deal with the solution above
>
>thx