Posted to c-users@xerces.apache.org by Bhat <kp...@sta.samsung.com> on 2008/01/05 01:10:37 UTC

Carving out a subset of XML content into a separate XML file

Hi,

I have written a small function that dumps portions of the DOM tree (rooted
at nodes with the specified tag name) to a file [see below].  Obviously the
output file does not have a root node.  How can I make the output file a
proper XML file (with a single root node)?  I have tried a few things, but
nothing seems to work.  My function follows:


#define SPECIFIED_TAG "yadayada"


void
serializeSubtree (DOMNode* node, DOMWriter *theSerializer, XMLFormatTarget *target)
{
  if(!node) return;

  const char* name = XMLString::transcode(node->getNodeName());

  if(!strcmp(name, SPECIFIED_TAG))
  {
    theSerializer->writeNode(target, *node);
  }

  for (DOMNode* childNode (node->getFirstChild ()); childNode;
           childNode = childNode->getNextSibling ())
  {
    serializeSubtree(childNode, theSerializer, target);
  }
}


Thanks,
Kong

-- 
View this message in context: http://www.nabble.com/Carving-out-a-subset-of-XML-content-into-a-separate-XML-file-tp14627707p14627707.html
Sent from the Xerces - C - Users mailing list archive at Nabble.com.


Re: Carving out a subset of XML content into a separate XML file

Posted by David Bertoni <db...@apache.org>.
Bhat wrote:
> 
> Bhat wrote:
>> Hi,
>>
>> I have written a small function that dumps portions of the DOM tree
>> (rooted at nodes with the specified tag name) to a file [see below]. 
>> Obviously the output file does not have a root node.  How can I make the
>> output file a proper XML file (with a single root node)?  I have tried a
>> few things, but nothing seems to work.  My function follows:
>>
> 
> 
> Here's what I did:
> First I created a new DOM document using the
> DOMImplementation::createDocument() method.  For this, I had to provide the
> name of the root node.  I traversed the original XML file and when I found
> the DOMNodes with the desired tag name (in the original XML file), I did a
> deep cloning of the nodes [using DOMNode::cloneNode(true)].  Then I tried
> the following options (all unsuccessful):
> 
> 1).  I appended the cloned nodes as a child to the new DOM document, using
> the appendChild() method.  This caused an exception to be thrown (sadly
> cannot decipher the exception yet)
This won't work, because the cloned nodes still belong to the original 
document; cloneNode() does not change a node's owner document.  Also, a 
document can only have a single element child.

> 
> 2). I obtained the root node of the new XML document, using the
> getDocumentElement() method, and tried to append the cloned nodes to the
> root node.  Again, this caused an exception to be thrown (sadly cannot
> decipher the exception yet)
This won't work, because the nodes don't belong to the document.

> 
> 3). I imported the cloned nodes into the new DOM document, using the
> DOMDocument::importNode(DOMNode*, true) call.  This time there was no
> exception, but the subtrees rooted at the cloned nodes did not get attached
> at all.  
Yes, because importNode doesn't "attach" the nodes.  You need to append the 
nodes to the document element of the new document.
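
The difference between the three attempts can be illustrated with a toy 
model.  The sketch below is not Xerces-C code: ToyDocument and ToyNode are 
hypothetical stand-ins that only mimic the DOM ownership rule (a node can be 
appended only under a parent owned by the same document) and the fact that 
importing produces a detached copy.  Under those assumptions, appending a 
foreign node fails, importing alone attaches nothing, and import followed by 
appendChild() on the root succeeds.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-ins for DOMDocument/DOMNode; NOT the Xerces-C classes.
struct ToyDocument;

struct ToyNode {
    std::string name;
    ToyDocument* owner;              // document that created this node
    std::vector<ToyNode*> children;

    // Mimics the DOM rule: the child must be owned by the same document.
    void appendChild(ToyNode* child) {
        if (child->owner != owner)
            throw std::runtime_error("WRONG_DOCUMENT_ERR");
        children.push_back(child);
    }
};

struct ToyDocument {
    std::vector<std::unique_ptr<ToyNode>> pool;  // owns every node it creates
    ToyNode* root;                               // the single document element

    explicit ToyDocument(const std::string& rootName)
        : root(createElement(rootName)) {}

    ToyNode* createElement(const std::string& name) {
        pool.push_back(std::unique_ptr<ToyNode>(new ToyNode{name, this, {}}));
        return pool.back().get();
    }

    // Deep-copies a foreign subtree into this document.
    // The copy is owned by this document but is NOT attached anywhere.
    ToyNode* importNode(const ToyNode* source) {
        ToyNode* copy = createElement(source->name);
        for (const ToyNode* child : source->children)
            copy->appendChild(importNode(child));
        return copy;
    }
};
```

With the real API, the corresponding calls would be 
DOMDocument::importNode(node, true) on the new document, followed by 
appendChild() on the element returned by getDocumentElement().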

What you are doing is incredibly inefficient for the purposes of just 
generating a single root node.

Dave

Re: Carving out a subset of XML content into a separate XML file

Posted by Bhat <kp...@sta.samsung.com>.

Bhat wrote:
> 
> Hi,
> 
> I have written a small function that dumps portions of the DOM tree
> (rooted at nodes with the specified tag name) to a file [see below]. 
> Obviously the output file does not have a root node.  How can I make the
> output file a proper XML file (with a single root node)?  I have tried a
> few things, but nothing seems to work.  My function follows:
> 


Here's what I did:
First I created a new DOM document using the
DOMImplementation::createDocument() method.  For this, I had to provide the
name of the root node.  I traversed the original XML file and when I found
the DOMNodes with the desired tag name (in the original XML file), I did a
deep cloning of the nodes [using DOMNode::cloneNode(true)].  Then I tried
the following options (all unsuccessful):

1).  I appended the cloned nodes as a child to the new DOM document, using
the appendChild() method.  This caused an exception to be thrown (sadly
cannot decipher the exception yet)

2). I obtained the root node of the new XML document, using the
getDocumentElement() method, and tried to append the cloned nodes to the
root node.  Again, this caused an exception to be thrown (sadly cannot
decipher the exception yet)

3). I imported the cloned nodes into the new DOM document, using the
DOMDocument::importNode(DOMNode*, true) call.  This time there was no
exception, but the subtrees rooted at the cloned nodes did not get attached
at all.  

Any help will be appreciated.

Thanks,
Bhat
-- 
View this message in context: http://www.nabble.com/Carving-out-a-subset-of-XML-content-into-a-separate-XML-file-tp14627707p14628251.html


Re: Carving out a subset of XML content into a separate XML file

Posted by Bhat <kp...@sta.samsung.com>.

David Bertoni wrote:
> 
> 
> Why not try the low-budget approach, and just write out a small amount of 
> markup before and after you serialize the nodes?  Writing out "<root>" and 
> "</root>" isn't terribly complicated.
> 
> Dave
> 
> 

First of all, thanks to David Bertoni for his response.  I have now created
a non-recursive version of my function, using a DOMTreeWalker.  It seems to
iterate properly, but nothing is getting written to the file.  I would
appreciate it if someone could take a quick look and let me know what silly
mistake I am making.  For your convenience I am attaching both the recursive
version of the function (which works) and the walker version of the function
(which does not).

//////////////////////////////////////////
// Listing 1 (recursive --- works)
//////////////////////////////////////////
void
serializeSubtree_recursive (DOMNode* node, DOMWriter *theSerializer,
                            XMLFormatTarget *target, const std::string& topNodeName)
{

// May consider making topNodeName a global so that it
// does not get passed with every recursive call
//

  if(!node) return;

  // transcode() allocates a buffer; copy it, then release it to avoid a leak
  char* transcodedName = XMLString::transcode(node->getNodeName());
  std::string thisNodeName(transcodedName);
  XMLString::release(&transcodedName);

  if(thisNodeName == topNodeName)
    theSerializer->writeNode(target, *node);

  for (DOMNode* childNode (node->getFirstChild ()); childNode;
           childNode = childNode->getNextSibling ())
  {
    serializeSubtree_recursive (childNode, theSerializer, target, topNodeName);
  }
}


//////////////////////////////////////////
// Listing 2 (using DOM Walker
// ----- does not work)
//////////////////////////////////////////

void
serializeSubtree_walker (DOMNode* startingNode, DOMWriter& theSerializer,
                         XMLFormatTarget& target, const std::string& topNodeName)
{
  if(!startingNode) return;

  DOMDocument* docPtr = startingNode->getOwnerDocument();
  if(!docPtr) return;

  DOMTreeWalker* walker = docPtr->createTreeWalker(startingNode,
      DOMNodeFilter::SHOW_ALL, NULL, true);

  DOMNode* nodePtr;
  for(nodePtr = walker->nextNode(); nodePtr != NULL; nodePtr = walker->nextNode())
  {
    if (nodePtr->getNodeType() == DOMNode::ELEMENT_NODE)
    {
      // transcode() allocates a buffer; copy it, then release it to avoid a leak
      char* transcodedName = XMLString::transcode(nodePtr->getNodeName());
      std::string thisNodeName(transcodedName);
      XMLString::release(&transcodedName);

      if(thisNodeName == topNodeName)
        theSerializer.writeNode(&target, *nodePtr);
    }
  }
}

Bhat
-- 
View this message in context: http://www.nabble.com/Carving-out-a-subset-of-XML-content-into-a-separate-XML-file-tp14627707p14705310.html


Re: Carving out a subset of XML content into a separate XML file

Posted by David Bertoni <db...@apache.org>.
Bhat wrote:
> Hi,
> 
> I have written a small function that dumps portions of the DOM tree (rooted
> at nodes with the specified tag name) to a file [see below].  Obviously the
> output file does not have a root node.  How can I make the output file a
> proper XML file (with a single root node)?  I have tried a few things, but
> nothing seems to work.  My function follows:
> 
> 
> #define SPECIFIED_TAG "yadayada"

It would be better if you used the existing Xerces-C UTF-16 code point 
statics to create a static string encoded in UTF-16, instead of the local 
code page.  Then you can skip transcoding.  See src/xercesc/util/XMLUri.cpp 
for some examples.
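
As a rough illustration of that idea, here is a minimal sketch of a tag name 
stored as 16-bit code units and compared without any transcoding.  It is 
deliberately library-free: CodeUnit is a hypothetical stand-in for XMLCh 
(assumed here to be a 16-bit type), and kSpecifiedTag and sameName() are 
illustrative names, not Xerces-C identifiers.  In real code the array would 
presumably be built from the chLatin_* constants in 
xercesc/util/XMLUniDefs.hpp and compared with something like 
XMLString::equals().

```cpp
// Hypothetical stand-in for XMLCh, assumed to be a 16-bit code unit.
typedef char16_t CodeUnit;

// "yadayada" spelled out one code unit at a time, null-terminated.
static const CodeUnit kSpecifiedTag[] = {
    u'y', u'a', u'd', u'a', u'y', u'a', u'd', u'a', 0
};

// Compares two null-terminated code-unit strings; no code page is involved,
// so nothing is allocated and nothing can fail to convert.
bool sameName(const CodeUnit* lhs, const CodeUnit* rhs)
{
    if (lhs == rhs) return true;
    if (!lhs || !rhs) return false;
    while (*lhs && *lhs == *rhs) {
        ++lhs;
        ++rhs;
    }
    return *lhs == *rhs;
}
```

The node name returned by getNodeName() is already in this encoding, so it 
can be passed to the comparison directly.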

> 
> 
> void
> serializeSubtree (DOMNode* node, DOMWriter *theSerializer, XMLFormatTarget
> *target)
> {
>   if(!node) return;
> 
>   const char* name = XMLString::transcode(node->getNodeName());
You should free this memory using XMLString::release(), or you will suffer 
a memory leak.  Also, this function can fail silently if the node name 
contains characters not representable in the local code page.

See my previous comment for a better solution.

> 
>   if(!strcmp(name, SPECIFIED_TAG))
>   {
>     theSerializer->writeNode(target, *node);
>   }
> 
>   for (DOMNode* childNode (node->getFirstChild ()); childNode;
>            childNode = childNode->getNextSibling ())
>   {
>     serializeSubtree(childNode, theSerializer, target);
>   }
> }
Why not try the low-budget approach, and just write out a small amount of 
markup before and after you serialize the nodes?  Writing out "<root>" and 
"</root>" isn't terribly complicated.

Dave