Posted to c-dev@xerces.apache.org by Magnus Strand <ma...@tim.se> on 2003/03/27 10:16:21 UTC

Getting text off all subnodes

Hi,

Is there a method that returns the merged text of all text subnodes of
a certain node in a DOM tree?

I mean, if I have this as part of an XML document:
<header>It is <b>sunny</b> weather today</header>

I have a DOMNode* to the header element and would like to return all the
text:
"It is sunny weather today"

Regards,
Magnus Strand

–––––––––––––––––––––––––––––––––––––––––––––––––
System Developer, MSc

Teknik i Media Sverige AB (publ)
Södra Förstadsgatan 2, SE-211 43 Malmö, Sweden
Mobile: +46 704 20 57 16
Direct: +46 40 601 57 16
Office: +46 40 601 57 00
http://www.tim.se
–––––––––––––––––––––––––––––––––––––––––––––––––
DISCLAIMER: "The information contained in this email and any
attachment is confidential. It is intended only for the named
addressee(s). If you are not the named addressee please notify the
sender immediately and do not disclose, copy or distribute the
contents to any other person other than the intended addressee(s)."


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


RE: Re: Getting text off all subnodes

Posted by Gareth Reakes <ga...@decisionsoft.com>.
Hi,
	The code looks good to me. If someone volunteers to write a couple of
tests in tests/DOM/DOMTest/DTest.cpp, then I will commit it.

Gareth




On Thu, 27 Mar 2003, Erik Rydgren wrote:

> Here is another variant that doesn't copy around the data as much.

-- 
Gareth Reakes, Head of Product Development  +44-1865-203192
DecisionSoft Limited                        http://www.decisionsoft.com
XML Development and Services






RE: Re: Getting text off all subnodes

Posted by Gareth Reakes <ga...@decisionsoft.com>.
Hi,
	That's great if you want to test it. Take a look in
tests/DOM/DOMTest/DTest.cpp. The spec says we have to return an XMLCh
string, so Erik's version is closer.


> const XMLCh*     DOMNodeImpl::getTextContent() const
> {
>   DOMNode *thisNode = castToNode(this);
>   int nContentLength = 0;
>   { // Get total length of content
>     DOMNodeIteratorImpl nodeIterator(thisNode, 0x00000004, NULL, false); // SHOW_TEXT
>     DOMNode *currentNode = nodeIterator.nextNode();
>     while (currentNode) {
>       nContentLength += XMLString::stringLen(currentNode->getNodeValue());
>       currentNode = nodeIterator.nextNode();
>     }
>   }
> 
>   // Allocate buffer
>   XMLCh* pzContentStr = new XMLCh[nContentLength+1];
>   *pzContentStr = 0;
> 
>   { // Fetch content
>     DOMNodeIteratorImpl nodeIterator(thisNode, 0x00000004, NULL, false); // SHOW_TEXT
>     DOMNode *currentNode = nodeIterator.nextNode();
>     while (currentNode) {
>       XMLString::catString(pzContentStr, currentNode->getNodeValue());
>       currentNode = nodeIterator.nextNode();
>     }
>   }
> 
>   return pzContentStr;

// To get around the memory problem, we can replace the return above with:
    XMLCh *stringCopy =
        ((DOMDocumentImpl*)getOwnerDocument())->cloneString(pzContentStr);
    delete [] pzContentStr;  // allocated with new[], so it needs delete[]
    return stringCopy;

// The user does not have to delete this memory.

> }
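
A rough sketch of the idea behind the cloneString suggestion above, for illustration only. It is not the Xerces-C implementation: StringPool and clone are hypothetical names, and char16_t stands in for XMLCh. The point is that the owner (here the pool, in the library the document) keeps the copy alive, so the caller receives a pointer it never has to free.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical owner that plays the role of the document in this sketch.
class StringPool {
public:
    // Copies `text` into storage owned by the pool and returns a pointer
    // that stays valid until the pool is destroyed. The caller never frees it.
    const char16_t* clone(const char16_t* text) {
        assert(text != nullptr);
        strings_.push_back(std::make_unique<std::u16string>(text));
        return strings_.back()->c_str();
    }

    // Number of strings currently held by the pool.
    std::size_t size() const { return strings_.size(); }

private:
    // Each string lives behind its own unique_ptr, so its address stays
    // fixed even when the vector reallocates.
    std::vector<std::unique_ptr<std::u16string>> strings_;
};
```

The trade-off in this sketch is that the memory is only given back when the pool itself goes away, so calling clone many times on a long-lived owner keeps growing it.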


give us a shout if you need any help.


Gareth



RE: Re: Getting text off all subnodes

Posted by Magnus Strand <ma...@tim.se>.
Thanks Erik and Andreï,

It is just what I need.
I will soon test it.

Regards,
Magnus Strand





RE: Re: Getting text off all subnodes

Posted by Erik Rydgren <er...@mandarinen.se>.
Here is another variant that doesn't copy around the data as much.

const XMLCh*     DOMNodeImpl::getTextContent() const
{
  DOMNode *thisNode = castToNode(this);
  int nContentLength = 0;
  { // Get total length of content
    DOMNodeIteratorImpl nodeIterator(thisNode, 0x00000004, NULL, false); // SHOW_TEXT
    DOMNode *currentNode = nodeIterator.nextNode();
    while (currentNode) {
      nContentLength += XMLString::stringLen(currentNode->getNodeValue());
      currentNode = nodeIterator.nextNode();
    }
  }

  // Allocate buffer
  XMLCh* pzContentStr = new XMLCh[nContentLength+1];
  *pzContentStr = 0;

  { // Fetch content
    DOMNodeIteratorImpl nodeIterator(thisNode, 0x00000004, NULL, false); // SHOW_TEXT
    DOMNode *currentNode = nodeIterator.nextNode();
    while (currentNode) {
      XMLString::catString(pzContentStr, currentNode->getNodeValue());
      currentNode = nodeIterator.nextNode();
    }
  }

  return pzContentStr;
}

This of course forces the user to delete the returned string, which is
non-standard behaviour.
BTW, this is completely untested code. It compiles, but nothing more.
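
To make the two-pass idea and the ownership question easier to see in isolation, here is a minimal standalone sketch. It makes several assumptions: a plain vector of fragments stands in for the text nodes the iterator visits, char16_t stands in for XMLCh, and joinFragments is a hypothetical name. Returning a std::unique_ptr is one way to make it explicit that the caller owns the buffer; it is not what the DOM interface can return.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Two-pass join: measure first, allocate once, then copy.
std::unique_ptr<char16_t[]> joinFragments(
        const std::vector<std::u16string>& fragments) {
    // Pass 1: total length of all fragments.
    std::size_t total = 0;
    for (const auto& fragment : fragments)
        total += fragment.size();

    // Single allocation, with room for the terminating zero.
    std::unique_ptr<char16_t[]> buffer(new char16_t[total + 1]);

    // Pass 2: copy each fragment at a running offset.
    std::size_t pos = 0;
    for (const auto& fragment : fragments) {
        fragment.copy(buffer.get() + pos, fragment.size());
        pos += fragment.size();
    }
    assert(pos == total);
    buffer[pos] = u'\0';
    return buffer;  // freed automatically when the caller's unique_ptr dies
}
```

Copying at a running offset also avoids re-scanning the destination for its end on every append, which a strcat-style append typically has to do.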

Regards

Erik Rydgren
Mandarinen systems AB
Sweden




Re: Re: Getting text off all subnodes

Posted by "Andreï V. FOMITCHEV" <af...@odixion.com>.
a beginning:

String * XML::getTextContent(DOMNode * node)
{
  String * res = new String("");

  DOMNodeList * laListe = node->getChildNodes();
  DOMNode * noeud_courant;
  for(XMLSize_t i = 0; i < laListe->getLength(); i++)
     {
       noeud_courant = laListe->item(i);
       // NodeType
       switch(noeud_courant->getNodeType())
         {
         case DOMNode::ELEMENT_NODE:
           {
             printf("ELEMENT_NODE ");
             String * tampon = getTextContent(noeud_courant);
             *res += *tampon;
             delete tampon;
             break;
           }
         case DOMNode::ATTRIBUTE_NODE:
           // no text content ...
           break;
         case DOMNode::TEXT_NODE:
           {

             char * tampon = XMLString::transcode(noeud_courant->getNodeValue());
             if(tampon != NULL)
               {
                 // print only after the NULL check: "%s" with NULL is undefined
                 printf("tampon = '%s'\n", tampon);
                 *res += String(tampon);
               }
             delete [] tampon;
             break;
           }
         case DOMNode::CDATA_SECTION_NODE:
         case DOMNode::ENTITY_REFERENCE_NODE:
         case DOMNode::ENTITY_NODE:
         case DOMNode::PROCESSING_INSTRUCTION_NODE:
         case DOMNode::COMMENT_NODE:
         case DOMNode::DOCUMENT_NODE:
         case DOMNode::DOCUMENT_TYPE_NODE:
         case DOMNode::DOCUMENT_FRAGMENT_NODE:
         case DOMNode::NOTATION_NODE:
         default :
           printf("Others: %d node\n", noeud_courant->getNodeType());
           break;
         }
     }
  return res;
}
where String is Al Dev's String class.
This piece of code works except for accented letters (is there still
nobody who can help me with the accents?).
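
A possible explanation for the accent problem, offered as a guess rather than a diagnosis: XMLString::transcode converts to the local code page, and a code page that cannot represent a character will not round-trip it. One way around that is to keep the text as XMLCh, or to convert it to UTF-8 explicitly. The sketch below shows such a conversion in standalone form. It assumes 16-bit UTF-16 code units, uses char16_t as a stand-in for XMLCh, and utf16ToUtf8 is a hypothetical helper, not a Xerces-C function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Convert UTF-16 code units to UTF-8 bytes.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {           // high surrogate
            if (i + 1 < in.size() &&
                in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;                                   // consumed the low half
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {    // stray low surrogate
            cp = 0xFFFD;
        }
        assert(cp <= 0x10FFFF);

        if (cp < 0x80) {                               // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                       // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                     // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                       // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this, an accented letter such as the "ö" in "Malmö" becomes the two bytes 0xC3 0xB6 instead of being lost. Whether the result prints correctly still depends on the terminal expecting UTF-8.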


-- 
Andreï V. FOMITCHEV               [When should you quit computing?]
Software R&D Engineer    [When you find 1024 grams in a kilo]
Odixion, FRANCE






Re: Getting text off all subnodes

Posted by Gareth Reakes <ga...@decisionsoft.com>.
Hi,
	getTextContent (from DOM Level 3) does what you want, but it is not yet
implemented. If you decide to implement it, we will gladly accept your code :)

Gareth
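
For readers who want to see the behaviour being asked for, here is a small standalone sketch of what text content means for an element: the text of every descendant in document order, with comment children skipped. It does not use Xerces-C at all. Node, Kind, leaf, textContent and makeHeaderExample are hypothetical names for this illustration, and std::string stands in for an XMLCh string.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM node; not the Xerces-C DOMNode class.
enum class Kind { Element, Text, CData, Comment };

struct Node {
    Kind kind;
    std::string value;                            // Text, CData, Comment
    std::vector<std::unique_ptr<Node>> children;  // Element
};

// Append the text content of `node` to `out`, depth first.
void appendTextContent(const Node& node, std::string& out) {
    switch (node.kind) {
    case Kind::Text:
    case Kind::CData:
    case Kind::Comment:          // asked directly, a comment yields its value
        out += node.value;
        break;
    case Kind::Element:
        for (const auto& child : node.children) {
            assert(child != nullptr);
            if (child->kind != Kind::Comment)  // comment children are skipped
                appendTextContent(*child, out);
        }
        break;
    }
}

std::string textContent(const Node& node) {
    std::string out;
    appendTextContent(node, out);
    return out;
}

std::unique_ptr<Node> leaf(Kind kind, const std::string& value) {
    auto node = std::make_unique<Node>();
    node->kind = kind;
    node->value = value;
    return node;
}

// Builds <header>It is <b>sunny</b><!--ignored--> weather today</header>.
Node makeHeaderExample() {
    Node header;
    header.kind = Kind::Element;
    header.children.push_back(leaf(Kind::Text, "It is "));
    auto bold = leaf(Kind::Element, "");
    bold->children.push_back(leaf(Kind::Text, "sunny"));
    header.children.push_back(std::move(bold));
    header.children.push_back(leaf(Kind::Comment, "ignored"));
    header.children.push_back(leaf(Kind::Text, " weather today"));
    return header;
}
```

Calling textContent(makeHeaderExample()) yields "It is sunny weather today". The space before "weather" has to be present in the text node itself, since nothing is inserted between fragments.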



