Posted to j-users@xerces.apache.org by Igor PARTL <xe...@schweers-net.de> on 2001/11/17 12:50:11 UTC

whitespaceIgnoring (on/off)

Hi all!


First of all, I have to say that this mailing list is very interesting.

But now:
There's a feature for the DOMParser which allows me to read/write
the nodeValues with or without their whitespace (space, return...).
But I have no idea how it works. (I know what I want, and hope that
it is the correct way, but how?) Perhaps I have overlooked some
facts while studying the samples...

Illustration:
<xml-node>
 Hey!
</xml-node>
Nodevalue with whiteSpace    : '\n Hey!\n'
NodeValue without whiteSpaces: 'Hey!'


Igor PARTL!

P.S.: The samples are -- especially for beginners -- very helpful!
      Congratulations to the authors.


-- 
Sometimes there's more wood in front of my eyes than in the Central
Park of N.Y.!


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org

RE: whitespaceIgnoring (on/off)

Posted by Craig Collings <cr...@cabusiness.co.nz>.

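The workaround for the whitespace is to trim() the resulting String:

	if (textnode.hasChildNodes()) {
		// getFirstChild() is the Text node that holds '\n Hey!\n'
		String textnodeContent = textnode.getFirstChild().getNodeValue();
		// trim() strips leading and trailing whitespace only
		String cleanText = textnodeContent.trim();
	}

This removes the leading and trailing whitespace, leaving 'Hey!'.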
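
For anyone who wants to try the workaround end to end, here is a minimal sketch. It assumes the standard JAXP DocumentBuilderFactory rather than the Xerces DOMParser class used in this thread, and the class and method names (TrimExample, parse, trimmedFirstText) are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class TrimExample {

    /** Parses an XML string into a DOM tree; checked exceptions are wrapped for brevity. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Returns the trimmed value of the first child if it is a Text node, otherwise "". */
    static String trimmedFirstText(Element element) {
        Node first = element.getFirstChild();
        if (first == null || first.getNodeType() != Node.TEXT_NODE) {
            return "";
        }
        // trim() only touches the two ends; whitespace inside the text is kept.
        return first.getNodeValue().trim();
    }

    public static void main(String[] args) {
        Document doc = parse("<xml-node>\n Hey!\n</xml-node>");
        Element root = doc.getDocumentElement();
        System.out.println("raw:     '" + root.getFirstChild().getNodeValue() + "'");
        System.out.println("trimmed: '" + trimmedFirstText(root) + "'");
    }
}
```

The type check guards against the case where the first child is an Element or a comment, for which getNodeValue() would not return the text you are after.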

-----Original Message-----
From: Igor PARTL [mailto:xerces-j-user-ml@schweers-net.de]
Sent: Wednesday, 21 November 2001 3:53 a.m.
To: Xerces-J-User
Subject: Re: whitespaceIgnoring (on/off)


Hi!


On Mon, Nov 19, 2001 at 01:45:01PM +1300, Craig Collings wrote:

Thanks a lot for your answer.

> But remember... Once you have a DOM from the parser, the
> nodevalue of the element is NOT 'Hey!' or '\n Hey \n'. In fact the
> direct nodevalue of any Element is null. (I assume we are using
> org.apache.dom)

Yep, so it is.

> The text that we would think is the nodevalue of an element is held
> as the nodevalue of a Text node which is a child of the Element. So
> we have to get a NodeList from Element, iterate thru it till we find
> a (non-whitespace) Text node and get the nodeValue of textnode.

That is what I have done.

Node textnode = ....;
// <xml>
//  Hey!
// </xml>
if (textnode.hasChildNodes()) {
  // Ok, I know, that there is a node, but this is the correct and
  // safe way
  String textnodeContent = textnode.getFirstChild().getNodeValue();
  // The FirstChild is the #TextNode
}

And the result is: '\n Hey!\n' - Grgggh!
I have no idea what's going wrong...

Reading the API shows me a method 'ignorableWhitespace(int)' in
DOMParser; this seems to be a method for ... but for what? I have no
idea.

> Which is all very annoying when all we want is the text from a tag.


Igor PARTL



RE: whitespaceIgnoring (on/off)

Posted by Craig Collings <cr...@cabusiness.co.nz>.

It looks like these methods are internal callbacks used during document
validation. Perhaps they shouldn't really be public; I guess they are public
only so that other packages can call them. Having a look at the API docs for
DOMParser and the interface org.apache.xerces.framework.XMLDocumentHandler
convinces me that I don't want to go near that code unless I really have to.
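
As background on what "ignorable whitespace" means: in XML it is whitespace between child elements of an element whose DTD content model allows only elements. Whitespace inside text content such as '\n Hey!\n' is never ignorable. The sketch below tries to show that difference. It is an assumption-laden illustration, not the Xerces DOMParser API from this thread: it uses the standard JAXP call setIgnoringElementContentWhitespace(), which needs a validating parser and a DTD, and the class name and sample document are invented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class IgnorableWhitespaceDemo {

    // Hypothetical sample: <root> may contain only <item>, <item> holds text.
    static final String XML =
          "<!DOCTYPE root [\n"
        + "  <!ELEMENT root (item)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "]>\n"
        + "<root>\n  <item>\n Hey!\n</item>\n</root>";

    /** Parses XML with validation on; checked exceptions are wrapped for brevity. */
    static Document parse(String xml, boolean dropIgnorable) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // the DTD content model is needed
            factory.setIgnoringElementContentWhitespace(dropIgnorable);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Number of child nodes directly under the root element. */
    static int rootChildCount(boolean dropIgnorable) {
        return parse(XML, dropIgnorable).getDocumentElement().getChildNodes().getLength();
    }

    /** Raw text held by the <item> element. */
    static String itemText(boolean dropIgnorable) {
        Element root = parse(XML, dropIgnorable).getDocumentElement();
        Element item = (Element) root.getElementsByTagName("item").item(0);
        return item.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) {
        System.out.println("children of root, ignorable kept:    " + rootChildCount(false));
        System.out.println("children of root, ignorable dropped: " + rootChildCount(true));
        System.out.println("item text: '" + itemText(true) + "'");
    }
}
```

With the setting switched on, the whitespace-only Text nodes around the item element should disappear, while the text of item keeps its line breaks. So a setting of this kind would not by itself turn '\n Hey!\n' into 'Hey!'; trimming is still needed for that.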


Re: whitespaceIgnoring (on/off)

Posted by Igor PARTL <xe...@schweers-net.de>.

Hi!

On Mon, Nov 19, 2001 at 01:45:01PM +1300, Craig Collings wrote:

Thanks a lot for your answer.

> But remember... Once you have a DOM from the parser, the
> nodevalue of the element is NOT 'Hey!' or '\n Hey \n'. In fact the
> direct nodevalue of any Element is null. (I assume we are using
> org.apache.dom)

Yep, so it is.

> The text that we would think is the nodevalue of an element is held
> as the nodevalue of a Text node which is a child of the Element. So
> we have to get a NodeList from Element, iterate thru it till we find
> a (non-whitespace) Text node and get the nodeValue of textnode.

That is what I have done.

Node textnode = ....;
// <xml>
//  Hey!
// </xml>
if (textnode.hasChildNodes()) {
  // Ok, I know, that there is a node, but this is the correct and
  // safe way
  String textnodeContent = textnode.getFirstChild().getNodeValue();
  // The FirstChild is the #TextNode
}

And the result is: '\n Hey!\n' - Grgggh!
I have no idea what's going wrong...

Reading the API shows me a method 'ignorableWhitespace(int)' in
DOMParser; this seems to be a method for ... but for what? I have no
idea.

> Which is all very annoying when all we want is the text from a tag.

Igor PARTL


RE: whitespaceIgnoring (on/off)

Posted by Craig Collings <cr...@cabusiness.co.nz>.

Hey! very good. But remember... Once you have a DOM from the parser, the
nodevalue of the element is NOT 'Hey!' or '\n Hey \n'. In fact the direct
nodevalue of any Element is null. (I assume we are using org.apache.dom)
The text that we would think is the nodevalue of an element is held as the
nodevalue of a Text node which is a child of the Element. So we have to get
a NodeList from Element, iterate thru it till we find a (non-whitespace)
Text node and get the nodeValue of textnode.
Which is all very annoying when all we want is the text from a tag.
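
The steps described above (get the child NodeList, walk it until a non-whitespace Text node turns up, take its nodeValue) could be sketched roughly like this. It relies only on the org.w3c.dom interfaces plus a JAXP parser for the demo; the names ElementText, parse and firstNonBlankText are hypothetical helpers, not part of Xerces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ElementText {

    /** Parses an XML string into a DOM tree; checked exceptions are wrapped for brevity. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /**
     * Returns the trimmed value of the first direct Text child that is not
     * whitespace-only, or null if the element has no such child.
     */
    static String firstNonBlankText(Element element) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Only plain Text nodes are considered; CDATA sections have
            // their own node type and are skipped in this sketch.
            if (child.getNodeType() == Node.TEXT_NODE) {
                String value = child.getNodeValue().trim();
                if (!value.isEmpty()) {
                    return value;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Element root = parse("<xml-node>\n  <!-- note -->\n Hey!\n</xml-node>")
                .getDocumentElement();
        System.out.println("element nodeValue: " + root.getNodeValue());
        System.out.println("first text:        '" + firstNonBlankText(root) + "'");
    }
}
```

Looping over all children instead of calling getFirstChild() matters when a comment or a child element comes before the text, because the first child is then a whitespace-only Text node.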
