Posted to c-users@xalan.apache.org by "Bharadwaj, Anand (ELS-OXF)" <A....@elsevier.com> on 2007/07/10 18:01:23 UTC

Ignoring the whitespace in text nodes

Hi All,

 

We are having an issue with XSLT parser sample (SimpleXPathAPI.exe).

 

When the XPath query (count(/*[1]/*[7]/*[5]/*[1]/*[3]/*[2]/node())) is
made on the document containing the XML node snippet:

"<para id="para13">

 

 
<italic>Expand one&apos;s cultural knowledge</italic>

 

and institutionalize it so that it can be accessed and incorporated into
the delivery of services. We must attempt to seek out sociocultural
information about the individual patient that will then help us have a
better feel for how to perform an interview or history-what to ask, how
to ask-and how to modify treatment interventions appropriately on the
basis of a <anchor id="p27"/>person&apos;s cultural reality. It is
impossible, and unnecessary, to learn all there is to know about all
cultural subgroups, but clinicians must be aware of the ethnographic
information related to the local community and relevant beliefs and
behaviors of their patients and the patients&apos; families.</para>"

 

The parser treats the leading whitespace characters as one node, so the
total node count is 5. The nodes are:

1.    whitespace before <italic>, 

2.    <italic>

3.    text between <italic> and <anchor>

4.    <anchor>

5.    text after <anchor> till end.

 

Our requirement is that the count be 4 (excluding node 1 above).

 

Could you please let us know how to ignore these whitespace-only text
nodes through XPath queries? We do not have the freedom to reload the
document in any other configuration.

 

Regards,

Anand


Re: Ignoring the whitespace in text nodes

Posted by David Bertoni <db...@apache.org>.
Bharadwaj, Anand (ELS-OXF) wrote:
> Hi,
> 
> Thanks for the update.
> 
> When we run the same XPath query in another application (XMLSpy),
> it ignores the first node (containing whitespace)
> and returns a count of 4.
The Microsoft XSLT processor strips all whitespace text nodes from the 
source tree before transforming it.  Perhaps XMLSpy is relying on that 
processor.

> 
> We would like to ask whether it is safe to assume that this
> application handles the scenario on its own, and not as part of
> implementing either the XML or the XPath specification.
You will have to ask Altova that question, not me.

Dave

RE: Ignoring the whitespace in text nodes

Posted by "Bharadwaj, Anand (ELS-OXF)" <A....@elsevier.com>.
Hi,

Thanks for the update.

When we run the same XPath query in another application (XMLSpy),
it ignores the first node (containing whitespace)
and returns a count of 4.

We would like to ask whether it is safe to assume that this
application handles the scenario on its own, and not as part of
implementing either the XML or the XPath specification.

Thanks and Regards,
Anand

Re: Ignoring the whitespace in text nodes

Posted by David Bertoni <db...@apache.org>.
Bharadwaj, Anand (ELS-OXF) wrote:
> Hi All,
> 
> We are having an issue with XSLT parser sample (SimpleXPathAPI.exe).

It's an XPath _processor_, not an XPath _parser_.

> 
> When the XPath query (count(/*[1]/*[7]/*[5]/*[1]/*[3]/*[2]/node()))is 
> made on the document containing the XML node snippet:
> 
> "<para id="para13">
>                                                                
> <italic>Expand one&apos;s cultural knowledge</italic>
>

...

> 
> The parser is considering the leading 'whitespace' characters as one 
> node and the total node count as 5 which are,
> 
> 1.    *whitespace before <italic>*,
> 
> 2.    <italic>
> 
> 3.    text between <italic> and <anchor>
> 
> 4.    <anchor>
> 
> 5.    text after <anchor> till end.
This is correct, since that's what the XML recommendation, the XPath data 
model, and XML Information Set recommendation require.

> 
> As per our requirement the count should be 4 (excluding the (1) above).
> 
> We request to kindly let us know how to ignore these 'whitespace' nodes 
> which are
> 
> actually children of 'Text node' types, through Xpath queries since we 
> do not have the freedom to reload the document in any other configuration.
The only thing you can do is to build the document and strip the 
whitespace-only text nodes before they get to the XPath processor.  You 
would need to write your own code to do that, or you could write a 
stylesheet to remove them, transform the original document using the XSLT 
processor, then use the transformed document with SimpleXPathAPI.exe.

Dave
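
A minimal sketch of the "write your own code" option described above, for
illustration only. It uses Python's standard xml.dom.minidom as a stand-in
for the Xalan-C++ source tree (an assumption; the thread itself contains no
code), and the XML is an abbreviated version of the <para> element from the
question. The helper name strip_whitespace_only_text is hypothetical.

```python
from xml.dom import minidom

# Abbreviated stand-in for the <para> element from the original question.
XML = (
    '<para id="para13">\n    \n'
    "<italic>Expand one's cultural knowledge</italic>\n"
    "and institutionalize it on the basis of a "
    '<anchor id="p27"/>person\'s cultural reality.</para>'
)

# The four characters XML treats as whitespace.
XML_WS = " \t\r\n"


def strip_whitespace_only_text(node):
    """Recursively remove text nodes that contain nothing but XML whitespace."""
    # Copy the child list first, because removeChild mutates it.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE:
            if child.data.strip(XML_WS) == "":
                node.removeChild(child)
        else:
            strip_whitespace_only_text(child)


doc = minidom.parseString(XML)
para = doc.documentElement

before = len(para.childNodes)   # whitespace, <italic>, text, <anchor>, text
strip_whitespace_only_text(doc)
after = len(para.childNodes)    # the leading whitespace-only node is gone

print(before, after)
```

The same idea would apply to whatever tree is handed to the XPath
processor: the stripping has to happen before the query runs, because the
query itself sees every text node the data model requires.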


Re: Ignoring the whitespace in text nodes

Posted by da...@us.ibm.com.
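First, look at the whitespace stripping options in section 3.4 of the XSLT 
spec. If that doesn't get what you want, you can apply a predicate to 
deselect some nodes when you count them. Perhaps
count(blah/node()[not(self::text()) or normalize-space()!=''])
will obtain what you want. The comparison is against the empty string
because normalize-space() never returns a lone space, and the self::text()
test keeps empty elements such as <anchor/>, whose string value is also
empty.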
.................David Marston
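
To see how a predicate of this kind behaves on the <para> example, here is
a rough sketch that mirrors the XPath logic in plain Python. The standard
library has no full XPath 1.0 engine, so string_value and is_blank are
hypothetical helpers that approximate the XPath string-value and a
normalize-space() = '' test for element and text nodes only; the XML is an
abbreviated version of the snippet from the question.

```python
from xml.dom import minidom

# Abbreviated stand-in for the <para> element from the original question.
XML = (
    '<para id="para13">\n    \n'
    "<italic>Expand one's cultural knowledge</italic>\n"
    "and institutionalize it on the basis of a "
    '<anchor id="p27"/>person\'s cultural reality.</para>'
)

XML_WS = " \t\r\n"


def string_value(node):
    """Approximate the XPath string-value for text and element nodes."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(string_value(child) for child in node.childNodes)


def is_blank(node):
    """True where normalize-space() would return the empty string."""
    return string_value(node).strip(XML_WS) == ""


children = minidom.parseString(XML).documentElement.childNodes

# count(node())
all_nodes = len(children)

# count(node()[normalize-space() != '']): also drops the empty <anchor/>
naive = sum(1 for n in children if not is_blank(n))

# count(node()[not(self::text()) or normalize-space() != ''])
refined = sum(
    1 for n in children if n.nodeType != n.TEXT_NODE or not is_blank(n)
)

print(all_nodes, naive, refined)
```

The middle count shows why restricting the test to text nodes matters: an
empty element has an empty string value too, so a bare normalize-space()
comparison would exclude it along with the whitespace.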