Posted to j-users@xerces.apache.org by Albretch Mueller <lb...@gmail.com> on 2011/07/12 00:28:03 UTC

dismissing characters such as carriage returns and spaces after an ending and before an starting tag ...

~
 I am reading an XML file with an XMLReader (and validating it against the
specified schema). The file looks like this:
~
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.5/
http://www.mediawiki.org/xml/export-0.5.xsd" version="0.5"
xml:lang="en">
  <siteinfo>
    <sitename>Wikipedia</sitename>
    <base>http://en.wikipedia.org/wiki/Main_Page</base>
    <generator>MediaWiki 1.17wmf1</generator>
    <case>first-letter</case>
    <namespaces>
      <namespace key="-2" case="first-letter">Media</namespace>
      <namespace key="109" case="first-letter">Book talk</namespace>
    </namespaces>
  </siteinfo>
</mediawiki>
~
 How do I keep the ContentHandler from reporting, as "characters", the
whitespace sequences (carriage returns, spaces) that fall after an end tag
and before a start tag?
~
 Thank you
 lbrtchx

---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org


Re: dismissing characters such as carriage returns and spaces after an ending and before an starting tag ...

Posted by ke...@us.ibm.com.
Interesting, Mike; didn't know that. Makes a certain amount of sense, 
since it's based on the definition of the containing element rather than 
what it actually contains.

(I've rarely counted on it; I get too many documents thrown at me without 
DTDs, or am processing in a context where I want to preserve the 
whitespace, so I've tended to code this into the application semantics 
instead. Which is probably why I didn't remember that simply specifying 
the DTD was sufficient.)
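
One way to "code this into the application semantics" can be sketched as a SAX filter that drops whitespace-only text, but only inside elements the application itself knows to be element-only. This is a hedged illustration, not the approach actually used by anyone in this thread: the class name ElementOnlyWhitespaceFilter, the collectText helper, and the idea of passing the element-only names as a Set are all assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: suppresses whitespace-only characters() events
// inside elements that the application declares to be element-only.
class ElementOnlyWhitespaceFilter extends XMLFilterImpl {
    private final Set<String> elementOnly;               // local names, supplied by the caller
    private final Deque<String> open = new ArrayDeque<>(); // currently open elements

    ElementOnlyWhitespaceFilter(XMLReader parent, Set<String> elementOnly) {
        super(parent);
        this.elementOnly = elementOnly;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        open.push(localName);
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        open.pop();
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        boolean inElementOnly = !open.isEmpty() && elementOnly.contains(open.peek());
        if (inElementOnly && isXmlWhitespace(ch, start, length)) {
            return; // layout whitespace: do not forward it downstream
        }
        super.characters(ch, start, length);
    }

    // True if the run consists only of the four XML whitespace characters.
    private static boolean isXmlWhitespace(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
                return false;
            }
        }
        return true;
    }

    // Helper for the example: returns all text that survives the filter.
    static String collectText(String xml, Set<String> elementOnly) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so that localName is populated
        XMLReader parser = factory.newSAXParser().getXMLReader();
        ElementOnlyWhitespaceFilter filter = new ElementOnlyWhitespaceFilter(parser, elementOnly);
        StringBuilder seen = new StringBuilder();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                seen.append(ch, start, length);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<siteinfo>\n  <sitename>Wikipedia</sitename>\n</siteinfo>";
        System.out.println("[" + collectText(xml, Set.of("siteinfo")) + "]");
    }
}
```

Whitespace in any element not named in the set is passed through untouched, which suits the case where some parts of the document need their whitespace preserved.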


______________________________________
"You build world of steel and stone
I build worlds of words alone
Skilled tradespeople, long years taught:
You shape matter; I shape thought."
(http://www.songworm.com/lyrics/songworm-parody/ShapesofShadow.html)



From: Michael Glavassevich <mr...@ca.ibm.com>
To: j-users@xerces.apache.org
Date: 07/11/2011 11:22 PM
Subject: Re: dismissing characters such as carriage returns and spaces after an ending and before an starting tag ...



The document would need to have a DTD, but you don't need to be 
validating. Among other things, "ignorable whitespace" is always assessed 
when the document has a DTD which has been read, regardless of whether 
you've enabled validation or not.

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org


Re: dismissing characters such as carriage returns and spaces after an ending and before an starting tag ...

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
The document would need to have a DTD, but you don't need to be validating.
Among other things, "ignorable whitespace" is always assessed when the
document has a DTD which has been read, regardless of whether you've
enabled validation or not.
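
A minimal sketch of the behaviour described above, assuming the JDK's bundled JAXP parser or Xerces-J on the classpath. The class name IgnorableWhitespaceDemo, the Recorder handler, and the small internal DTD subset are made up for illustration; the MediaWiki export in the original question references an XSD rather than a DTD, so a DOCTYPE like this one would have to be supplied separately.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class IgnorableWhitespaceDemo {

    // Hypothetical sample: the internal DTD subset declares <siteinfo> as
    // element-only content and <sitename> as character data.
    static final String XML =
          "<!DOCTYPE siteinfo [\n"
        + "  <!ELEMENT siteinfo (sitename)>\n"
        + "  <!ELEMENT sitename (#PCDATA)>\n"
        + "]>\n"
        + "<siteinfo>\n"
        + "  <sitename>Wikipedia</sitename>\n"
        + "</siteinfo>";

    // Records which SAX callback received each run of text.
    static class Recorder extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int ignorableChars = 0;

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorableChars += length;
        }
    }

    static Recorder parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false); // validation stays off; the DTD is still read
        XMLReader reader = factory.newSAXParser().getXMLReader();
        Recorder recorder = new Recorder();
        reader.setContentHandler(recorder);
        reader.parse(new InputSource(new StringReader(xml)));
        return recorder;
    }

    public static void main(String[] args) throws Exception {
        Recorder r = parse(XML);
        System.out.println("characters(): [" + r.text + "]");
        System.out.println("ignorableWhitespace() chars: " + r.ignorableChars);
    }
}
```

With the DOCTYPE present, the indentation inside <siteinfo> should arrive through ignorableWhitespace() instead of characters(); with the DOCTYPE removed, the same whitespace comes through characters().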

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

keshlam@us.ibm.com wrote on 07/11/2011 10:52:32 PM:

> If you are validating against a DTD, and IF the enclosing element
> does not have mixed content, look at the SAX/DOM definitions of
> "ignorable whitespace" and how to handle it. (The term is
> unfortunate; it's better described as "whitespace in element-only
> content".)
>
> If you are not validating the document, the parser cannot make this
> distinction and you must do so in your application code.

Re: dismissing characters such as carriage returns and spaces after an ending and before an starting tag ...

Posted by ke...@us.ibm.com.
If you are validating against a DTD, and IF the enclosing element does not 
have mixed content, look at the SAX/DOM definitions of "ignorable 
whitespace" and how to handle it. (The term is unfortunate; it's better 
described as "whitespace in element-only content".)

If you are not validating the document, the parser cannot make this 
distinction and you must do so in your application code.
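
Making the distinction in application code could look roughly like the sketch below: the handler buffers text and, at each tag boundary, discards the buffer if it holds nothing but whitespace. The class name WhitespaceSkippingHandler and the event-list output are invented for the example, and the rule "whitespace-only text is layout" is an application-level assumption that would also drop a leaf element whose real content is only spaces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records start/end/text events and skips
// whitespace-only text found between tags.
class WhitespaceSkippingHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> events = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one run of text in several chunks,
        // so only accumulate here and decide at the next tag.
        buffer.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("end:" + qName);
    }

    private void flushText() {
        // Assumption: text made up only of whitespace is formatting, not data.
        if (!buffer.toString().trim().isEmpty()) {
            events.add("text:" + buffer);
        }
        buffer.setLength(0);
    }

    static List<String> parse(String xml) throws Exception {
        WhitespaceSkippingHandler handler = new WhitespaceSkippingHandler();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<siteinfo>\n  <sitename>Wikipedia</sitename>\n</siteinfo>";
        parse(xml).forEach(System.out::println);
    }
}
```

Buffering until the next tag, instead of testing each characters() call on its own, avoids misjudging text that the parser happens to split across several callbacks.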


______________________________________
"You build world of steel and stone
I build worlds of words alone
Skilled tradespeople, long years taught:
You shape matter; I shape thought."
(http://www.songworm.com/lyrics/songworm-parody/ShapesofShadow.html)



From: Albretch Mueller <lb...@gmail.com>
To: j-users@xerces.apache.org
Date: 07/11/2011 06:13 PM
Subject: dismissing characters such as carriage returns and spaces after an ending and before an starting tag ...



~
 I am reading an XML file with an XMLReader (and validating it against the
specified schema). The file looks like this:
~
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.5/
http://www.mediawiki.org/xml/export-0.5.xsd" version="0.5"
xml:lang="en">
  <siteinfo>
    <sitename>Wikipedia</sitename>
    <base>http://en.wikipedia.org/wiki/Main_Page</base>
    <generator>MediaWiki 1.17wmf1</generator>
    <case>first-letter</case>
    <namespaces>
      <namespace key="-2" case="first-letter">Media</namespace>
      <namespace key="109" case="first-letter">Book talk</namespace>
    </namespaces>
  </siteinfo>
</mediawiki>
~
 How do I keep the ContentHandler from reporting, as "characters", the
whitespace sequences (carriage returns, spaces) that fall after an end tag
and before a start tag?
~
 Thank you
 lbrtchx
