Posted to j-users@xerces.apache.org by Albretch Mueller <lb...@gmail.com> on 2011/07/12 00:28:03 UTC
dismissing characters such as carriage returns and spaces after an
ending and before an starting tag ...
~
I am XMLRead[er|ing] an XML file (which I am validating using the
specified schema) that looks like this:
~
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.5/
                               http://www.mediawiki.org/xml/export-0.5.xsd"
           version="0.5"
           xml:lang="en">
  <siteinfo>
    <sitename>Wikipedia</sitename>
    <base>http://en.wikipedia.org/wiki/Main_Page</base>
    <generator>MediaWiki 1.17wmf1</generator>
    <case>first-letter</case>
    <namespaces>
      <namespace key="-2" case="first-letter">Media</namespace>
      <namespace key="109" case="first-letter">Book talk</namespace>
    </namespaces>
  </siteinfo>
</mediawiki>
~
How do I keep the ContentHandler from reporting as "characters" the
whitespace sequences (carriage returns, spaces) that come after an end
tag and before a start tag?
~
Thank you
lbrtchx
---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org
Re: dismissing characters such as carriage returns and spaces after an
ending and before an starting tag ...
Posted by ke...@us.ibm.com.
Interesting, Mike; didn't know that. Makes a certain amount of sense,
since it's based on the definition of the containing element rather than
what it actually contains.
(I've rarely counted on it; I get too many documents thrown at me without
DTDs, or am processing in a context where I want to preserve the
whitespace, so I've tended to code this into the application semantics
instead. Which is probably why I didn't remember that simply specifying
the DTD was sufficient.)
______________________________________
"You build world of steel and stone
I build worlds of words alone
Skilled tradespeople, long years taught:
You shape matter; I shape thought."
(http://www.songworm.com/lyrics/songworm-parody/ShapesofShadow.html)
From: Michael Glavassevich <mr...@ca.ibm.com>
To: j-users@xerces.apache.org
Date: 07/11/2011 11:22 PM
Subject: Re: dismissing characters such as carriage returns and spaces after an
ending and before an starting tag ...
The document would need to have a DTD, but you don't need to be
validating. Among other things, "ignorable whitespace" is always assessed
when the document has a DTD which has been read, regardless of whether
you've enabled validation or not.
Thanks.
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org
Re: dismissing characters such as carriage returns and spaces after an
ending and before an starting tag ...
Posted by Michael Glavassevich <mr...@ca.ibm.com>.
The document would need to have a DTD, but you don't need to be validating.
Among other things, "ignorable whitespace" is always assessed when the
document has a DTD which has been read, regardless of whether you've
enabled validation or not.
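
A minimal sketch of the behaviour described above, assuming the JAXP
SAXParserFactory hands back a Xerces-based parser (the parser bundled with
the JDK is derived from Xerces). The class name, the handler and the tiny
internal DTD are made up for illustration; they are not taken from the
MediaWiki export. The factory is left non-validating, and the handler just
records what arrives through characters() versus ignorableWhitespace().

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class IgnorableWhitespaceSketch {

    /** Records text from characters() and counts ignorableWhitespace() chars. */
    static class Counter extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int ignorableChars = 0;

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorableChars += length;
        }
    }

    /** Parses the given document with a non-validating SAX parser. */
    static Counter parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // already the default; stated for emphasis
        Counter counter = new Counter();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), counter);
        return counter;
    }

    public static void main(String[] args) throws Exception {
        String body =
            "<siteinfo>\n" +
            "  <sitename>Wikipedia</sitename>\n" +
            "  <case>first-letter</case>\n" +
            "</siteinfo>";
        // Hypothetical DTD: siteinfo has element-only content.
        String dtd =
            "<!DOCTYPE siteinfo [\n" +
            "  <!ELEMENT siteinfo (sitename, case)>\n" +
            "  <!ELEMENT sitename (#PCDATA)>\n" +
            "  <!ELEMENT case (#PCDATA)>\n" +
            "]>\n";

        Counter withDtd = parse(dtd + body);
        Counter withoutDtd = parse(body);

        System.out.println("with DTD:    text=[" + withDtd.text
            + "] ignorable chars=" + withDtd.ignorableChars);
        System.out.println("without DTD: text=[" + withoutDtd.text
            + "] ignorable chars=" + withoutDtd.ignorableChars);
    }
}
```

With the DTD present, the line breaks and indentation inside siteinfo should
show up in ignorableWhitespace() and stay out of characters(); without it
they all arrive through characters(). Other SAX parsers may behave
differently here, so treat this as a way to check your own setup rather than
a guarantee.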
Thanks.
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org
keshlam@us.ibm.com wrote on 07/11/2011 10:52:32 PM:
> If you are validating against a DTD, and IF the enclosing element
> does not have mixed content, look at the SAX/DOM definitions of
> "ignorable whitespace" and how to handle it. (The term is
> unfortunate; it's better described as "whitespace in element-only
> content".)
>
> If you are not validating the document, the parser cannot make this
> distinction and you must do so in your application code.
Re: dismissing characters such as carriage returns and spaces after an
ending and before an starting tag ...
Posted by ke...@us.ibm.com.
If you are validating against a DTD, and IF the enclosing element does not
have mixed content, look at the SAX/DOM definitions of "ignorable
whitespace" and how to handle it. (The term is unfortunate; it's better
described as "whitespace in element-only content".)
If you are not validating the document, the parser cannot make this
distinction and you must do so in your application code.
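
One way to do this in application code, as a rough sketch: buffer whatever
characters() delivers and, at each start or end tag, throw the buffer away
if it holds nothing but whitespace. The class and method names below are
invented for illustration. Note that this also drops whitespace-only runs
inside mixed content (for example the single space between two inline
elements), so it only suits documents where such runs carry no meaning.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WhitespaceFilterSketch {

    /** Collects text runs, skipping runs that consist only of whitespace. */
    static class TextCollector extends DefaultHandler {
        final List<String> texts = new ArrayList<>();
        private final StringBuilder buffer = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            // A single text run may be delivered in several chunks, so
            // accumulate and decide only at the next tag boundary.
            buffer.append(ch, start, length);
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) {
            flush();
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flush();
        }

        private void flush() {
            // trim() strips every char <= U+0020, which covers the XML
            // whitespace characters (space, tab, CR, LF).
            if (!buffer.toString().trim().isEmpty()) {
                texts.add(buffer.toString());
            }
            buffer.setLength(0);
        }
    }

    /** Parses the document and returns the non-whitespace text runs in order. */
    static List<String> collect(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.texts;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<siteinfo>\n" +
            "  <sitename>Wikipedia</sitename>\n" +
            "  <case>first-letter</case>\n" +
            "</siteinfo>";
        System.out.println(collect(xml)); // prints [Wikipedia, first-letter]
    }
}
```

Text that has real content keeps its own leading and trailing whitespace;
only runs that are whitespace from end to end are discarded.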
______________________________________
"You build world of steel and stone
I build worlds of words alone
Skilled tradespeople, long years taught:
You shape matter; I shape thought."
(http://www.songworm.com/lyrics/songworm-parody/ShapesofShadow.html)
From: Albretch Mueller <lb...@gmail.com>
To: j-users@xerces.apache.org
Date: 07/11/2011 06:13 PM
Subject: dismissing characters such as carriage returns and spaces after an ending
and before an starting tag ...
~
I am XMLRead[er|ing] an XML file (which I am validating using the
specified schema) that looks like this:
~
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.5/
                               http://www.mediawiki.org/xml/export-0.5.xsd"
           version="0.5"
           xml:lang="en">
  <siteinfo>
    <sitename>Wikipedia</sitename>
    <base>http://en.wikipedia.org/wiki/Main_Page</base>
    <generator>MediaWiki 1.17wmf1</generator>
    <case>first-letter</case>
    <namespaces>
      <namespace key="-2" case="first-letter">Media</namespace>
      <namespace key="109" case="first-letter">Book talk</namespace>
    </namespaces>
  </siteinfo>
</mediawiki>
~
How do I keep the ContentHandler from reporting as "characters" the
whitespace sequences (carriage returns, spaces) that come after an end
tag and before a start tag?
~
Thank you
lbrtchx