Posted to j-users@xerces.apache.org by Klaus Malorny <Kl...@knipp.de> on 2005/07/12 10:54:58 UTC
"normalized-value" feature of Xerces
Hi,
If this is a FAQ, please excuse me and point me to the right location. I did my
homework but did not find any suitable information.
I have the following problem: I would like to parse and validate an XML document
and then access it via DOM. As the related schema makes extensive use of the
"normalizedString" and "token" datatypes (i.e. those whose whiteSpace facet is
"replace" or "collapse"), I would like to access the whitespace-normalized values
rather than the raw values contained in the original XML document, to avoid
normalizing manually at every location in my code.
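For reference, the manual normalization that the parser feature is meant to spare looks roughly like the sketch below. It follows the whiteSpace rules of XML Schema Part 2: "replace" turns each tab, line feed and carriage return into a space; "collapse" does the same, then squeezes runs of spaces to one and strips leading and trailing spaces. The class and method names are hypothetical and not part of Xerces or JAXP.

```java
/**
 * Hypothetical helper: the XML Schema whiteSpace facet applied by hand
 * (XML Schema Part 2, "replace" and "collapse").
 */
final class WhiteSpaceFacet {
    private WhiteSpaceFacet() {}

    /** whiteSpace="replace" (xs:normalizedString): each tab, LF and CR becomes a space. */
    static String replace(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            sb.append(c == '\t' || c == '\n' || c == '\r' ? ' ' : c);
        }
        return sb.toString();
    }

    /** whiteSpace="collapse" (xs:token): replace, then squeeze space runs and trim. */
    static String collapse(String value) {
        String replaced = replace(value);
        StringBuilder sb = new StringBuilder(replaced.length());
        boolean pendingSpace = false;
        for (int i = 0; i < replaced.length(); i++) {
            char c = replaced.charAt(i);
            if (c == ' ') {
                // remember a separator only after some content (drops leading spaces)
                pendingSpace = sb.length() > 0;
            } else {
                if (pendingSpace) {
                    sb.append(' ');
                }
                pendingSpace = false;
                sb.append(c);
            }
        }
        // a trailing pending space is never flushed, so trailing spaces are dropped
        return sb.toString();
    }
}
```

With the normalized-value feature working, calls like `WhiteSpaceFacet.collapse(text)` scattered through application code should become unnecessary.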
I saw that Xerces (I tried the latest version, 2.7.0) supports a feature called
"http://.../normalized-value". However, I see no difference when I set it to
"true": the DOM nodes still contain the unnormalized values. I also set the other
required features as documented, and I verified that validation is actually
performed, i.e. parsing an invalid document does result in an exception.
I create the parser as follows (javax.xml.* classes under JDK 1.5):
- - - 8< - - -
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

// profile, ctx, data (byte[]) and SaxErrorHandler come from the surrounding
// application code (not shown)
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance ();
dbf.setNamespaceAware (true);
// Schema built earlier with javax.xml.validation.SchemaFactory
dbf.setSchema (profile.getSchema ());
dbf.setValidating (true);
dbf.setFeature ("http://apache.org/xml/features/validation/dynamic", true);
dbf.setFeature ("http://xml.org/sax/features/validation", true);
dbf.setFeature ("http://apache.org/xml/features/validation/schema", true);
dbf.setFeature ("http://apache.org/xml/features/validation/schema-full-checking", true);
// expose schema-normalized element and attribute values in the DOM
dbf.setFeature ("http://apache.org/xml/features/validation/schema/normalized-value", true);
dbf.setFeature ("http://apache.org/xml/features/validation/schema/element-default", true);
DocumentBuilder db = dbf.newDocumentBuilder ();
SaxErrorHandler eh = new SaxErrorHandler (ctx);
db.setErrorHandler (eh);
ByteArrayInputStream bis = new ByteArrayInputStream (data);
return db.parse (bis);
- - - 8< - - -
Any ideas or comments on what I am doing wrong? Or do I misunderstand this feature?
Thanks in advance for any feedback.
regards,
Klaus
---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org
Re: "normalized-value" feature of Xerces
Posted by Klaus Malorny <Kl...@knipp.de>.
Michael Glavassevich wrote:
> [...]
Hi Michael,
Thanks a lot. I will find a way to work around this problem until Sun fixes it.
regards,
Klaus
Re: "normalized-value" feature of Xerces
Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi Klaus,
There was a bug in the factory finder code which locates the
SchemaFactory. It has been fixed in the Apache version of the JAXP 1.3
APIs, but it seems like the problem remains in the J2SE 5.0 version. You
can find the details of the bug here [1][2]. One way to get around this
would be to use the Endorsed Standards Override Mechanism [3] (this FAQ
applies to J2SE 5.0 as well) with the xml-apis.jar provided with Xerces-J.
Thanks.
[1] http://marc.theaimsgroup.com/?l=xml-commons-dev&m=111898640501739&w=2
[2] http://marc.theaimsgroup.com/?l=xml-commons-dev&m=111929242603968&w=2
[3] http://xml.apache.org/xerces2-j/faq-general.html#faq-4
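One way to confirm that the Endorsed Standards Override actually took effect is to ask the JVM where it loaded the javax.xml.validation API classes (which include the factory finder) from. A minimal sketch, relying only on the standard Class.getResource lookup; the class name JaxpApiOrigin is hypothetical. With the override in place, the printed URL should point into the endorsed xml-apis.jar rather than the JDK's own runtime library.

```java
import java.net.URL;
import javax.xml.validation.SchemaFactory;

/** Hypothetical helper: report where the javax.xml.validation API classes came from. */
final class JaxpApiOrigin {
    private JaxpApiOrigin() {}

    /** URL of SchemaFactory.class, e.g. inside rt.jar or an endorsed xml-apis.jar. */
    static String apiLocation() {
        // Class.getResource also works for classes loaded by the bootstrap loader
        URL url = SchemaFactory.class.getResource("SchemaFactory.class");
        return url == null ? "<unknown>" : url.toString();
    }

    public static void main(String[] args) {
        System.out.println("javax.xml.validation loaded from: " + apiLocation());
    }
}
```

The endorsed-standards mechanism was removed in Java 9, so on current JDKs this only reports the built-in java.xml module.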
Klaus Malorny <Kl...@knipp.de> wrote on 07/14/2005 04:48:37 AM:
> [...]
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org
Re: "normalized-value" feature of Xerces
Posted by Klaus Malorny <Kl...@knipp.de>.
Klaus Malorny wrote:
> [...]
Hi,
By building a debug version of Xerces and stepping through my code together with
Xerces, I accidentally discovered the source of my problem. To create a "Schema"
object, I used the following code (using the javax.xml.validation package):
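- - - 8< - - -
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
SchemaFactory factory =
    SchemaFactory.newInstance (XMLConstants.W3C_XML_SCHEMA_NS_URI);
Source[] sources = ...
Schema schema = factory.newSchema (sources);
- - - 8< - - -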
Unfortunately, this does not create a Schema instance that uses Xerces 2.7.0
code; instead, it creates a Schema instance from the Xerces that comes with J2SE
5, which evidently does not support the desired normalization feature. Xerces
2.7.0 seems to detect that this class is not its own and inserts the J2SE
validator into its pipeline (with XNI <-> SAX adapters).
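A quick way to see which implementation JAXP hands out, without a debugger, is to print the class names of the factory and of the Schema it produces. The sketch assumes that the JDK's bundled copy of Xerces is repackaged under com.sun.org.apache.xerces.internal, whereas Xerces-J itself uses org.apache.xerces; SchemaOrigin is a hypothetical helper name, and the no-argument newSchema() stands in for the real newSchema(sources) call.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

/** Hypothetical helper: show which JAXP implementation supplies the factory and Schema. */
final class SchemaOrigin {
    private SchemaOrigin() {}

    /** Class name of the factory SchemaFactory.newInstance picks for W3C XML Schema. */
    static String factoryClassName() {
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .getClass().getName();
    }

    /** True if the object's class lives in the org.apache.xerces packages (Xerces-J). */
    static boolean isApacheXerces(Object o) {
        return o.getClass().getName().startsWith("org.apache.xerces.");
    }

    public static void main(String[] args) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // newSchema() without sources: validation then relies on xsi:schemaLocation hints
        Schema schema = factory.newSchema();
        System.out.println("SchemaFactory: " + factory.getClass().getName());
        System.out.println("Schema:        " + schema.getClass().getName());
        System.out.println("Xerces-J factory: " + isApacheXerces(factory));
    }
}
```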
If I create the factory directly, i.e.
- - - 8< - - -
SchemaFactory factory =
    new org.apache.xerces.jaxp.validation.XMLSchemaFactory ();
- - - 8< - - -
everything works as expected. My big question now is: why do I not get a suitable
factory from Xerces 2.7.0, while the analogous JAXP parser factory does come from
2.7.0? Is this a bug? How can I get the 2.7.0 implementation from
SchemaFactory.newInstance ()?
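Besides direct instantiation and the endorsed override, SchemaFactory.newInstance documents a first lookup step: the system property "javax.xml.validation.SchemaFactory:" followed by the schema language URI, whose value is read as a class name. A minimal sketch of using it follows; SchemaFactorySelector is a hypothetical name, and whether J2SE 5.0's finder honours this step despite the factory-finder bug discussed in this thread has not been verified here.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

/** Hypothetical helper: pick the W3C XML Schema factory via the JAXP lookup property. */
final class SchemaFactorySelector {
    private SchemaFactorySelector() {}

    /** Name of the system property SchemaFactory.newInstance(schemaLanguage) checks first. */
    static String propertyKey(String schemaLanguage) {
        return SchemaFactory.class.getName() + ":" + schemaLanguage;
    }

    /**
     * Temporarily points the lookup property at implClassName, asks JAXP for a
     * W3C XML Schema factory, then restores the previous property value.
     */
    static SchemaFactory newXsdFactory(String implClassName) {
        String key = propertyKey(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        String previous = System.getProperty(key);
        System.setProperty(key, implClassName);
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        } finally {
            if (previous == null) {
                System.clearProperty(key);
            } else {
                System.setProperty(key, previous);
            }
        }
    }

    public static void main(String[] args) {
        // Class name from this thread; Xerces-J must be on the classpath for it to load
        SchemaFactory factory =
                newXsdFactory("org.apache.xerces.jaxp.validation.XMLSchemaFactory");
        System.out.println("Using " + factory.getClass().getName());
    }
}
```

Because system properties are JVM-wide, setting and restoring one like this is not thread-safe. On Java 6 and later, the overload SchemaFactory.newInstance(schemaLanguage, factoryClassName, classLoader) names the implementation directly without touching global state.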
Thanks in advance for any hints.
regards,
Klaus