Posted to j-dev@xerces.apache.org by Elena Litani <el...@ca.ibm.com> on 2002/06/04 21:10:06 UTC

DOM tree (re)validation is implemented!

The latest Xerces code supports revalidation of a DOM tree against XML
Schema grammars.
The code can be downloaded from:
http://gump.covalent.net/jars/latest/xml-xerces2/

The support is available via the DOM L3
normalizeDocument() method [1].

Comments/feedback/bug-reports are VERY welcome. :)


IMPORTANT!! 
-----------
The code is experimental: the methods/classes could be removed,
modified, or renamed. 

In particular, in the latest code the DOM L3 functionality is accessible
via org.apache.xerces.dom.DocumentImpl. 
However, we would like to reorganize the Xerces DOM implementation code to
separate the DOM L2 and DOM L3 implementations.
Thus, in the future DOM L3 functionality could move to another
class.

Parsing documents
-----------------
Use the DOMBuilder parser instead of DOMParser if you plan to revalidate a
document.
You can create a DOMBuilder in either of two ways:
(a) new org.apache.xerces.parsers.DOMBuilderImpl();
(b) call createDOMBuilder() on org.apache.xerces.dom.DOMImplementationImpl.
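For readers on a current JDK: in the final DOM Level 3 Load and Save Recommendation, DOMBuilder became org.w3c.dom.ls.LSParser, and the JDK ships an implementation of it. The sketch below obtains one through the standard DOMImplementationRegistry rather than the experimental Xerces classes named above. It is a minimal sketch under that assumption; the class name LSParsing is hypothetical.

```java
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

/** Hypothetical helper: parse XML with the DOM L3 LS parser (successor of DOMBuilder). */
final class LSParsing {

    /** Looks up a DOM implementation that supports the "LS" feature. */
    static DOMImplementationLS lsImplementation() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS ls = (DOMImplementationLS) registry.getDOMImplementation("LS");
            if (ls == null) {
                throw new IllegalStateException("No DOM implementation supports LS");
            }
            return ls;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not load the DOM registry", e);
        }
    }

    /** Parses an XML string synchronously into a Document. */
    static Document parse(String xml) {
        DOMImplementationLS ls = lsImplementation();
        // MODE_SYNCHRONOUS: parse() returns only after the whole document is built
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }
}
```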


How to tell DOM implementation to revalidate the tree
------------------------------------------------------
DOM L3 provides setNormalizationFeature() [2] on the Document interface,
which lets users specify which operations normalizeDocument() should
perform. To revalidate:
(1) Cast the Document to DocumentImpl.
(2) Call document.setNormalizationFeature("validate", true).
(3) To start (re)validation, call document.normalizeDocument().
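In the final DOM Level 3 Core Recommendation, setNormalizationFeature() was replaced by a DOMConfiguration object returned by Document.getDomConfig(), and normalizeDocument() is declared on the Document interface itself, so no cast to DocumentImpl is needed. A minimal sketch of steps (2) and (3) against that later API, assuming an XML Schema grammar (the "schema-type" value is the W3C XML Schema namespace URI); the class and method names are hypothetical:

```java
import javax.xml.XMLConstants;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;

/** Hypothetical helper for revalidating an in-memory DOM tree. */
final class Revalidation {

    /** Configures normalizeDocument() to validate against W3C XML Schema. */
    static DOMConfiguration enableSchemaValidation(Document document) {
        DOMConfiguration config = document.getDomConfig();
        // Later equivalent of setNormalizationFeature("validate", true)
        config.setParameter("validate", Boolean.TRUE);
        // Select XML Schema as the grammar type
        config.setParameter("schema-type", XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return config;
    }

    /** Step (3): walk the tree and (re)validate it in place. */
    static void revalidate(Document document) {
        enableSchemaValidation(document);
        document.normalizeDocument();
    }
}
```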

How to specify grammar for a document
-------------------------------------
(1) The documentElement must have an xsi:schemaLocation or
xsi:noNamespaceSchemaLocation attribute that specifies the schema location(s).
(2) The documentURI [3] must be set: schema document locations are resolved
relative to the documentURI.
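A sketch of both requirements using the DOM Level 3 Core API as it later shipped in the JDK (Element.setAttributeNS, Document.setDocumentURI). For a document without a target namespace it sets xsi:noNamespaceSchemaLocation; a namespaced document would use xsi:schemaLocation with namespace/location pairs instead. The class name, URIs and schema file name are made-up examples.

```java
import javax.xml.XMLConstants;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical helper: give the validator what it needs to locate a grammar. */
final class GrammarHints {

    /**
     * Points the document element at a no-namespace schema and records the
     * document's own URI, against which a relative schema location is resolved.
     */
    static void pointAtSchema(Document doc, String documentUri, String schemaLocation) {
        // (2) base URI for resolving relative schema locations
        doc.setDocumentURI(documentUri);

        Element root = doc.getDocumentElement();
        // Declare the xsi prefix explicitly so a serialized copy stays well-formed
        root.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI,
                "xmlns:xsi", XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI);
        // (1) schema location hint for documents without a target namespace
        root.setAttributeNS(XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI,
                "xsi:noNamespaceSchemaLocation", schemaLocation);
    }
}
```

For example, pointAtSchema(doc, "file:///data/order.xml", "order.xsd") should, per the resolution rule above, make the validator look for the schema at file:///data/order.xsd.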

How to register error handler
------------------------------
Use the DOM L3 setErrorHandler() method [4] to attach an error handler to the
Document.
You need to cast the Document to DocumentImpl to call this method.
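In the final DOM Level 3 API the handler is an org.w3c.dom.DOMErrorHandler registered as the "error-handler" parameter of document.getDomConfig(), rather than through a setErrorHandler() method. A minimal sketch of a handler that records every problem and asks the implementation to keep going unless the error is fatal (returning false from handleError() asks it to stop when possible); the class name is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMError;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.Document;

/** Hypothetical handler that collects all reported problems instead of failing fast. */
final class CollectingErrorHandler implements DOMErrorHandler {

    private final List<String> messages = new ArrayList<>();

    @Override
    public boolean handleError(DOMError error) {
        String level;
        switch (error.getSeverity()) {
            case DOMError.SEVERITY_WARNING: level = "warning"; break;
            case DOMError.SEVERITY_ERROR:   level = "error";   break;
            default:                        level = "fatal";   break;
        }
        messages.add(level + ": " + error.getMessage());
        // true = continue processing; stop only on fatal errors
        return error.getSeverity() != DOMError.SEVERITY_FATAL_ERROR;
    }

    List<String> messages() {
        return messages;
    }

    /** Registers a new collecting handler on the document's configuration. */
    static CollectingErrorHandler attachTo(Document document) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        document.getDomConfig().setParameter("error-handler", handler);
        return handler;
    }
}
```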

Limitations
-----------
- Revalidation of the DOM tree against a DTD grammar is not supported.
- The content of EntityReference and CDATASection nodes will not be
  validated.
- Schema-normalized values won't be exposed in the tree after DOM
  revalidation. Likewise, element default values (those added by the XML
  Schema validator) won't be exposed in the DOM tree.
- Attribute value normalization: the code does not normalize attribute
  values as XML 1.0 requires for CDATA-typed attributes, so the XML Schema
  validator may not normalize attribute values correctly.

Thanks,

[1] http://www.w3.org/TR/2002/WD-DOM-Level-3-Core-20020409/core.html#Document3-normalizeDocument
[2] http://www.w3.org/TR/2002/WD-DOM-Level-3-Core-20020409/core.html#Document3-setNormalizationFeature
[3] http://www.w3.org/TR/2002/WD-DOM-Level-3-Core-20020409/core.html#Document3-documentURI

-- 
Elena Litani / IBM Toronto

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


RE: DOM with no indentation to DOM with indentation

Posted by Aleksandar Milanovic <am...@galdosinc.com>.
How about you parse the document to a DOM structure and then serialize it
with indentation? For serialization, use the
org.apache.xml.serialize.OutputFormat and
org.apache.xml.serialize.XMLSerializer classes from the Xerces package.
In OutputFormat, use setIndenting() and setIndent() methods to set the
indentation to the desired level.
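The org.apache.xml.serialize classes were later deprecated in Xerces-J in favor of the DOM Level 3 LSSerializer. As an alternative sketch (not the OutputFormat approach described above), the standard "format-pretty-print" parameter asks the serializer to indent; the exact indentation width is implementation-defined. The cast assumes the DOM implementation also implements DOMImplementationLS, as the JDK's does; the class name is hypothetical.

```java
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

/** Hypothetical helper: serialize a DOM with indentation via DOM L3 LS. */
final class PrettyPrint {

    static String toIndentedString(Document doc) {
        // Assumption: the implementation that built doc also supports LS
        DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        DOMConfiguration config = serializer.getDomConfig();
        if (config.canSetParameter("format-pretty-print", Boolean.TRUE)) {
            config.setParameter("format-pretty-print", Boolean.TRUE);
        }
        // Leave out <?xml ...?> so the result is just the element tree
        config.setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(doc);
    }
}
```

As with any indenting serializer, the added whitespace becomes part of the output text, which matters for mixed content.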

Alex

-----Original Message-----
From: Pae Choi [mailto:paechoi@earthlink.net]
Sent: October 20, 2002 3:44 PM
To: xerces-j-user@xml.apache.org; xerces-j-dev@xml.apache.org
Subject: Re: DOM with no indentation to DOM with indentation


Oops, I did not consider the line wrapping. And the initial DOM sample
in the previous message contains typos. The correct DOM sample should
be as follows:

<Parent><Child><Name>John Doe</Name><Gender>Male</Gender></Child><Child><Name>Jane Doe</Name><Gender>Female</Gender></Child></Parent>

*** Note: If the sample above does not appear on a single line, it is probably
caused by line wrapping. Please treat it as a single line.
Thanks again.

Regards,


Pae




> Say we have generated a DOM as follows:
>
> <Parent><Child><Name>John Doe</Name><Gender>Male</Gender></Child><Child><Name>John Doe</Name><Gender>Male</Gender></Child></Parent>
>
> As you can see, there are no Node.TEXT_NODE nodes for indentation (such as
> "\n" or "\t"); the only text nodes are the values of the <Name> and
> <Gender> elements.
>
> My question is whether there is any utility available to convert the
> above DOM to "the following DOM with indentation" as follows:
>
> <Parent>
>     <Child>
>         <Name>John Doe</Name>
>         <Gender>Male</Gender>
>     </Child>
>     <Child>
>         <Name>Jane Doe</Name>
>         <Gender>Female</Gender>
>     </Child>
> </Parent>
>
> If there is a utility class something like:
>
> import org.w3c.dom.Document;
>
> public class XMLUtil {
>
>     /** Convert the non-indented DOM to an indented DOM */
>     public Document convertNonIndentedDOM2IndentedDOM(Document inboundXML) {
>         Document outboundXML = null;
>
>         // snip
>
>         return outboundXML;
>     }
> }
>
> it would be nice to know, so that it can be done by:
>
> XMLUtil xmlUtil = new XMLUtil();
> Document respXML = xmlUtil.convertNonIndentedDOM2IndentedDOM(reqXML);
>
> Any info on this subject is welcome and will be appreciated. Thank you.
>
> Regards,
>
>
> Pae
>
>
>




Re: DOM with no indentation to DOM with indentation

Posted by Joseph Kesselman <ke...@us.ibm.com>.
Unless you are extremely careful about how you do it (only inserting 
indentation in places where whitespace is not meaningful, as defined by 
the DTD and/or Schema and/or your program's understanding of the data), 
indentation can change the meaning of your document. That's why it isn't 
the default.

I'm not sure whether Xerces' serializers offer an indentation option. 

XSLT does, via the <xsl:output indent="yes"/> setting, so you could use 
Xalan and an identity transformation to get this result. (You'd also have 
to set the indentation amount; I believe Xalan currently defaults to 0, 
but the Xalan lists would be the right place to go into detail on 
that.)
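A sketch of the identity-transformation route using the JAXP transformation API bundled with the JDK (whose built-in XSLT processor derives from Xalan): TransformerFactory.newTransformer() with no stylesheet is an identity transform, OutputKeys.INDENT corresponds to indent="yes" on xsl:output, and the width is set through the Apache-specific "{http://xml.apache.org/xslt}indent-amount" output property, which the JDK's processor honors but other XSLT processors may ignore. The class name is hypothetical.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;

/** Hypothetical helper: indent a DOM by running it through an identity transform. */
final class IdentityIndent {

    static String indent(Node node, int indentAmount) {
        try {
            // No stylesheet argument = identity transformation
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // Apache extension property; without it the amount may default to 0
            t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount",
                    Integer.toString(indentAmount));
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```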
______________________________________
Joe Kesselman  / IBM Research



DOM with no indentation to DOM with indentation

Posted by Pae Choi <pa...@earthlink.net>.
Say we have generated a DOM as follows:

<Parent><Child><Name>John Doe</Name><Gender>Male</Gender></Child><Child><Name>John Doe</Name><Gender>Male</Gender></Child></Parent>

As you can see, there are no Node.TEXT_NODE nodes for indentation (such as
"\n" or "\t"); the only text nodes are the values of the <Name> and <Gender> elements.

My question is whether there is any utility available to convert the
above DOM to "the following DOM with indentation" as follows:

<Parent>
    <Child>
        <Name>John Doe</Name>
        <Gender>Male</Gender>
    </Child>
    <Child>
        <Name>Jane Doe</Name>
        <Gender>Female</Gender>
    </Child>
</Parent>

If there is a utility class something like:

import org.w3c.dom.Document;

public class XMLUtil {

    /** Convert the non-indented DOM to an indented DOM */
    public Document convertNonIndentedDOM2IndentedDOM(Document inboundXML) {
        Document outboundXML = null;

        // snip

        return outboundXML;
    }
}

it would be nice to know, so that it can be done by:

XMLUtil xmlUtil = new XMLUtil();
Document respXML = xmlUtil.convertNonIndentedDOM2IndentedDOM(reqXML);

Any info on this subject is welcome and will be appreciated. Thank you.

Regards,


Pae
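As Joe Kesselman points out in this thread, indentation is only safe where whitespace is not significant. A minimal sketch of the requested helper under that assumption: it copies the tree into a new Document and inserts whitespace text nodes only inside elements whose children are all elements, leaving elements that already contain any text (including existing whitespace) or CDATA untouched. The class and method names are taken from the question; the four-space indent is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class XMLUtil {

    private static final String INDENT = "    ";

    /** Convert the non-indented DOM to an indented DOM (the input is not modified). */
    public Document convertNonIndentedDOM2IndentedDOM(Document inboundXML) {
        Document outboundXML;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            outboundXML = factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // Deep-copy the element tree into the new document
        Element root = (Element) outboundXML.importNode(inboundXML.getDocumentElement(), true);
        outboundXML.appendChild(root);
        addIndentation(root, 1);
        return outboundXML;
    }

    private static void addIndentation(Element element, int depth) {
        List<Element> children = new ArrayList<>();
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                children.add((Element) n);
            } else if (n.getNodeType() == Node.TEXT_NODE
                    || n.getNodeType() == Node.CDATA_SECTION_NODE) {
                return; // element holds text: whitespace may be significant
            }
        }
        if (children.isEmpty()) {
            return; // empty element: nothing to indent
        }
        Document doc = element.getOwnerDocument();
        for (Element child : children) {
            // Newline plus padding for this nesting level, before each child
            element.insertBefore(doc.createTextNode("\n" + pad(depth)), child);
            addIndentation(child, depth + 1);
        }
        // Newline plus the parent's padding before the closing tag
        element.appendChild(doc.createTextNode("\n" + pad(depth - 1)));
    }

    private static String pad(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append(INDENT);
        }
        return sb.toString();
    }
}
```

Used as in the question, `Document respXML = new XMLUtil().convertNonIndentedDOM2IndentedDOM(reqXML);` returns a tree whose whitespace text nodes should reproduce the indented layout when serialized without further pretty-printing.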





Revalidation request from Cocoon

Posted by Elena Litani <el...@ca.ibm.com>.
Hi, 

I am forwarding yet another re-validation request we've received from
Cocoon.

Given the request from Xalan, we should think about designing a generic
API for revalidating data structures held in memory.

Thx,
-- 
Elena Litani / IBM Toronto

Torsten Curdt wrote:
> I'm a Cocoon developer and I'm currently trying to improve Cocoon's form
> handling. I'm trying to find a unified approach for validating an
> abstract DOM tree (or even just a small portion of one), so that we can
> easily use different validation schemas (XSD, RELAX NG, Schematron).
> I have a DOM structure in memory.
> We are building this DOM while going through an install-wizard-like
> webapp. So we usually start with an empty DOM, add some nodes, and want to
> validate those. Then we add some more nodes and want to validate only the
> ones added. In the end I want to validate the whole document again.
> 
> Now comes the tricky part: the validation is supposed to be
> (1) lenient rather than fail-fast (validate everything, so we get all errors)
> (2) detailed: the validation result should describe exactly what is wrong,
>     e.g. for content nodes, the regexp that failed
> (3) abstract: the DOM might also be a bean, so we need an abstraction here,
>     too. This should be kept in mind, but could be saved for later ;-)
> 
> Example:
> 
> first page:
>   *empty*
> 
> after first page:
> 
>   <doc>
> |      <user>
> |            <username>Torsten</username>
> |     </user>
>   </doc>
> 
> after second page:
> 
>   <doc>
>       <user>
>              <username>Torsten</username>
>       </user>
> |     <order>
> |            <item id="123"/>
> |     </order>
>   </doc>
> 
> at the end:
> 
>   <doc>
> |     <user>
> |            <username>Torsten</username>
> |     </user>
> |     <order>
> |            <item id="123"/>
> |     </order>
>   </doc>
> 
> | = validate
> 
> And I am wondering
>
> (1) whether partial validation of the tree is possible at all with the
>     current Xerces API
> (2) whether someone can give a little introduction to the current PSVI API
>     and related stuff (since I doubt the docs are in place yet ;-)) *please*
> (3) whether you have thought about a more pluggable approach, since there
>     are lots of different validation techniques out there (Schematron,
>     RELAX NG, ...). It would be great if someone could just drop one into
>     Xerces.
> 
> Regarding revalidation against RELAX NG and Schematron: I know it would be
> too much work to have all the different validation types in Xerces (lots of
> work!), but wouldn't it be useful to have an API so that people or other
> projects can just plug in their implementation?
> 
> cheers
> --
> Torsten
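Much of what Torsten asks for was later covered by the JAXP validation API (javax.xml.validation, part of the JDK since Java 5). This is a swapped-in approach, not the Xerces revalidation API discussed in this thread: SchemaFactory is looked up by schema-language URI, which is its plug-in point for other languages (the JDK itself only ships W3C XML Schema support), and a Validator accepts a DOMSource wrapping a single Element, so a subtree of an in-memory, namespace-aware DOM can be validated as if it were a document of its own. An ErrorHandler that records instead of throwing gives lenient, collect-all-errors behavior. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical sketch: validate one element subtree and collect every error. */
final class FragmentValidation {

    /** Compiles a W3C XML Schema given as a string. */
    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Invalid schema", e);
        }
    }

    /** Validates one element (and its descendants) as if it were a document root. */
    static List<String> validate(Schema schema, Element subtree) {
        List<String> problems = new ArrayList<>();
        Validator validator = schema.newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            // Record problems instead of throwing, so validation is not fail-fast
            @Override public void warning(SAXParseException e) { problems.add("warning: " + e.getMessage()); }
            @Override public void error(SAXParseException e) { problems.add("error: " + e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            validator.validate(new DOMSource(subtree));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("Validation aborted", e);
        }
        return problems;
    }
}
```

In the wizard example, validate(schema, userElement) after the first page would check only <user>; validating the document element at the end would check everything, provided the schema declares it globally.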



RE: DOM tree (re)validation is implemented!

Posted by Mikko Honkala <ho...@tml.hut.fi>.
Hi Elena,

I'm quite excited to hear that you've implemented the first version of DOM revalidation. I have just one simple request: would it be
possible to add an example that uses revalidation? Grepping the samples dir for the string normalizeDocument() gave no matches.

Best Regards,
	Mikko Honkala
	www.x-smiles.org
	W3C XForms WG

PS. Now I'm just waiting for the possibility to access datatype information from the DOM.

> -----Original Message-----
> From: Elena Litani [mailto:elitani@ca.ibm.com]
> Sent: 4 June 2002 22:10
> To: xerces-j-user@xml.apache.org; xerces-j-dev@xml.apache.org
> Subject: DOM tree (re)validation is implemented!
>

...

> How to tell DOM implementation to revalidate the tree
> ------------------------------------------------------
> DOM L3 provides setNormalizationFeature()[2] on the Document interface
> that allows users to specify what functions normalizeDocument() should
> perform.
> (1) Cast Document to DocumentImpl.
> (2) Call document.setNormalizationFeature("validate", true).
> (3) To start (re)validation call document.normalizeDocument().

