Posted to j-users@xerces.apache.org by Windchime <w...@zahuta.com> on 2007/12/11 18:30:30 UTC

Avoiding DOM Reparse

I am reading lots of XML documents that need schema validation.  However, I
don't know what the schema (actually grammar pool) looks like until long
after the documents are read.  Finally, I want to have full access to the
PSVI information.

What I would like to know is whether there is any way to perform the schema
validation without reparsing the XML.  The reparse causes two problems:
1) It takes time
2) All cached Element references must be re-cached

The code that I have generally works just fine, except that I must reparse
the XML once the grammar has been determined.

In priority order, it would be great if I could:

   * Somehow associate the grammar pool and then just call
normalizeDocument() [ I already have a PSVIDocument using the
'http://apache.org/xml/properties/dom/document-class-name' property ]

   * Use an input source that would preserve the original DOM



Thanks much in advance,


-Windy


---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org


Re: Avoiding DOM Reparse

Posted by Jeff Greif <jg...@alumni.princeton.edu>.
If you have a pretty good idea of the set of schemas that will be
needed before reading the instance documents, you could preparse those
schemas into a GrammarPool associated with the instance document
parser.  That way, most or all of the documents could be validated as
they are read (the first time).  As a backup, normalizeDocument could
be used on the remaining ones.

Jeff
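
A minimal sketch of the "validate on the first read" idea, written against the standard JAXP API rather than the Xerces GrammarPool itself: a precompiled javax.xml.validation.Schema plays the role of the preparsed grammars here. The schema text, the class name and the method name are assumptions made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class PrecompiledSchemaParse {
    // Hypothetical schema, used only for this illustration.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='count' type='xs:int'/>"
      + "</xs:schema>";

    /** Parses xml while validating it; returns the number of validation errors. */
    static int parseAndCountErrors(String xml) {
        try {
            // Compile the schema once, before any instance document is read.
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));

            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // schema validation needs namespace support
            dbf.setSchema(schema);       // validate while the DOM is being built

            final List<SAXParseException> errors = new ArrayList<>();
            DocumentBuilder builder = dbf.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { errors.add(e); }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return errors.size();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAndCountErrors("<count>3</count>"));
        System.out.println(parseAndCountErrors("<count>three</count>"));
    }
}
```

Documents whose schema is not known up front would still need a later in-memory pass, as described in the reply quoted below.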

On 12/11/07, Michael Glavassevich <mr...@ca.ibm.com> wrote:
> Hi Windy,
>
> Before calling normalizeDocument() you should be able to set the
> "http://apache.org/xml/properties/internal/grammar-pool" property on the
> DOMConfiguration with your grammar pool. Alternatively you could use the
> JAXP Validation API [1]. If you pass the PSVI-aware version of the DOM to
> the Validator [2] as both the Source and Result it will annotate your DOM
> with PSVI.
>
> Thanks.
>
> [1]
> http://xerces.apache.org/xerces2-j/javadocs/api/javax/xml/validation/package-summary.html
> [2]
> http://xerces.apache.org/xerces2-j/javadocs/api/javax/xml/validation/Validator.html#validate(javax.xml.transform.Source,%20javax.xml.transform.Result)
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: mrglavas@ca.ibm.com
> E-mail: mrglavas@apache.org
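
The JAXP route described above can be sketched roughly as follows. This uses only the JDK's built-in parser, so it shows the in-place validation and the fact that previously cached Element references stay valid; it does not show PSVI access, which needs the Xerces PSVI-aware document class. The schema text and the class and method names are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class InMemoryValidation {
    // Hypothetical schema, used only for this illustration.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='count' type='xs:int'/>"
      + "</xs:schema>";

    /**
     * Parses xml without a schema, then validates the existing tree in place.
     * Returns true if the document is valid and the Element reference taken
     * before validation is still the document element afterwards.
     */
    static boolean validateInPlace(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element cached = doc.getDocumentElement(); // reference held across validation

            // Later, once the schema is known:
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            try {
                // Same node as Source and Result: the tree is augmented, not rebuilt.
                validator.validate(new DOMSource(doc), new DOMResult(doc));
            } catch (SAXException invalid) {
                // With no ErrorHandler set, validation errors are thrown.
                return false;
            }
            return cached == doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validateInPlace("<count>3</count>"));
        System.out.println(validateInPlace("<count>three</count>"));
    }
}
```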


RE: Avoiding DOM Reparse

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Windy,

There's no flag. Line/column info isn't part of a standard DOM. No matter
what you do, you won't get the line numbers directly from the in-memory DOM
validator. If you need that information, you'll have to improvise by doing
something like what I suggested in previous e-mails.

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org


RE: Avoiding DOM Reparse

Posted by Windchime <w...@zahuta.com>.
Thanks,

We are still disconnected, sorry.

This thread started because I was reparsing the XML, and you told me how to
set the grammar pool on the DOMConfiguration.  Now the schema validation is
operating on the (long ago) loaded PSVIDocument.  However, when I switched to
this in-memory validation (versus the reparse), I lost the line numbers in the
schema validation messages.  Is this a lost cause?  Is there another flag I
should set on the configuration and/or while loading the PSVIDocument
initially?

Thank you so very much for all of your efforts and responses.

-Windy




RE: Avoiding DOM Reparse

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Windy,

Since you mentioned DOMLocator, I assume you're doing the validation using
normalizeDocument(). When you receive a DOMError during validation, you can
extract the line/column information from the user data on the related node [1]
attached to the DOMLocator. That user data is out-of-band information that the
implementation has no knowledge of, so you'll still get -1 from
getColumnNumber() and getLineNumber() if you invoke those methods.

Thanks.

[1]
http://xerces.apache.org/xerces2-j/javadocs/api/org/w3c/dom/DOMLocator.html#getRelatedNode()
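
As a rough sketch of the approach above, a DOMErrorHandler can fall back to the user data on the related node when the locator itself reports -1. The key "lineNumber" is an assumption: it has to match whatever key was used when the line numbers were stored at load time. The demo() method uses a hand-written stub locator in place of the one normalizeDocument() would supply.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMError;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.DOMLocator;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class LineAwareErrorHandler implements DOMErrorHandler {
    // Hypothetical user-data key; must match the key used when loading the DOM.
    static final String LINE_KEY = "lineNumber";

    /** Returns the locator's line, or the line stored on the related node, or -1. */
    static int lineOf(DOMLocator locator) {
        if (locator == null) return -1;
        if (locator.getLineNumber() >= 0) return locator.getLineNumber();
        Node related = locator.getRelatedNode();
        Object stored = (related == null) ? null : related.getUserData(LINE_KEY);
        return (stored instanceof Integer) ? (Integer) stored : -1;
    }

    @Override
    public boolean handleError(DOMError error) {
        System.err.println("line " + lineOf(error.getLocation()) + ": " + error.getMessage());
        return true; // ask the implementation to continue where it can
    }

    /** Exercises lineOf() with a stub locator that has no line of its own. */
    static int demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            final Element item = doc.createElement("item");
            item.setUserData(LINE_KEY, 7, null); // pretend this was recorded at load time
            DOMLocator stub = new DOMLocator() {
                @Override public int getLineNumber() { return -1; }
                @Override public int getColumnNumber() { return -1; }
                @Override public int getByteOffset() { return -1; }
                @Override public int getUtf16Offset() { return -1; }
                @Override public Node getRelatedNode() { return item; }
                @Override public String getUri() { return null; }
            };
            return lineOf(stub);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Such a handler would be registered through the standard "error-handler" parameter of the DOMConfiguration before normalizeDocument() is called.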

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org


RE: Avoiding DOM Reparse

Posted by Windchime <w...@zahuta.com>.
Michael,

Thanks, but how does this interact with the DOMLocator?  Or is the default
schema error message even using the DOMLocator?

-Windy

> "Windchime" <w...@zahuta.com> wrote on 12/19/2007 11:21:11 AM:
>
> > Michael,
> >
> > Thanks much, this worked exceptionally well.  One minor issue though is the
> > reported line numbers are no longer there (-1).  Do I need some setting
> > while loading the DOM?
> >
> > Also, assuming the line numbers are actually determined, how would I access
> > them from the DOM (independent of the schema validation issue)?
> >
> > -Windy


RE: Avoiding DOM Reparse

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi Windy,

There's a sample that comes with Xerces called dom.DOMAddLines [1] which
shows how you can add line/column information to the DOM using
Node.setUserData() [2] and read it back with Node.getUserData() [3]. Note
that this requires an extension to the DOM parser implementation. There's
no standard way of doing this and if you start making modifications to the
DOM after loading it the line/column information you initially stored
gradually becomes meaningless.

Thanks.

[1] http://xerces.apache.org/xerces2-j/samples-dom.html#DOMAddLines
[2]
http://xerces.apache.org/xerces2-j/javadocs/api/org/w3c/dom/Node.html#setUserData(java.lang.String,%20java.lang.Object,%20org.w3c.dom.UserDataHandler)
[3]
http://xerces.apache.org/xerces2-j/javadocs/api/org/w3c/dom/Node.html#getUserData(java.lang.String)
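
For readers without the Xerces samples at hand, the setUserData()/getUserData() idea can be sketched with JDK classes only. Note that this is not the dom.DOMAddLines approach (which extends the Xerces DOM parser); it swaps in a different technique, building the DOM from SAX events and recording the Locator's line number on each element. The class name LineNumberDom, the "lineNumber" key and the helper methods are hypothetical, and the sketch ignores comments, processing instructions and namespace declarations.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

final class LineNumberDom {
    // Hypothetical user-data key; any application-chosen string works.
    static final String LINE_KEY = "lineNumber";

    // Builds a DOM from SAX events and stores each element's line as user data.
    static Document parseWithLines(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            final Document doc = dbf.newDocumentBuilder().newDocument();
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            spf.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                private final Deque<Node> stack = new ArrayDeque<>();
                private final StringBuilder text = new StringBuilder();
                private Locator locator;

                @Override public void setDocumentLocator(Locator l) { locator = l; }

                @Override public void startDocument() { stack.push(doc); }

                @Override public void startElement(String uri, String local, String qName, Attributes atts) {
                    flushText();
                    Element e = doc.createElementNS(uri.isEmpty() ? null : uri, qName);
                    for (int i = 0; i < atts.getLength(); i++) {
                        String attUri = atts.getURI(i);
                        e.setAttributeNS(attUri.isEmpty() ? null : attUri, atts.getQName(i), atts.getValue(i));
                    }
                    // The Locator reports where the start tag ends; -1 if unavailable.
                    int line = (locator != null) ? locator.getLineNumber() : -1;
                    e.setUserData(LINE_KEY, line, null);
                    stack.peek().appendChild(e);
                    stack.push(e);
                }

                @Override public void endElement(String uri, String local, String qName) {
                    flushText();
                    stack.pop();
                }

                @Override public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }

                // Emits buffered character data as a single text node.
                private void flushText() {
                    if (text.length() > 0) {
                        stack.peek().appendChild(doc.createTextNode(text.toString()));
                        text.setLength(0);
                    }
                }
            });
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Reads the stored line back; -1 when the node carries no line information.
    static int lineOf(Node node) {
        Object value = node.getUserData(LINE_KEY);
        return (value instanceof Integer) ? (Integer) value : -1;
    }

    public static void main(String[] args) {
        Document doc = parseWithLines("<a>\n  <b/>\n</a>");
        Node b = doc.getDocumentElement().getElementsByTagName("b").item(0);
        System.out.println("line of <b>: " + lineOf(b));
    }
}
```

As noted above, the stored numbers describe the document as it was loaded; they are not updated when the DOM is modified afterwards.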

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

"Windchime" <w...@zahuta.com> wrote on 12/19/2007 11:21:11 AM:

> Michael,
>
> Thanks much, this worked exceptionally well.  One minor issue though is the
> reported line numbers are no longer there (-1).  Do I need some setting
> while loading the DOM?
>
> Also, assuming the line numbers are actually determined, how would I access
> them from the DOM (independent of the schema validation issue)?
>
> -Windy


RE: Avoiding DOM Reparse

Posted by Windchime <w...@zahuta.com>.
Michael,

Thanks much, this worked exceptionally well.  One minor issue though is the
reported line numbers are no longer there (-1).  Do I need some setting
while loading the DOM?

Also, assuming the line numbers are actually determined, how would I access
them from the DOM (independent of the schema validation issue)?

-Windy



Re: Avoiding DOM Reparse

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi Windy,

Before calling normalizeDocument() you should be able to set the
"http://apache.org/xml/properties/internal/grammar-pool" property on the
DOMConfiguration with your grammar pool. Alternatively you could use the
JAXP Validation API [1]. If you pass the PSVI-aware version of the DOM to
the Validator [2] as both the Source and Result it will annotate your DOM
with PSVI.

Thanks.

[1]
http://xerces.apache.org/xerces2-j/javadocs/api/javax/xml/validation/package-summary.html
[2]
http://xerces.apache.org/xerces2-j/javadocs/api/javax/xml/validation/Validator.html#validate(javax.xml.transform.Source,%20javax.xml.transform.Result)
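
The JAXP Validation API route described above can be sketched with JDK classes alone. This is a minimal sketch rather than code from this thread: the class name InPlaceValidation, its helper methods and the small "order" schema are hypothetical. The same Document is passed as both the DOMSource and the DOMResult, so the tree is validated in place and previously cached Element references remain the same objects. PSVI annotations are only expected when the document is a PSVI-aware Xerces DOM, as in the original question; with a plain JDK DOM the sketch simply validates.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class InPlaceValidation {
    // Hypothetical schema used only for illustration.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='order'><xs:complexType><xs:sequence>"
        + "<xs:element name='qty' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Parses once, without any schema; namespace awareness is needed for later validation.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Compiles the schema later, once it is known.
    static Schema compile(String xsd) {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return sf.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Validates the existing tree; source and result wrap the same node, so nothing is reparsed.
    static boolean validateInPlace(Document doc, Schema schema) {
        Validator validator = schema.newValidator();
        try {
            validator.validate(new DOMSource(doc), new DOMResult(doc));
            return true;
        } catch (SAXException e) {
            // With no ErrorHandler set, validation errors surface as SAXException.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<order><qty>3</qty></order>");
        Element cached = doc.getDocumentElement();
        boolean valid = validateInPlace(doc, compile(XSD));
        System.out.println("valid=" + valid + ", same root=" + (cached == doc.getDocumentElement()));
    }
}
```

In a real application a custom ErrorHandler set on the Validator would usually replace the boolean return, so that every validation error can be collected rather than stopping at the first one.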

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org