Posted to j-users@xerces.apache.org by Michael Glavassevich <mr...@ca.ibm.com> on 2006/07/23 21:50:52 UTC

Re: deferred node expansion question

Hi Jake,

Jacob Kjome <ho...@visi.com> wrote on 07/22/2006 08:08:22 PM:

> 
> I noticed in the note for the document-class-name property that "When 
> the document class name is set to a value other than the name of the 
> default document factory, the deferred node expansion feature does 
> not work."  Does this mean that Xerces checks specifically for the 
> class "org.apache.xerces.dom.DocumentImpl" in order to enable 
> deferred node expansion, or can a custom DOM implementation extend 
> from said class and still have node expansion enabled?

Setting the document-class-name property to any value other than 
"org.apache.xerces.dom.DocumentImpl" will disable deferred node expansion. 
The only deferred DOM implementation Xerces knows how to handle is its 
own, DeferredDocumentImpl. It directly extends the non-deferred 
implementation (DocumentImpl) and is built using its internal methods. 
There's no mechanism for plugging in a different one.
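
As a rough illustration of the rule above (a sketch of the behaviour as 
described here, not the actual Xerces source), the decision can be modelled 
as a single check. The class DocumentClassRule and the helper 
deferredExpansionInEffect are hypothetical names made up for this example; 
the feature and property identifiers are the ones from the Xerces2 
documentation.

```java
public class DocumentClassRule {
    // Identifiers from the Xerces2 feature/property documentation.
    static final String DEFER_FEATURE =
            "http://apache.org/xml/features/dom/defer-node-expansion";
    static final String DOCUMENT_CLASS_PROPERTY =
            "http://apache.org/xml/properties/dom/document-class-name";
    // Default value of the document-class-name property.
    static final String DEFAULT_DOCUMENT_CLASS =
            "org.apache.xerces.dom.DocumentImpl";

    /**
     * Hypothetical helper: deferred expansion is in effect only when the
     * feature is true AND the document class name is still the default.
     */
    static boolean deferredExpansionInEffect(boolean deferFeature,
                                             String documentClassName) {
        return deferFeature && DEFAULT_DOCUMENT_CLASS.equals(documentClassName);
    }

    public static void main(String[] args) {
        String[] classNames = {
            "org.apache.xerces.dom.DocumentImpl",
            "org.apache.html.dom.HTMLDocumentImpl",
            "org.apache.xerces.dom.DeferredDocumentImpl"
        };
        System.out.println(DEFER_FEATURE + " = true");
        for (String name : classNames) {
            System.out.println(DOCUMENT_CLASS_PROPERTY + " = " + name
                    + " -> deferred: " + deferredExpansionInEffect(true, name));
        }
    }
}
```

Under this model, any value other than the default turns deferral off, even 
when the defer-node-expansion feature is left at true.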

> For instance, is deferred node expansion disabled if the 
> document-class-name is HTMLDocumentImpl? 

The HTML DOM is on another branch of the class hierarchy (see below) and 
isn't capable of deferred node expansion.

CoreDocumentImpl
|
|__DocumentImpl
   |
   |__DeferredDocumentImpl
   |
   |__HTMLDocumentImpl
   |
   |__PSVIDocumentImpl
   |
   |__WMLDocumentImpl

> I'm hoping it would work in either case or, more 
> generally, in any case where the custom DOM extends DocumentImpl.
>
> Also, if anyone on the Xerces team knows anything about XMLC's 
> LazyDOM, is the deferred node expansion feature equivalent to LazyDOM 
> or is it something a bit different?

Xerces' deferred DOM creates nodes as they are accessed from a table 
representation which is constructed when the document is parsed. I've 
never heard of the XMLC LazyDOM so I don't know how that compares to it.
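
To make "nodes created as they are accessed from a table" a bit more 
concrete, here is a toy model. It is only a sketch of the general idea and 
assumes nothing about how DeferredDocumentImpl is really laid out; the names 
DeferredTableSketch, addRow and node are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class DeferredTableSketch {
    /** A materialized node object; only created on first access. */
    static final class Node {
        final int index;
        final String name;
        Node(int index, String name) {
            this.index = index;
            this.name = name;
        }
    }

    // Compact table filled while "parsing": one row per node.
    private final List<String> names = new ArrayList<>();
    private final List<Integer> parents = new ArrayList<>();
    // Slot per row for the node object; null until the node is accessed.
    private final List<Node> expanded = new ArrayList<>();
    private int created = 0;

    /** Parse-time step: record a row, but do not build a node object. */
    int addRow(String name, int parentIndex) {
        names.add(name);
        parents.add(parentIndex);
        expanded.add(null);
        return names.size() - 1;
    }

    /** Access-time step: build the node object the first time it is asked for. */
    Node node(int index) {
        Node n = expanded.get(index);
        if (n == null) {
            n = new Node(index, names.get(index));
            expanded.set(index, n);
            created++;
        }
        return n;
    }

    /** Number of node objects materialized so far. */
    int createdCount() {
        return created;
    }
}
```

The point of the sketch is only that parsing fills the table cheaply and 
node objects appear later, once something actually asks for them.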
 
> 
> thanks,
> 
> Jake 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@xml.apache.org
> For additional commands, e-mail: general-help@xml.apache.org

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org


Re: deferred node expansion question

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Jake,

(cc'ing j-users@xerces.apache.org which is a better forum for these 
questions. You should send follow ups there.)

Jacob Kjome <ho...@visi.com> wrote on 07/24/2006 12:35:55 AM:

> At 02:50 PM 7/23/2006, you wrote:
>  >Hi Jake,
>  >
>  >Jacob Kjome <ho...@visi.com> wrote on 07/22/2006 08:08:22 PM:
>  >
>  >>
>  >> I noticed in the note for the document-class-name property that "When
>  >> the document class name is set to a value other than the name of the
>  >> default document factory, the deferred node expansion feature does
>  >> not work."  Does this mean that Xerces checks specifically for the
>  >> class "org.apache.xerces.dom.DocumentImpl" in order to enable
>  >> deferred node expansion, or can a custom DOM implementation extend
>  >> from said class and still have node expansion enabled?
>  >
>  >Setting the document-class-name property to any other value than
>  >"org.apache.xerces.dom.DocumentImpl" will disable deferred node expansion.
>  >The only deferred DOM implementation Xerces knows how to handle is its
>  >own. It directly extends the non-deferred implementation (DocumentImpl)
>  >and is built using its internal methods. There's no mechanism for plugging
>  >in a different one.
>  >
>  >> For instance, is deferred node expansion disabled if the
>  >> document-class-name is HTMLDocumentImpl?
>  >
>  >The HTML DOM is on another branch of the class hierarchy (see below) and
>  >isn't capable of deferred node expansion.
>  >
> 
> What if the value of document-class-name was 
> "org.apache.xerces.dom.DeferredDocumentImpl"?  If that wouldn't 
> trigger deferred node expansion, I don't know what would.  I suppose, 
> though, that the feature defer-node-expansion is irrelevant in this 
> case, since it's already been made clear that said feature will be 
> used, implicitly.  Is that right?  So, why state that setting the 
> document-class-name to DocumentImpl could trigger usage of the 
> defer-node-expansion feature (the default is true, according to the 
> docs)? 

If you change the value of document-class-name from its default you don't 
get deferred node expansion regardless of whether the value of the feature 
is true. That is what's stated. It's a warning to users that deferred node 
expansion and plugging in an alternate DOM implementation are mutually 
exclusive.

> When defer-node-expansion is true and the document-class-name 
> is DocumentImpl, Xerces2 sort of ignores the declared 
> DocumentImpl and, instead, uses DeferredDocumentImpl?  Just seems a 
> little odd.

That's just how it works. For the history you'll have to ask the older 
(and probably former) developers. I wasn't around back then.

> Also, is there a reason the other DOMs aren't given the opportunity 
> to use deferred node expansion?  Is it simply a matter of 
> effort?  Couldn't DeferredDocumentImpl be abstract and then other 
> DOMs can have two versions, one extending DocumentImpl and the other 
> extending an abstract DeferredDocumentImpl?  Is it just a matter of 
> the effort involved that no one is willing to do or is there some 
> technical reason it hasn't been done?

I don't recall this ever being discussed, so I wouldn't assume this was a 
conscious decision. However, I imagine the benefit folks might get from 
doing this work would be outweighed by the maintenance cost (duplicate 
code) and increased footprint (the HTML/WML DOM is already considered bloat 
by many users).

>  >CoreDocumentImpl
>  >|
>  >|__DocumentImpl
>  >   |
>  >   |__DeferredDocumentImpl
>  >   |
>  >   |__HTMLDocumentImpl
>  >   |
>  >   |__PSVIDocumentImpl
>  >   |
>  >   |__WMLDocumentImpl
>  >
>  >> I'm hoping it would work in either case or, more
>  >> generally, in any case where the custom DOM extends DocumentImpl.
>  >>
>  >> Also, if anyone on the Xerces team knows anything about XMLC's
>  >> LazyDOM, is the deferred node expansion feature equivalent to LazyDOM
>  >> or is it something a bit different?
>  >
>  >Xerces' deferred DOM creates nodes as they are accessed from a table
>  >representation which is constructed when the document is parsed. I've
>  >never heard of the XMLC LazyDOM so I don't know how that compares to it.
>  >
> 
> Do you remember Lutris and the Enhydra server?  That's where XMLC 
> came from.  It's now hosted by ObjectWeb.  Mark Diekhans invented it 
> back in '99 (possibly earlier) or so and extended the Xerces1 
> DocumentImpl.  Xerces2 DocumentImpl seems to work fine with it as 
> well.  Here's a javadoc snippet from LazyDocument....
> 
>   * A DOM Document that supports lazy instantiation of a template DOM.  Nodes
>   * in the instance DOM are created as accessed.  This can be either by
>   * traversing the tree or by direct access to a node by id number.
>   * Instantiation of nodes in the middle of the virtual tree is support.  Thus
>   * a node can exist without a parent being expanded. This is used by XMLC,
>   * where the dynamic nodes tend to be towards the leaves of the tree.
>   * <p>
>   * Instances contain a reference to a DOM that is a shared template for the
>   * document.  Each node in the template is assigned an integer node id that be
>   * used to index tables to directly look up the template of a node created
>   * from the template.
>   * <p>
>   * This DOM also supports associating pre-formatted text with some nodes, which
>   * is used to avoid expensive string scanning operations during the output
>   * of unmodified nodes.
>   * <p>
>   * When a child of a node is requested, all direct children are expanded.
>   * This eliminates a lot of difficult book keep.  Attributes are treated
>   * as a separate set from children, only instantiated when an attribute
>   * is accessed.
> 
> Does that jog your memory, or at least provide the info needed to 
> compare/contrast the DeferredDocumentImpl functionality with that of LazyDOM?

I don't think this was ever in my memory :).

There's no concept of a template in the Xerces implementation. The 
deferred DOM table representation is for building one DOM only and I'm 
pretty sure the table destroys itself as the DOM is expanded to reclaim 
some memory. Also there's no way to jump to a node in the middle of the 
DOM without having its ancestors (and probably the immediate children of 
those ancestors) expanded.
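
A toy model of that difference, for what it's worth. This is a sketch under 
my own assumptions, not the Xerces code: the names TopDownExpansionSketch, 
addRow, root and child are invented, node names are assumed unique so they 
can serve as table keys, and dropping a row once it has been consumed only 
mirrors my "pretty sure" guess above about the table reclaiming memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopDownExpansionSketch {
    // Hypothetical parse-time table: node name -> names of its children.
    private final Map<String, List<String>> table = new HashMap<>();
    // Order in which node objects were actually created.
    final List<String> expansionLog = new ArrayList<>();

    final class Node {
        final String name;
        private List<Node> children; // null until first requested

        Node(String name) {
            this.name = name;
            expansionLog.add(name);
        }

        /** Expands all direct children on first request, then frees the row. */
        List<Node> children() {
            if (children == null) {
                children = new ArrayList<>();
                List<String> row = table.remove(name); // row is no longer needed
                if (row != null) {
                    for (String childName : row) {
                        children.add(new Node(childName));
                    }
                }
            }
            return children;
        }

        /** Finds a direct child by name, expanding the children if necessary. */
        Node child(String childName) {
            for (Node c : children()) {
                if (c.name.equals(childName)) {
                    return c;
                }
            }
            return null;
        }
    }

    void addRow(String parent, String... kids) {
        table.put(parent, Arrays.asList(kids));
    }

    Node root(String name) {
        return new Node(name);
    }

    int remainingRows() {
        return table.size();
    }
}
```

In this model the only way to reach a deep node is to walk down from the 
root, which materializes each ancestor and its siblings along the way. There 
is no lookup by node id into a shared template as the LazyDocument javadoc 
describes.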

> Jake
> 
>  >>
>  >> thanks,
>  >>
>  >> Jake
>  >>
>  >>
>  >> ---------------------------------------------------------------------
>  >> To unsubscribe, e-mail: general-unsubscribe@xml.apache.org
>  >> For additional commands, e-mail: general-help@xml.apache.org
>  >
>  >Thanks.
>  >
>  >Michael Glavassevich
>  >XML Parser Development
>  >IBM Toronto Lab
>  >E-mail: mrglavas@ca.ibm.com
>  >E-mail: mrglavas@apache.org
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@xml.apache.org
> For additional commands, e-mail: general-help@xml.apache.org

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@xml.apache.org
For additional commands, e-mail: general-help@xml.apache.org


Re: deferred node expansion question

Posted by Jacob Kjome <ho...@visi.com>.
At 02:50 PM 7/23/2006, you wrote:
 >Hi Jake,
 >
 >Jacob Kjome <ho...@visi.com> wrote on 07/22/2006 08:08:22 PM:
 >
 >>
 >> I noticed in the note for the document-class-name property that "When
 >> the document class name is set to a value other than the name of the
 >> default document factory, the deferred node expansion feature does
 >> not work."  Does this mean that Xerces checks specifically for the
 >> class "org.apache.xerces.dom.DocumentImpl" in order to enable
 >> deferred node expansion, or can a custom DOM implementation extend
 >> from said class and still have node expansion enabled?
 >
 >Setting the document-class-name property to any other value than
 >"org.apache.xerces.dom.DocumentImpl" will disable deferred node expansion.
 >The only deferred DOM implementation Xerces knows how to handle is its
 >own. It directly extends the non-deferred implementation (DocumentImpl)
 >and is built using its internal methods. There's no mechanism for plugging
 >in a different one.
 >
 >> For instance, is deferred node expansion disabled if the
 >> document-class-name is HTMLDocumentImpl?
 >
 >The HTML DOM is on another branch of the class hierarchy (see below) and
 >isn't capable of deferred node expansion.
 >

What if the value of document-class-name was 
"org.apache.xerces.dom.DeferredDocumentImpl"?  If that wouldn't 
trigger deferred node expansion, I don't know what would.  I suppose, 
though, that the feature defer-node-expansion is irrelevant in this 
case, since it's already been made clear that said feature will be 
used, implicitly.  Is that right?  So, why state that setting the 
document-class-name to DocumentImpl could trigger usage of the 
defer-node-expansion feature (the default is true, according to the 
docs)?  When defer-node-expansion is true and the document-class-name 
is DocumentImpl, Xerces2 sort of ignores the declared 
DocumentImpl and, instead, uses DeferredDocumentImpl?  Just seems a little odd.

Also, is there a reason the other DOMs aren't given the opportunity 
to use deferred node expansion?  Is it simply a matter of 
effort?  Couldn't DeferredDocumentImpl be abstract and then other 
DOMs can have two versions, one extending DocumentImpl and the other 
extending an abstract DeferredDocumentImpl?  Is it just a matter of 
the effort involved that no one is willing to do or is there some 
technical reason it hasn't been done?

 >CoreDocumentImpl
 >|
 >|__DocumentImpl
 >   |
 >   |__DeferredDocumentImpl
 >   |
 >   |__HTMLDocumentImpl
 >   |
 >   |__PSVIDocumentImpl
 >   |
 >   |__WMLDocumentImpl
 >
 >> I'm hoping it would work in either case or, more
 >> generally, in any case where the custom DOM extends DocumentImpl.
 >>
 >> Also, if anyone on the Xerces team knows anything about XMLC's
 >> LazyDOM, is the deferred node expansion feature equivalent to LazyDOM
 >> or is it something a bit different?
 >
 >Xerces' deferred DOM creates nodes as they are accessed from a table
 >representation which is constructed when the document is parsed. I've
 >never heard of the XMLC LazyDOM so I don't know how that compares to it.
 >

Do you remember Lutris and the Enhydra server?  That's where XMLC 
came from.  It's now hosted by ObjectWeb.  Mark Diekhans invented it 
back in '99 (possibly earlier) or so and extended the Xerces1 
DocumentImpl.  Xerces2 DocumentImpl seems to work fine with it as 
well.  Here's a javadoc snippet from LazyDocument....

  * A DOM Document that supports lazy instantiation of a template DOM.  Nodes
  * in the instance DOM are created as accessed.  This can be either by
  * traversing the tree or by direct access to a node by id number.
  * Instantiation of nodes in the middle of the virtual tree is support.  Thus
  * a node can exist without a parent being expanded. This is used by XMLC,
  * where the dynamic nodes tend to be towards the leaves of the tree.
  * <p>
  * Instances contain a reference to a DOM that is a shared template for the
  * document.  Each node in the template is assigned an integer node id that be
  * used to index tables to directly look up the template of a node created
  * from the template.
  * <p>
  * This DOM also supports associating pre-formatted text with some nodes, which
  * is used to avoid expensive string scanning operations during the output
  * of unmodified nodes.
  * <p>
  * When a child of a node is requested, all direct children are expanded.
  * This eliminates a lot of difficult book keep.  Attributes are treated
  * as a separate set from children, only instantiated when an attribute
  * is accessed.

Does that jog your memory, or at least provide the info needed to 
compare/contrast the DeferredDocumentImpl functionality with that of LazyDOM?

Jake

 >>
 >> thanks,
 >>
 >> Jake
 >>
 >>
 >> ---------------------------------------------------------------------
 >> To unsubscribe, e-mail: general-unsubscribe@xml.apache.org
 >> For additional commands, e-mail: general-help@xml.apache.org
 >
 >Thanks.
 >
 >Michael Glavassevich
 >XML Parser Development
 >IBM Toronto Lab
 >E-mail: mrglavas@ca.ibm.com
 >E-mail: mrglavas@apache.org

