Posted to xmlbeans-dev@xml.apache.org by Aleksander Slominski <as...@cs.indiana.edu> on 2004/02/28 23:44:17 UTC

granularity of binding selection [Re: Status of Marshalling/Unmarshalling in V2? Javadoc?]

David Remy wrote:

>          XML Schema Support ----------->
>                   Inherits from:
>              Object	  ||	XmlObject
>              (unboxed)   ||    (boxed)
>  ^       |------------------------------
>  |       |  xmlbeans     ||
>  |       |  pojo         ||  xmlbeans
>  |       |  lossless     ||  'classic'
>  |       | (Not impl V2) ||
>  |       | --------------||----------
>  |       | xmlbeans      ||  xmlbeans 
> XML      |  pojo         ||  xmlobject
> Infoset  | (e.g. JAX/RPC)||  only
> Fidelity |               || (Not impl V2)
>          ------------------------------
>
>there needs to be a whole paper written on this grid (the names for each of these quadrants also need work).  the left side represents pojo, whose classes inherit from java.lang.Object.  the right side represents high schema compliance and inherits from XmlObject, which is critical for supporting 100% of schema.  the boundary between the left and right sides is a double line because it will be a pretty big hurdle to move between them: a simple compilation switch will not do it, since the classes themselves and the api are tightly coupled to their ancestral parent (Object or XmlObject).  however, it should be possible to move up on the same side with a simple compilation switch.  
>  
>
hi David,

did you consider a case when different parts of XML stream (document) 
may be bound using different techniques? including case when parts of 
XML are not bound at all i.e. event stream is available for direct 
consumption in processing pipeline?

>the bottom left is one of the major objectives for v2 to support a non-lossy use case for java-xml binding and will definitely be done for v2.  the upper right is basically xmlbeans v1 plus some improvements for v2.  the bottom right is a very interesting case that would be great to have for v2 but isn't yet in plan.  the idea there is that you should be able to use the high schema compliance xmlobject path but not necessarily incur the overhead of having both an xmlstore and an xmlobject layer.  if you don't need full infoset fidelity and access to the underlying xmlstore you should be able to compile with 'xmlobject only', lacking a better name for it, and get all of the schema compliance.  at parse time the java objects are created and populated directly on unmarshal.  if you find you need access to the full infoset later you can just flick a compile switch, 'non-lossy' or 'full fidelity' (again naming issues), and all of your existing code will just work, plus you can use xmlcursor to get to the xmlstore.  this is probably 80% or higher of the current usage for xmlbeans so we should scope this out better and implement it.  
>
the typical case i think about is a SOAP message that is framed into
SAAJ/DOM (or any other DOM-like or XML Infoset API): the SOAP headers are
bound using xmlbeans 'classic' or xmlbeans pojo, but the SOAP body content,
if it is not a Fault, is left untouched as an XML event stream (implemented
with a streamable xmlstore that has a StAX interface). that XML event
stream could be bound to XML beans or used to build SAAJ/DOM, but the
decision is left to the application. this would allow writing processors
that can handle very large messages (streaming) but still reap the benefits
of Java types generated from XML schema if they need to bind some parts of
the event stream ...
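
A minimal sketch of the header/body split described above, written against the standard StAX API (javax.xml.stream) rather than any XMLBeans interface. The class name, the MessageID header and the item elements are made up for illustration; only the SOAP 1.1 envelope namespace is real. It binds one header value eagerly and stops with the reader positioned on the Body start tag, so the application can decide what to do with the rest of the stream.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: bind one SOAP header eagerly, leave the Body as a raw event stream.
final class PartialBindingSketch {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    static XMLStreamReader open(String xml) throws XMLStreamException {
        return XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    }

    // Binds the (made-up) MessageID header to a String and stops on the <Body> start tag.
    static String bindHeaders(XMLStreamReader r) throws XMLStreamException {
        String messageId = null;
        while (r.hasNext()) {
            if (r.next() != XMLStreamConstants.START_ELEMENT) {
                continue;
            }
            if (SOAP_NS.equals(r.getNamespaceURI()) && "Body".equals(r.getLocalName())) {
                return messageId; // the body content has not been read yet
            }
            if ("MessageID".equals(r.getLocalName())) {
                messageId = r.getElementText();
            }
        }
        throw new XMLStreamException("no SOAP Body element found");
    }

    // The application decides how to consume the body; here it only counts <item> events.
    static int countItems(XMLStreamReader atBodyStart) throws XMLStreamException {
        int depth = 1; // the reader is on the <Body> start tag
        int items = 0;
        while (depth > 0) {
            int event = atBodyStart.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
                if ("item".equals(atBodyStart.getLocalName())) {
                    items++;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
        return items;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<s:Envelope xmlns:s=\"" + SOAP_NS + "\">"
                + "<s:Header><MessageID>m-1</MessageID></s:Header>"
                + "<s:Body><item/><item/><item/></s:Body></s:Envelope>";
        XMLStreamReader r = open(xml);
        System.out.println(bindHeaders(r));
        System.out.println(countItems(r));
    }
}
```

Only the header value is ever held as a Java object; the body is consumed event by event, which is what would let such a processor handle bodies larger than memory.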

i played with such an approach in XPP2; unfortunately i made the API way
too low level. the idea is that an XML element can be in one of two states:
* expanded: the underlying stream representing the element content was
fully processed
* non-expanded: the XML stream is currently pointing at this element and,
instead of creating XML children, the user can access the event stream to
pull events
for more details see
http://www.extreme.indiana.edu/xgws/xsoap/xpp/download/PullParser2/doc/api/org/gjt/xpp/XmlPullNode.html
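
The two states can be sketched roughly as follows. This is not the XPP2 XmlPullNode API; it is a hypothetical class built on standard StAX (javax.xml.stream), and it assumes the caller does not mix partial stream reads with a later call to children().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical node that starts non-expanded and expands on first access to its children.
final class PullNodeSketch {
    private final XMLStreamReader reader;
    private final String name;
    private List<String> childNames; // null while the node is non-expanded

    PullNodeSketch(XMLStreamReader atStartTag) {
        if (atStartTag.getEventType() != XMLStreamConstants.START_ELEMENT) {
            throw new IllegalStateException("reader must be positioned on a start tag");
        }
        this.reader = atStartTag;
        this.name = atStartTag.getLocalName();
    }

    String name() {
        return name;
    }

    boolean isExpanded() {
        return childNames != null;
    }

    // Non-expanded state: the caller pulls the events of this element directly.
    XMLStreamReader stream() {
        if (isExpanded()) {
            throw new IllegalStateException("content was already consumed by expansion");
        }
        return reader;
    }

    // Expanding reads up to the matching end tag and records the direct children.
    List<String> children() throws XMLStreamException {
        if (childNames == null) {
            List<String> names = new ArrayList<>();
            int depth = 1;
            while (depth > 0) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (depth == 1) {
                        names.add(reader.getLocalName());
                    }
                    depth++;
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
            childNames = names;
        }
        return childNames;
    }

    public static void main(String[] args) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader("<a><b/><c><d/></c></a>"));
        r.nextTag(); // move to <a>
        PullNodeSketch node = new PullNodeSketch(r);
        System.out.println(node.isExpanded());
        System.out.println(node.children());
        System.out.println(node.isExpanded());
    }
}
```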

i think i can easily enough use xmlbeans v1 to bind any sub-stream that
represents a subtree of XML (everything between a start tag and its
corresponding end tag), but is there no control available to tell
xmlbeans to defer binding of any sub-sub-stream content?

>the other case represented in the grid is the top left, the 'lossless pojo' scenario.  this isn't as intuitive, but we have run across scenarios where it has been requested (fervently): for example, when the lower left quadrant is what's needed at runtime but infoset fidelity (like preserved comments) is wanted at design time.  we emailed/white boarded some strategies for this a while back (not on apache dev), built around a 'best effort' fidelity scenario that essentially maps nodes to the created pojo objects so that at marshal time, with some pretty significant constraints, the infoset can be preserved.
>  
>
the last question: will it be possible to rebind parts of xmlstore with a
different schema impl., i.e. would i be able in xmlbeans v2 to switch
parts of XML from xmlbeans to pojo (and vice versa), or even have them
all as simultaneous views (xmlstore + pojo + xmlbeans) with XML editing
changes reflected in each view?

it seems to me that, if carefully designed, it should be possible to have
a nice infoset API (xmlstore, StAX) and schema binding (XmlBeans pojos and
XmlObject) and still expose the XML stream to advanced users for maximum
performance.

i am very interested to hear your opinion about those issues.

thanks,

alek

-- 
The best way to predict the future is to invent it - Alan Kay


- ---------------------------------------------------------------------
To unsubscribe, e-mail:   xmlbeans-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xmlbeans-dev-help@xml.apache.org
Apache XMLBeans Project -- URL: http://xml.apache.org/xmlbeans/

Re: granularity of binding selection [Re: Status of Marshalling/Unmarshalling in V2? Javadoc?]

Posted by Aleksander Slominski <as...@cs.indiana.edu>.

David Bau wrote:

>>did you consider a case when different parts of
>>XML stream (document) may be bound using different
>>techniques? including case when parts of XML are not
>>bound at all i.e. event stream is available for
>>direct consumption in processing pipeline?
>>    
>>
>
>Can't speak for remy, but I know the main interesting thing
>about v2 is to try to figure out how to apply different
>binding styles at once.  You want to either be able to bind
>in XmlObject style, plain POJO style, or perhaps not bind at
>all and leave it as a DOM.  Ideally, you can nest one within
>the other and vice-versa.
>  
>
yes. i think nesting is a powerful concept allowing for freedom to treat 
XML any way you need at the given moment.

>But the dichotomy between bringing the data into memory and
>letting it stream is different from binding or not-binding.
>The reason?  Streaming and in-memory are two different API
>approaches.  If you try to intermix them, then in the common
>case, you end up forcing everything into memory anyway
>(e.g., if somebody asks for "the last parameter" to be bound
>to "the last element" as a POJO, while the "first parameter"
>is left as a stream...  To actually supply the last
>parameter, you need to read the whole stream into memory
>anyway; at which point, the user might as well also have the
>ability to get a DOM).
>  
>
there is always an edge case like that; however, i can also see the
usefulness of reading XML and allowing the user to discard parts that are
no longer needed, something similar to a typical loop that reads a list of
entries from an XML stream. the stream may not fit into memory, but at any
given time the user needs to see only a small part of it (with some context).
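
Such a loop might look like the sketch below, using plain StAX (javax.xml.stream). The entry element name and the Entry class are assumptions for illustration; Entry stands in for a schema-generated type. Each entry is bound, used and dropped before the next one is read, so memory use stays independent of the length of the stream.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: bind one <entry> at a time and discard it before reading the next.
final class EntryLoopSketch {
    // Stand-in for a schema-generated class.
    static final class Entry {
        final long amount;

        Entry(long amount) {
            this.amount = amount;
        }
    }

    static long sumAmounts(Reader xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        long total = 0;
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && "entry".equals(r.getLocalName())) {
                    // Bind only this entry; nothing before or after it is kept in memory.
                    Entry entry = new Entry(Long.parseLong(r.getElementText().trim()));
                    total += entry.amount;
                }
            }
        } finally {
            r.close();
        }
        return total;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<entries><entry>5</entry><entry> 7 </entry></entries>";
        System.out.println(sumAmounts(new StringReader(xml)));
    }
}
```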

>So for the question of streaming versus in-memory, I feel
>like the right solution is not to treat them as peers, but
>to let them layer on top of one another.  You should be able
>to provide an in-memory model by binding on top of a stream.
>Or you should be able to get a stream from an in-memory
>model.
>  
>
they will have to be layered, but the question is how to get from the
higher level layers (DOM, XmlObject) to the underlying stream layers
(StAX, token stream, byte stream).

>>i played with such approach in XPP2 unfortunately i
>>made API way too low level. the idea is that XML
>>element can be in two states:
>> * expanded: underlying stream representing
>>   element content was fully processed
>> * non-expanded: in this case XML stream is
>>   currently pointing at this element and instead
>>   of creating XML children user can access event
>>   stream to pull events
>> for more details
>> http://www.extreme.indiana.edu/xgws/xsoap/xpp/download/PullParser2/doc/api/org/gjt/xpp/XmlPullNode.html
>>
>>i think i can use easily enough xmlbeans v1 to bind
>>any sub-stream that represents subtree of XML
>>(everything between start tag and its
>> corresponding end tag) but there is no control
>> available to tell xmlbeans to defer binding any
>>sub-sub-stream content?
>>    
>>
>
>XMLBeans v1 never defers _loading_ the XML tokens into
>memory.  It might make an interesting project for somebody
>to see if loading can be made "incremental" so it's only
>done on-demand, but XMLBeans v1 always exhausts the stream
>to the end right now.
>  
>
i think that is interesting in the context of trying to make XmlBeans
work with large inputs *and* provide an easy to use high level API like
DOM / XmlObject.

>However, XMLBeans v1 always defers _binding_ the XML into
>Java objects until the moment when you call the getter to
>access the particular part of the tree.  When you load, what
>you load is the raw XML infoset tokens, as quickly as
>possible.  The XmlObject objects don't come into existence
>until later.  In fact, we're missing an option to request
>all the binding be done up front; we always do it lazily.
>  
>
that sounds good.

>>the last question: will it be possible to rebind
>>parts of xmlstore with different schema impl. i.e.
>>would i be able in xmlbeans v2 to switch parts
>>of XML from xmlbeans to pojo (and vice versa) or
>>even have them all as simultaneous views  (xmlstore
>>+ pojo + xmlbeans) and XML editing changes to
>>be reflected in each view?
>>    
>>
>
>There are technical challenges in presenting a truly pojo
>view at the same time as synchronizing editing changes.  A
>true pojo is "just the user's code", and the user doesn't
>need to tell us what they've done within a setFoo() method.
>
>I don't think anything technical prevents the three views
>from being presented simultaneously for read-only use, or
>xmlbeans+xmlstore from being consistent on writes.
>
>However, it does seem like a usability question.  Looking
>at the way most people use binding, they expect a _single_
>early-bound Java class to be bound to a schema type and are
>surprised if there is more than one.
>
>I'd suggest that:
>
>(1) There should be two bindings for each built-in schema
>type (e.g., xsd:string -> XmlString as well as
>java.lang.String)
>(2) But for user-defined schema types, we should probably
>design the tools to either require or encourage that there
>be only one binding, or at least that one of them is
>"primary".
>
>(2) Purely seems like a usability question to me, though.
>
>  
>
so the real question here is whether to maintain replicated data (xml
token, XmlObject, XML Bean and pojo each have their own copy) or have them
all access one shared copy of the data (the XML Infoset directly).
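
The difference between the two options can be sketched as below. Everything here is hypothetical: Store is a stand-in for the xmlstore, StoreBackedView for an XmlObject-style view, and Pojo for a plain bean; none of these are XMLBeans classes. The view has no state of its own, so writes are visible immediately, while the pojo holds a replica that the store only sees after an explicit marshal step.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch contrasting a shared-data view with a replicated pojo copy.
final class SharedVsReplicatedSketch {
    // Stand-in for the xmlstore: the single shared copy of the data.
    static final class Store {
        final Map<String, String> values = new HashMap<>();
    }

    // XmlObject-style view: keeps no fields of its own, every call goes to the store.
    static final class StoreBackedView {
        private final Store store;

        StoreBackedView(Store store) {
            this.store = store;
        }

        String getName() {
            return store.values.get("name");
        }

        void setName(String value) {
            store.values.put("name", value);
        }
    }

    // Plain pojo: owns a copy, and the store is never told about setName() calls.
    static final class Pojo {
        private String name;

        String getName() {
            return name;
        }

        void setName(String value) {
            this.name = value;
        }
    }

    static Pojo unmarshal(Store store) {
        Pojo pojo = new Pojo();
        pojo.setName(store.values.get("name"));
        return pojo;
    }

    static void marshal(Pojo pojo, Store store) {
        store.values.put("name", pojo.getName());
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.values.put("name", "a");
        StoreBackedView view = new StoreBackedView(store);
        Pojo pojo = unmarshal(store);

        pojo.setName("b");
        System.out.println(view.getName()); // the store has not seen the pojo edit
        marshal(pojo, store);
        System.out.println(view.getName()); // now the edit is visible through the view
    }
}
```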

>(IMO only...)
>
>  
>
thanks for info!

alek

-- 
The best way to predict the future is to invent it - Alan Kay



Re: granularity of binding selection [Re: Status of Marshalling/Unmarshalling in V2? Javadoc?]

Posted by David Bau <da...@hotmail.com>.

> did you consider a case when different parts of
> XML stream (document) may be bound using different
> techniques? including case when parts of XML are not
> bound at all i.e. event stream is available for
> direct consumption in processing pipeline?

Can't speak for remy, but I know the main interesting thing
about v2 is to try to figure out how to apply different
binding styles at once.  You want to either be able to bind
in XmlObject style, plain POJO style, or perhaps not bind at
all and leave it as a DOM.  Ideally, you can nest one within
the other and vice-versa.

But the dichotomy between bringing the data into memory and
letting it stream is different from binding or not-binding.
The reason?  Streaming and in-memory are two different API
approaches.  If you try to intermix them, then in the common
case, you end up forcing everything into memory anyway
(e.g., if somebody asks for "the last parameter" to be bound
to "the last element" as a POJO, while the "first parameter"
is left as a stream...  To actually supply the last
parameter, you need to read the whole stream into memory
anyway; at which point, the user might as well also have the
ability to get a DOM).

So for the question of streaming versus in-memory, I feel
like the right solution is not to treat them as peers, but
to let them layer on top of one another.  You should be able
to provide an in-memory model by binding on top of a stream.
Or you should be able to get a stream from an in-memory
model.

> i played with such approach in XPP2 unfortunately i
> made API way too low level. the idea is that XML
> element can be in two states:
>  * expanded: underlying stream representing
>    element content was fully processed
>  * non-expanded: in this case XML stream is
>    currently pointing at this element and instead
>    of creating XML children user can access event
>    stream to pull events
>  for more details
>  http://www.extreme.indiana.edu/xgws/xsoap/xpp/download/PullParser2/doc/api/org/gjt/xpp/XmlPullNode.html
> i think i can use easily enough xmlbeans v1 to bind
> any sub-stream that represents subtree of XML
> (everything between start tag and its
>  corresponding end tag) but there is no control
>  available to tell xmlbeans to defer binding any
> sub-sub-stream content?

XMLBeans v1 never defers _loading_ the XML tokens into
memory.  It might make an interesting project for somebody
to see if loading can be made "incremental" so it's only
done on-demand, but XMLBeans v1 always exhausts the stream
to the end right now.

However, XMLBeans v1 always defers _binding_ the XML into
Java objects until the moment when you call the getter to
access the particular part of the tree.  When you load, what
you load is the raw XML infoset tokens, as quickly as
possible.  The XmlObject objects don't come into existence
until later.  In fact, we're missing an option to request
all the binding be done up front; we always do it lazily.
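
The distinction between loading and binding can be illustrated with a small sketch. This is not how XMLBeans v1 is implemented internally; the class, the map of raw tokens and the getInt accessor are all assumptions made for illustration. The raw text is held from construction on, while a typed Java value only comes into existence the first time its getter is called.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: raw tokens are loaded eagerly, typed values are bound lazily.
final class LazyBindingSketch {
    private final Map<String, String> rawTokens;                 // filled at load time
    private final Map<String, Integer> bound = new HashMap<>();  // filled on demand

    LazyBindingSketch(Map<String, String> rawTokens) {
        this.rawTokens = new HashMap<>(rawTokens);
    }

    // Binds the named value on first access and reuses the bound object afterwards.
    int getInt(String name) {
        Integer value = bound.get(name);
        if (value == null) {
            String raw = rawTokens.get(name);
            if (raw == null) {
                throw new IllegalArgumentException("no such element: " + name);
            }
            value = Integer.valueOf(raw.trim());
            bound.put(name, value);
        }
        return value;
    }

    int boundCount() {
        return bound.size();
    }

    public static void main(String[] args) {
        Map<String, String> raw = new HashMap<>();
        raw.put("qty", " 3 ");
        raw.put("price", "10");
        LazyBindingSketch doc = new LazyBindingSketch(raw);
        System.out.println(doc.boundCount());
        System.out.println(doc.getInt("qty"));
        System.out.println(doc.boundCount());
    }
}
```

An "eager" option of the kind mentioned above would amount to calling every getter once right after loading.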

> the last question: will it be possible to rebind
> parts of xmlstore with different schema impl. i.e.
> would i be able in xmlbeans v2 to switch parts
> of XML from xmlbeans to pojo (and vice versa) or
> even have them all as simultaneous views  (xmlstore
> + pojo + xmlbeans) and XML editing changes to
> be reflected in each view?

There are technical challenges in presenting a truly pojo
view at the same time as synchronizing editing changes.  A
true pojo is "just the user's code", and the user doesn't
need to tell us what they've done within a setFoo() method.

I don't think anything technical prevents the three views
from being presented simultaneously for read-only use, or
xmlbeans+xmlstore from being consistent on writes.

However, it does seem like a usability question.  Looking
at the way most people use binding, they expect a _single_
early-bound Java class to be bound to a schema type and are
surprised if there is more than one.

I'd suggest that:

(1) There should be two bindings for each built-in schema
type (e.g., xsd:string -> XmlString as well as
java.lang.String)
(2) But for user-defined schema types, we should probably
design the tools to either require or encourage that there
be only one binding, or at least that one of them is
"primary".

(2) Purely seems like a usability question to me, though.

(IMO only...)

David


