Posted to j-dev@xerces.apache.org by sa...@ca.ibm.com on 2002/05/03 18:16:15 UTC

[Xerces2] Please review: schema component API

Hi all,

Attached is the schema component API proposed by Elena Litani, and
reviewed by some of the Xerces developers. Because it's experimental, the
interfaces are currently in the package impl.xs.psvi.

Comments are welcome. In particular, we'd appreciate your input on the
following questions.

1. Annotation

An "annotation" contains:
{application information}: A sequence of element information items.
{user information}: A sequence of element information items.
{attributes}: A sequence of attribute information items.

How can we expose *information items*, in what form?

1) DOM nodes: this is easy to implement (because we already parse schema
documents to DOM trees), and it benefits DOM users, but SAX users won't be
able to use it, and we don't want a schema API to have explicit dependency
on DOM.
2) String representation: this doesn't depend on a certain API, but it's
inefficient (we need to serialize DOM nodes to strings), and we also need
to provide namespace declarations, and xml:base, xml:lang attributes that
are defined on ancestor elements.
3) Other suggestions?

2. getName()/getNamespace()

Currently these two methods are on XSObject, and inherited by each
component interface. But some components don't have name/namespace
properties (particle, model group, wildcard, etc.). So should such methods
be on the base interface, or should individual interfaces (where
name/namespace make sense) define them?

We think it depends on the typical use of the components:
1) "check names; if names satisfy certain condition, cast, further
operation". If this is the typical use, then the methods should be on
XSObject.
2) "cast; get names; get other properties". In this case, casting happens
anyway, and the methods for names should be on individual interfaces.

What do you think?
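
To make the two usage patterns concrete, here is a minimal sketch. All
names in it (Component, Unnamed, Named, and the helper methods) are
hypothetical and are not part of the attached API; getNamespace() is
omitted for brevity.

```java
import java.util.Arrays;
import java.util.List;

class NamePlacementDemo {
    // Option 1 (hypothetical): the name accessor lives on the base interface.
    interface Component {
        String getName();   // null when the component has no name
    }

    // Option 2 (hypothetical): only interfaces that have a name declare it.
    interface Unnamed { }
    interface Named extends Unnamed {
        String getName();
    }

    static Component component(final String name) {
        return new Component() {
            public String getName() { return name; }
        };
    }

    static Named named(final String name) {
        return new Named() {
            public String getName() { return name; }
        };
    }

    // Pattern 1: check the name first; no cast is needed to read it.
    static int countByNameOnBase(List<Component> items, String wanted) {
        int n = 0;
        for (Component c : items) {
            if (wanted.equals(c.getName())) n++;   // safe when getName() is null
        }
        return n;
    }

    // Pattern 2: test and cast first, then read the name.
    static int countByNameAfterCast(List<Unnamed> items, String wanted) {
        int n = 0;
        for (Unnamed u : items) {
            if (u instanceof Named && wanted.equals(((Named) u).getName())) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Component> a = Arrays.asList(component("item"), component(null));
        List<Unnamed> b = Arrays.asList(named("item"), new Unnamed() { });
        System.out.println(countByNameOnBase(a, "item"));
        System.out.println(countByNameAfterCast(b, "item"));
    }
}
```

With option 1 the caller must be prepared for a null name; with option 2
the type test replaces the null check.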

3. Actual values

How should we expose actual values? Do we want to define interfaces to represent
the actual values (date/time types especially)? After we have such
interfaces, we can return "Object" from related methods.

Currently the methods return a String: the lexical representation of the
value. Again we have a question: should it be any lexical representation,
or normalized, or canonical representation? The canonical one would make
more sense, but:
1) ".0" is appended to integer values (for example, the canonical
representation for integer "1" is "1.0");
2) canonical representation of "base64Binary" is not finalized yet (as far
as I know);
3) do we have support for URI canonicalization?

A related issue to "actual values" is how to expose "equality" and
"ordered" fundamental facets. "Equality" is a fundamental facet that every
value space (hence, every simple type) has, so it makes sense to have a
method on XSSimpleType to check whether two *actual* values are equal. It
also makes sense to compare the order of two *actual* values if the type is
ordered (partially or totally). But both depend on how *actual* values are
represented. We may want to add these two methods when we have interfaces
for actual values.
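
As an illustration of why these checks need actual values rather than
strings, here is a small sketch for decimal values only. The class and
method names are hypothetical, and java.math.BigDecimal stands in for
whatever representation is eventually chosen.

```java
import java.math.BigDecimal;

class ActualValueDemo {
    // Equality in the value space: "1", "1.0" and "+1.00" are different
    // lexical forms of the same decimal value.
    static boolean decimalEquals(String lexicalA, String lexicalB) {
        // compareTo ignores scale; BigDecimal.equals would treat 1 and 1.0
        // as different, which is not what the "equality" facet means.
        return parse(lexicalA).compareTo(parse(lexicalB)) == 0;
    }

    // Order in the value space: negative, zero, or positive.
    static int decimalCompare(String lexicalA, String lexicalB) {
        return parse(lexicalA).compareTo(parse(lexicalB));
    }

    // Assumption: the input was already validated as a schema decimal.
    // BigDecimal also accepts exponent notation, which the schema type does not.
    private static BigDecimal parse(String lexical) {
        return new BigDecimal(lexical.trim());
    }

    public static void main(String[] args) {
        System.out.println(decimalEquals("1", "1.0"));
        System.out.println("1".equals("1.0"));
        System.out.println(decimalCompare("2", "10.0") < 0);
    }
}
```

Comparing the strings directly gets both answers wrong: "1" and "1.0"
differ as strings, and "2" sorts after "10.0" lexicographically.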

4. Get global type definitions

On XSModel, you can find these two methods:

    public XSNamedMap getComponents(short objectType);
    public XSTypeDefinition getTypeDefinition(String name,
                                              String namespace);

The first method *getComponents* takes a parameter indicating the kind of
components, and returns all components of that kind. For type definitions,
we currently have 2 constants: SIMPLE_TYPE and COMPLEX_TYPE, so the user
can either (using *getComponents*) get all simple type definitions, or get
all complex type definitions.

The second method, *getTypeDefinition*, returns a type definition that
matches the given name/namespace. So this method doesn't care whether
it's a simple or complex type.

This seems to be a bit inconsistent. Two approaches can be adopted to solve
it:
1) Remove *getTypeDefinition* and introduce two methods:
getSimpleTypeDefinition and getComplexTypeDefinition.
2) Remove constants SIMPLE_TYPE and COMPLEX_TYPE, and introduce a new one,
TYPE_DEFINITION.

Of course, if approach 2 is taken, a method will be added to the
XSTypeDefinition interface to tell whether the type is simple or complex.

From the schema spec, simple and complex types share the same symbol space,
and "schema" only has one property {type definitions} for both of them, so
approach 2 seems to be more proper, conceptually.

But we understand from certain users that approach 1 is preferred,
because it returns what the user really wants.

What do you think?
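
For discussion, here is a rough sketch of approach 2 with a category
method, plus an approach-1 style lookup layered on top of it. Everything
here is hypothetical (TypeDef, getTypeCategory, the SIMPLE/COMPLEX
constants, the map-based storage); it only shows that one symbol space
can serve both kinds of caller.

```java
import java.util.HashMap;
import java.util.Map;

class TypeLookupDemo {
    static final short SIMPLE = 1;    // hypothetical category constants
    static final short COMPLEX = 2;

    interface TypeDef {
        String getName();
        short getTypeCategory();      // tells simple from complex
    }

    // One map for both kinds, mirroring the single symbol space.
    private final Map<String, TypeDef> types = new HashMap<String, TypeDef>();

    void add(String namespace, final String name, final short category) {
        types.put(key(namespace, name), new TypeDef() {
            public String getName() { return name; }
            public short getTypeCategory() { return category; }
        });
    }

    // Approach 2: one lookup, regardless of the kind of type.
    TypeDef getTypeDefinition(String name, String namespace) {
        return types.get(key(namespace, name));
    }

    // Approach 1 style convenience: null unless the type is simple.
    TypeDef getSimpleTypeDefinition(String name, String namespace) {
        TypeDef t = getTypeDefinition(name, namespace);
        return (t != null && t.getTypeCategory() == SIMPLE) ? t : null;
    }

    private static String key(String namespace, String name) {
        return "{" + namespace + "}" + name;
    }

    public static void main(String[] args) {
        TypeLookupDemo model = new TypeLookupDemo();
        model.add("urn:example", "price", SIMPLE);
        model.add("urn:example", "order", COMPLEX);
        System.out.println(model.getTypeDefinition("order", "urn:example").getTypeCategory());
        System.out.println(model.getSimpleTypeDefinition("order", "urn:example"));
    }
}
```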

5. min/max occurrence values

On the XSParticle interface, getMin/MaxOccurs returns *int*. Is this enough?
The spec says nonNegativeInteger (0~+inf). Should we return BigInteger?
Would *long* make it better? (*long* doesn't solve the problem; it just
allows more values than *int*.)
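
One possible compromise, sketched below, is to keep *int* in the API but
parse with BigInteger and clamp. The UNBOUNDED sentinel, the clamping
policy, and the method name are all assumptions made for this sketch, not
something the attached interfaces define.

```java
import java.math.BigInteger;

class OccursDemo {
    static final int UNBOUNDED = -1;   // hypothetical sentinel for "unbounded"

    private static final BigInteger INT_MAX = BigInteger.valueOf(Integer.MAX_VALUE);

    // Converts the lexical form of minOccurs/maxOccurs to an int.
    // Values larger than Integer.MAX_VALUE are clamped rather than wrapped.
    static int toIntOccurs(String lexical) {
        if ("unbounded".equals(lexical)) {
            return UNBOUNDED;
        }
        BigInteger value = new BigInteger(lexical);
        if (value.signum() < 0) {
            throw new IllegalArgumentException("negative occurrence: " + lexical);
        }
        return value.compareTo(INT_MAX) > 0 ? Integer.MAX_VALUE : value.intValue();
    }

    public static void main(String[] args) {
        System.out.println(toIntOccurs("3"));
        System.out.println(toIntOccurs("unbounded"));
        System.out.println(toIntOccurs("99999999999999999999") == Integer.MAX_VALUE);
    }
}
```

Clamping loses the exact value, so a caller that needs it would still
require a BigInteger accessor.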

Thanks,
Sandy Gao
Software Developer, IBM Canada
(1-905) 413-3255
sandygao@ca.ibm.com

(See attached file: psvi.zip)

Re: [Xerces2] Please review: schema component API

Posted by Fabio Riccardi <fa...@xqrl.com>.
On Friday, May 3, 2002, at 09:16 AM, sandygao@ca.ibm.com wrote:
> 1. Annotation
>
> An "annotation" contains:
> {application information}: A sequence of element information items.
> {user information}: A sequence of element information items.
> {attributes}: A sequence of attribute information items.
>
> How can we expose *information items*, in what form?

Isn't Xerces capable of generating XNI/SAX events from a DOM?
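
For reference, a minimal sketch of replaying a DOM tree as SAX events
using the standard JAXP identity transformer. This is not a
Xerces-specific facility and says nothing about an XNI path; the class
name, method name, and sample document are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DomToSaxDemo {
    // Parses the XML into a DOM, then replays the DOM as SAX events and
    // records the qualified name of every element start event.
    static List<String> elementNames(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            final List<String> names = new ArrayList<String>();
            DefaultHandler handler = new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    names.add(qName);
                }
            };

            // A transformer created without a stylesheet copies its input,
            // so the DOM source is delivered to the handler as SAX events.
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new SAXResult(handler));
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("DOM to SAX replay failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<annotation><appinfo><note/></appinfo><documentation/></annotation>";
        System.out.println(elementNames(xml));
    }
}
```

A SAX user would supply their own ContentHandler in place of the
recording handler, so the schema API itself would not need to expose DOM
types.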

> 2. getName()/getNamespace()
> ...
> What do you think?

Aesthetic considerations aside, they can simply be implemented by a 
{ return null; } where they don't apply, making things simpler.

> 3. Actual values
>
> How to expose actual values? Do we want to define interfaces to 
> represent
> the actual values (date/time types especially)? After we have such
> interfaces, we can return "Object" from related methods.

I think that date and time objects need to be wrapped to associate 
appropriate methods to all the different variants (i.e. date vs 
duration, etc.).

The others can probably do with a simple Object representation
containing the actual value. This would slightly complicate
comparison but would keep memory use at acceptable levels (adding
a wrapper would add quite a bit of overhead to the memory
footprint).

Another solution could be to have an XSObject, XSBoolean, XSDecimal,
etc. kind of hierarchy where equality and ordering can be defined
appropriately in the base XSObject (vs relying on Object), without
incurring too much additional wrapper overhead. This route could also
provide a simple way of obtaining the normalized value from the XSObject
while keeping the original lexical representation separately.
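
A rough sketch of what such a hierarchy might look like follows. The
class names are hypothetical (SchemaValue is used instead of XSObject to
avoid a clash with the component interface of that name), and the
normalized form chosen here is only for illustration, not the canonical
form from the spec.

```java
import java.math.BigDecimal;

abstract class SchemaValue {
    private final String lexical;   // original lexical form, kept as given

    SchemaValue(String lexical) { this.lexical = lexical; }

    String getLexicalValue() { return lexical; }

    // Each subclass decides how its value is normalized.
    abstract String getNormalizedValue();

    // Value equality is defined once, on the base class.
    boolean valueEquals(SchemaValue other) {
        return other != null
                && getClass() == other.getClass()
                && getNormalizedValue().equals(other.getNormalizedValue());
    }
}

final class DecimalValue extends SchemaValue {
    private final BigDecimal value;

    DecimalValue(String lexical) {
        super(lexical);
        value = new BigDecimal(lexical.trim());
    }

    String getNormalizedValue() {
        // Drops trailing zeros so that 1.0 and 1 normalize identically.
        return value.stripTrailingZeros().toPlainString();
    }
}

final class BooleanValue extends SchemaValue {
    private final boolean value;

    BooleanValue(String lexical) {
        super(lexical);
        String s = lexical.trim();
        if (s.equals("true") || s.equals("1")) {
            value = true;
        } else if (s.equals("false") || s.equals("0")) {
            value = false;
        } else {
            throw new IllegalArgumentException("not a boolean: " + lexical);
        }
    }

    String getNormalizedValue() { return value ? "true" : "false"; }
}
```

For example, new DecimalValue("1.0").valueEquals(new DecimalValue("1"))
holds, while getLexicalValue() still returns "1.0" for the first object.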

> 4. Get global type definitions
> ...
> What do you think?

I'd vote for the second approach.

> 5. min/max occurrence values
>
> On the XSParticle interface, getMin/MaxOccurs returns *int*. Is this enough?
> The spec says nonNegativeInteger (0~+inf). Should we return BigInteger?
> Would *long* make it better? (*long* doesn't solve the problem; it just
> allows more values than *int*.)

Considering that these are occurrences, a normal int looks like a pretty
large number to me. I cannot easily imagine an application that has more
than 2G occurrences of something in a single document. Of course, 640K of
RAM is enough for most people too ;)

Thanks,

  - Fabio


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


Re: [Xerces2] Please review: schema component API

Posted by John Utz <ut...@singingfish.com>.
elena, thank you so much for doing the work.
sandy, thank you so much for sending it on.

i expect to review it closely first thing next week.

