Posted to dev@xmlbeans.apache.org by Scott Ziegler <zi...@bea.com> on 2003/10/02 21:14:23 UTC

builtin type conversions

For the v2 fast/lossy runtime, I need a fast way to marshal/unmarshal
the XML Schema builtin types.  Ideally this would be a library of static
methods that involve as few objects as possible (hopefully zero in most
cases).  A JAXB impl is required to provide this functionality by
implementing the DatatypeConverterInterface.  I noticed that
org.apache.xmlbeans.impl.values.JavaFloatHolder has exactly what I want
with these two methods:

    public static String serialize(float f)
    public static float validateLexical(String v,
                                        ValidationContext context)

But unfortunately this pattern is not observed consistently in the other
datatype classes.  Can we agree on a pattern and have all datatype
classes implement it?  We might also consider making such methods
public, as I'm sure they will be useful to others.
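For illustration, here is a minimal sketch of what such a uniform pattern could look like for xs:boolean. The class name XsBooleanConverter, the print/lex method names, and the Collection<String> error sink are hypothetical, not existing XMLBeans API, and the code uses current Java syntax. On the success path neither method allocates an object: printing returns string literals and lexing compares characters in place.

```java
import java.util.Collection;

// Hypothetical uniform static parse/print pattern for xs:boolean.
// Not part of XMLBeans; names and error-reporting style are assumptions.
final class XsBooleanConverter {
    private XsBooleanConverter() {}

    // Print: returns string literals, so nothing is allocated per call.
    static String printBoolean(boolean value) {
        return value ? "true" : "false";
    }

    // Lex: accepts the four xs:boolean lexical forms ("true", "false", "1", "0")
    // after ignoring leading/trailing XML whitespace (xs:boolean is whiteSpace=collapse).
    // Reads the CharSequence in place and never stores a reference to it.
    static boolean lexBoolean(CharSequence input, Collection<String> errors) {
        int start = 0;
        int end = input.length();
        while (start < end && isXmlSpace(input.charAt(start))) start++;
        while (end > start && isXmlSpace(input.charAt(end - 1))) end--;

        if (matches(input, start, end, "true") || matches(input, start, end, "1")) return true;
        if (matches(input, start, end, "false") || matches(input, start, end, "0")) return false;

        errors.add("invalid xs:boolean lexical value");
        return false;
    }

    private static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    // Compares input[start, end) with a literal without creating a substring.
    private static boolean matches(CharSequence s, int start, int end, String literal) {
        if (end - start != literal.length()) return false;
        for (int i = 0; i < literal.length(); i++) {
            if (s.charAt(start + i) != literal.charAt(i)) return false;
        }
        return true;
    }
}
```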

--Scott



---------------------------------------------------------------------
To unsubscribe, e-mail:   xmlbeans-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xmlbeans-dev-help@xml.apache.org
Apache XMLBeans Project -- URL: http://xml.apache.org/xmlbeans/

Re: builtin type conversions

Posted by David Bau <da...@bea.com>.

Scott writes:
> So I think this separation is useful to avoid an odd dependency graph of
> packages (or even cycles).  Perhaps someone with a better understanding
> of the current dependency graph can suggest how to structure this?  I
> think I'll start by just gathering up the basic parse/print methods into
> a class of static methods.

Just doing the parse/print methods for now seems fine to me; let's not
worry about cleaning up the validation methods yet.

When we work out the v2 SOM we can revisit the validation issue, and
rationalize all those methods then. However, hopefully build dependencies
shouldn't block this code from going into a common place.  If you're getting
stuck by references to the impl class XmlObjectBase just to call things
like getting the intValue() of facet constraints, for example, you should be
able to cast to the public interface org.apache.xmlbeans.SimpleValue
instead.

David

Re: builtin type conversions

Posted by Scott Ziegler <zi...@bea.com>.

On Fri, 2003-10-03 at 08:31, David Bau wrote:
> I'd be happy if we consolidated all these primitive lexing/validating
> functions on one big class of static functions; or if there were one class
> for each primitive, or one class for lexing and another for validating.  Any
> form of cleanup there would be welcome.

After spending some more time looking at this, I think we should have
one big class of static methods that just does the lexing and printing,
without validation.  The validation should be elsewhere.  It seems to me
that the very basic lex/print functionality should not depend on any
other major pieces of the code, excepting some util classes.  But
validation will depend on the SOM, unless there are big changes coming.
So I think this separation is useful to avoid an odd dependency graph of
packages (or even cycles).  Perhaps someone with a better understanding
of the current dependency graph can suggest how to structure this?  I
think I'll start by just gathering up the basic parse/print methods into
a class of static methods.

--Scott

Re: builtin type conversions

Posted by Scott Ziegler <zi...@bea.com>.

Comments inline.

On Fri, Oct 03, 2003 at 11:31:52AM -0400, David Bau wrote:
> For each primitive we probably need to have four functions:
> 
> (1) Lex the characters into a logical value (and note any well-formedness
> error in doing so)
> float lexFloat(CharSequence input, Collection errors)
> 
>   (1.1) Note that customized whitespace-collapse rules may need to be
> applied while doing (1) [strings only?]
> String lexString(CharSequence input, int wsRule, Collection errors)
> 
> (2) Validate custom lexical rules for a user-defined type (patterns, length,
> etc) (and note any error)
> void validateFloatLex(CharSequence input, SchemaType actualType, Collection errors)
>
> (3) Validate custom logical rules for a user-defined type (min, max, etc)
> void validateFloatValue(float value, SchemaType actualType, Collection errors)
> 
> (4) Write a value out to characters
> String printFloat(float value)
> 
> In the fastest unmarshalling path, we'd just use (1); but full validators
> would also use (2) and (3).

Those four methods sound good.  I am a little nervous about the
untyped collection for errors but I guess I can live with it.

> I _think_ the whitespace customization issue is unique to xs:string's
> subtypes - can whitespace rules be customized for any other schema types?

I just checked the spec -- you're correct, only for xsd:string and subtypes.
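Because only xs:string and its derived types can customize whitespace, the wsRule parameter of lexString only has to cover the three whiteSpace facet values defined in XML Schema Part 2: preserve, replace, and collapse. What follows is a minimal sketch that assumes hypothetical constants WS_PRESERVE, WS_REPLACE and WS_COLLAPSE (not taken from XMLBeans). The errors parameter is left out because whitespace normalization by itself cannot fail.

```java
// Hypothetical whitespace handling for xs:string and its subtypes.
// The constant values are assumptions, not XMLBeans constants.
final class XsWhitespace {
    static final int WS_PRESERVE = 1; // leave the characters unchanged
    static final int WS_REPLACE = 2;  // map tab, LF and CR to a space
    static final int WS_COLLAPSE = 3; // replace, then squeeze runs of spaces and trim

    private XsWhitespace() {}

    static String lexString(CharSequence input, int wsRule) {
        switch (wsRule) {
            case WS_PRESERVE:
                return input.toString(); // returns the same object if input is a String
            case WS_REPLACE: {
                StringBuilder sb = new StringBuilder(input.length());
                for (int i = 0; i < input.length(); i++) {
                    char c = input.charAt(i);
                    sb.append(isXmlSpace(c) ? ' ' : c);
                }
                return sb.toString();
            }
            case WS_COLLAPSE: {
                StringBuilder sb = new StringBuilder(input.length());
                boolean pendingSpace = false;
                for (int i = 0; i < input.length(); i++) {
                    char c = input.charAt(i);
                    if (isXmlSpace(c)) {
                        // Remember one separator, but only after some content (drops leading space).
                        pendingSpace = sb.length() > 0;
                    } else {
                        if (pendingSpace) sb.append(' ');
                        pendingSpace = false;
                        sb.append(c);
                    }
                }
                return sb.toString(); // a trailing pending space is never emitted
            }
            default:
                throw new IllegalArgumentException("unknown whitespace rule: " + wsRule);
        }
    }

    private static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }
}
```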

> I think we should probably parse using input from CharSequence rather than
> String so that we can avoid String allocation where possible.  Strings are
> CharSequences anyway, so there should be no loss in power. Seem right?

In theory yes, but I have two comments:

1.  I think we should guarantee that we will not hold a reference to the
passed-in CharSequence so that callers can reuse a CharSequence object
and not have to create one for each invocation of these methods.

2. Methods like java.lang.Float.parseFloat take a String, not a
CharSequence.  I really do not want to rewrite that method -- look at
the JDK source and I think you'll agree.
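Both points can be shown in a minimal sketch that uses a hypothetical XsNumberLexer class. An integer lexer can walk the CharSequence directly and allocate nothing on the success path, while a float lexer built on Float.parseFloat would still need input.toString(). The method reads input only during the call and never stores it, and that is what makes caller-side buffer reuse safe.

```java
import java.util.Collection;

// Hypothetical lexer for xs:int that parses directly from a CharSequence.
final class XsNumberLexer {
    private XsNumberLexer() {}

    // Reads input only for the duration of the call and never stores it, so a caller
    // may reuse the same buffer (e.g. a StringBuilder) for the next value.
    // No objects are allocated unless an error is reported.
    static int lexInt(CharSequence input, Collection<String> errors) {
        int start = 0;
        int end = input.length();
        // xs:int uses whiteSpace=collapse, so leading/trailing whitespace is ignored.
        while (start < end && isXmlSpace(input.charAt(start))) start++;
        while (end > start && isXmlSpace(input.charAt(end - 1))) end--;

        boolean negative = false;
        if (start < end && (input.charAt(start) == '+' || input.charAt(start) == '-')) {
            negative = input.charAt(start) == '-';
            start++;
        }
        if (start == end) {
            errors.add("invalid xs:int: no digits");
            return 0;
        }
        long magnitude = 0;
        for (int i = start; i < end; i++) {
            char c = input.charAt(i);
            if (c < '0' || c > '9') {
                errors.add("invalid xs:int: unexpected character '" + c + "'");
                return 0;
            }
            magnitude = magnitude * 10 + (c - '0');
            if (magnitude > 2147483648L) { // |Integer.MIN_VALUE|; stop long before overflow
                errors.add("xs:int out of range");
                return 0;
            }
        }
        long value = negative ? -magnitude : magnitude;
        if (value > Integer.MAX_VALUE) {
            errors.add("xs:int out of range");
            return 0;
        }
        return (int) value;
    }

    private static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }
}
```

For example, a caller can append character data to one StringBuilder, call lexInt(buffer, errors), then setLength(0) and refill the same buffer for the next value.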

> A question: is there a similar strategy that can be done for the printing
> side of things so we can avoid String allocation for data that is just on
> its way into a character array (or even a byte array) anyway?

I think this is the same problem as I mentioned above.  We'd have to
get into the business of writing float -> char[] methods, which
personally I'd rather leave to the JDK (or some other project).
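For integral types at least, printing into a caller-supplied buffer is short and avoids the intermediate String. Floats are the hard case, since correct float-to-decimal conversion is exactly what Float.toString already provides. Below is a minimal sketch for xs:int. The class name is hypothetical, and the caller is assumed to guarantee room for MAX_INT_CHARS characters.

```java
// Hypothetical printer that writes xs:int values into a caller-supplied char[].
final class XsIntPrinter {
    // Length of the longest output, "-2147483648".
    static final int MAX_INT_CHARS = 11;

    private XsIntPrinter() {}

    // Writes value's canonical decimal form into buf starting at offset and returns
    // the index just past the last character written. No objects are allocated.
    static int printInt(int value, char[] buf, int offset) {
        int pos = offset;
        if (value < 0) {
            buf[pos++] = '-';
        }
        // Work with a non-positive number so Integer.MIN_VALUE needs no special case.
        int v = value < 0 ? value : -value;
        int digits = 1;
        for (int t = v / 10; t != 0; t /= 10) {
            digits++;
        }
        int end = pos + digits;
        // Fill digits from least to most significant; v % 10 is in [-9, 0].
        for (int i = end - 1; i >= pos; i--) {
            buf[i] = (char) ('0' - v % 10);
            v /= 10;
        }
        return end;
    }
}
```

A serializer could keep one char[] per writer, call printInt(value, buf, len), and flush the buffer whenever fewer than MAX_INT_CHARS slots remain.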

> The parsing for the seven schema date types is currently consolidated in
> GDate (and we probalby want to continue doing so), although we might
> eventually consider parsing directly into a Calendar subclass instead of our
> GDate waypoint (any volunteers?)
> 
> I'd be happy if we consolidated all these primitive lexing/validating
> functions on one big class of static functions; or if there were one class
> for each primitive, or one class for lexing and another for validating.  Any
> form of cleanup there would be welcome.

I think I'm leaning toward one class per primitive, but I could be
talked into something else.

> In asking the Q, does this mean you're going to rationalize this stuff
> Scott?  (Please do if you want to.)

Sure, I'll do it, unless someone else wants to :)

--Scott

Re: builtin type conversions

Posted by David Bau <da...@bea.com>.

+1; rationalizing these seems like goodness.

For each primitive we probably need to have four functions:

(1) Lex the characters into a logical value (and note any well-formedness
error in doing so)
float lexFloat(CharSequence input, Collection errors)

  (1.1) Note that customized whitespace-collapse rules may need to be
applied while doing (1) [strings only?]
String lexString(CharSequence input, int wsRule, Collection errors)

(2) Validate custom lexical rules for a user-defined type (patterns, length,
etc) (and note any error)
void validateFloatLex(CharSequence input, SchemaType actualType, Collection errors)

(3) Validate custom logical rules for a user-defined type (min, max, etc)
void validateFloatValue(float value, SchemaType actualType, Collection errors)

(4) Write a value out to characters
String printFloat(float value)

In the fastest unmarshalling path, we'd just use (1); but full validators
would also use (2) and (3).
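The four functions above could look roughly like this for xs:float. This is a sketch under several assumptions. FloatFacets is a hypothetical stand-in for the facet data a SchemaType would supply. java.util.regex is used in place of a real XML Schema regex engine, and the two dialects differ. Errors are collected as strings, and only the XML Schema 1.0 special values INF, -INF and NaN are recognized. The fast unmarshalling path would call lexFloat only; a validating path would also call validateFloatLex and validateFloatValue.

```java
import java.util.Collection;
import java.util.regex.Pattern;

// Hypothetical stand-in for the facets a SchemaType would supply; null means "facet absent".
final class FloatFacets {
    final Pattern pattern;     // java.util.regex used as a stand-in for XML Schema regexes
    final Float minInclusive;
    final Float maxInclusive;

    FloatFacets(Pattern pattern, Float minInclusive, Float maxInclusive) {
        this.pattern = pattern;
        this.minInclusive = minInclusive;
        this.maxInclusive = maxInclusive;
    }
}

final class XsFloat {
    private XsFloat() {}

    // (1) Lex: characters -> float, noting well-formedness errors.
    static float lexFloat(CharSequence input, Collection<String> errors) {
        String s = trimXmlSpace(input);
        switch (s) {
            case "INF":  return Float.POSITIVE_INFINITY;
            case "-INF": return Float.NEGATIVE_INFINITY;
            case "NaN":  return Float.NaN;
            default:     break;
        }
        // Restrict to the XML Schema float character set first; Float.parseFloat alone
        // would also accept Java-only forms such as "Infinity", "1.5f" or hex floats.
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean allowed = (c >= '0' && c <= '9')
                    || c == '+' || c == '-' || c == '.' || c == 'e' || c == 'E';
            if (!allowed) {
                errors.add("invalid xs:float lexical value: " + s);
                return 0f;
            }
        }
        try {
            return Float.parseFloat(s);
        } catch (NumberFormatException e) {
            errors.add("invalid xs:float lexical value: " + s);
            return 0f;
        }
    }

    // (2) Validate lexical facets (here only pattern) of a user-defined type.
    static void validateFloatLex(CharSequence input, FloatFacets facets, Collection<String> errors) {
        if (facets.pattern != null && !facets.pattern.matcher(trimXmlSpace(input)).matches()) {
            errors.add("value does not match pattern " + facets.pattern.pattern());
        }
    }

    // (3) Validate value facets (min/max). The negated comparisons also reject NaN,
    // which compares false against every bound.
    static void validateFloatValue(float value, FloatFacets facets, Collection<String> errors) {
        if (facets.minInclusive != null && !(value >= facets.minInclusive)) {
            errors.add("value violates minInclusive " + facets.minInclusive);
        }
        if (facets.maxInclusive != null && !(value <= facets.maxInclusive)) {
            errors.add("value violates maxInclusive " + facets.maxInclusive);
        }
    }

    // (4) Print: float -> characters, mapping Java's special-value spellings to XML Schema's.
    static String printFloat(float value) {
        if (value == Float.POSITIVE_INFINITY) return "INF";
        if (value == Float.NEGATIVE_INFINITY) return "-INF";
        if (Float.isNaN(value)) return "NaN";
        return Float.toString(value);
    }

    private static String trimXmlSpace(CharSequence cs) {
        int start = 0;
        int end = cs.length();
        while (start < end && isXmlSpace(cs.charAt(start))) start++;
        while (end > start && isXmlSpace(cs.charAt(end - 1))) end--;
        return cs.subSequence(start, end).toString();
    }

    private static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }
}
```

Keeping (2) and (3) as separate functions means only the validating path depends on facet data; lexFloat and printFloat need nothing beyond java.lang.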

I _think_ the whitespace customization issue is unique to xs:string's
subtypes - can whitespace rules be customized for any other schema types?

I think we should probably parse using input from CharSequence rather than
String so that we can avoid String allocation where possible.  Strings are
CharSequences anyway, so there should be no loss in power. Seem right?

A question: is there a similar strategy that can be done for the printing
side of things so we can avoid String allocation for data that is just on
its way into a character array (or even a byte array) anyway?

The parsing for the seven schema date types is currently consolidated in
GDate (and we probably want to continue doing so), although we might
eventually consider parsing directly into a Calendar subclass instead of our
GDate waypoint (any volunteers?)

I'd be happy if we consolidated all these primitive lexing/validating
functions on one big class of static functions; or if there were one class
for each primitive, or one class for lexing and another for validating.  Any
form of cleanup there would be welcome.

In asking the Q, does this mean you're going to rationalize this stuff
Scott?  (Please do if you want to.)

David

