Posted to users@jena.apache.org by Tim Harsch <ha...@gmail.com> on 2014/05/12 19:26:52 UTC

typed literals to java classes, problem with short and byte.

According to the docs:
http://jena.apache.org/documentation/notes/typed-literals.html

These are all available as static member variables from
com.hp.hpl.jena.datatypes.xsd.XSDDatatype
<http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/datatypes/xsd/XSDDatatype.html>.

Of these types, the following are registered as the default type to use to
represent certain Java classes:

  Java class    xsd type
  Float         float
  Double        double
  Integer       int
  Long          long
  Short         short
  Byte          byte
  BigInteger    integer
  BigDecimal    decimal
  Boolean       boolean
  String        string

This is what I am seeing for xsd:short and xsd:byte.  I'm puzzled by the
type from getValue.

CODE:

System.out.println( "RDFDatatype: " + literal.getDatatype().toString() );
System.out.println( "Datatype URI: " + literal.getDatatypeURI() );
System.out.println( "getValue java class: " + ((Literal)literal).getValue().getClass() );

OUTPUT:

RDFDatatype: Datatype[http://www.w3.org/2001/XMLSchema#byte -> class java.lang.Byte]
Datatype URI: http://www.w3.org/2001/XMLSchema#byte
getValue java class: class java.lang.Integer
RDFDatatype: Datatype[http://www.w3.org/2001/XMLSchema#short -> class java.lang.Short]
Datatype URI: http://www.w3.org/2001/XMLSchema#short
getValue java class: class java.lang.Integer

So, is this the expected behavior?

Thanks,
Tim

Re: typed literals to java classes, problem with short and byte.

Posted by Dave Reynolds <da...@gmail.com>.
Hi Tim,

On 13/05/14 18:53, Tim Harsch wrote:
> Thanks Dave.  Makes sense.   Why though does RDFDatatype says the class
> would be Byte and would be Short ?  I guess there is no code that consults
> RDFDatatype to ask what they type should be before creating it.   Is this
> just an inconsistency in the API?  Or bug in the code?

Arguably an insufficiently clear javadoc.

The issue is that the TypeMapper, which tells you what datatype to use 
when *encoding* a java type, is currently initialized from the 
getJavaClass() for those datatypes. We wanted people to be able to use 
shorts and bytes in java and still get them encoded appropriately.

Which is why the javadoc for RDFDatatype#getJavaClass says:

"""
If this datatype is used as the cannonical representation for a 
particular java datatype then return that java type, otherwise returns null.
"""

I.e. it records the java to xsd mapping, which is not the same as the 
xsd to java mapping if we don't enforce strict round tripping.

In fact the type mapper allows you to register types directly, which 
allows us to have a many-to-one map from java class to RDF datatype. So 
the use of getJavaClass is not really necessary and arguably confusing 
in a world without round tripping guarantees.
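
To make the two directions concrete, here is a minimal sketch in plain Java. It is a toy model, not Jena code: the class MappingDirectionSketch, the JAVA_TO_XSD map, and the decodedClass and roundTrips helpers are all hypothetical names, and the decoding rule is simplified to Integer and Long only. It shows that encoding looks up the Java class, while decoding looks only at the size of the value, so the two are not inverses for Short and Byte.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only. None of these names exist in Jena; they are assumptions
// made up for illustration.
final class MappingDirectionSketch {

    // Encoding direction: Java class -> xsd type (what a type mapper answers).
    static final Map<Class<?>, String> JAVA_TO_XSD = new LinkedHashMap<>();
    static {
        JAVA_TO_XSD.put(Byte.class, "xsd:byte");
        JAVA_TO_XSD.put(Short.class, "xsd:short");
        JAVA_TO_XSD.put(Integer.class, "xsd:int");
        JAVA_TO_XSD.put(Long.class, "xsd:long");
    }

    // Decoding direction: the xsd type is deliberately ignored here. The
    // result class depends only on whether the value fits in an int.
    static Class<?> decodedClass(String xsdType, long value) {
        boolean fitsInt = value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
        return fitsInt ? Integer.class : Long.class;
    }

    // True if encoding and then decoding gives back the same Java class.
    static boolean roundTrips(Number javaValue) {
        String xsdType = JAVA_TO_XSD.get(javaValue.getClass());
        return decodedClass(xsdType, javaValue.longValue()) == javaValue.getClass();
    }

    public static void main(String[] args) {
        Number[] samples = {
            Byte.valueOf((byte) 7), Short.valueOf((short) 7),
            Integer.valueOf(7), Long.valueOf(3_000_000_000L)
        };
        for (Number sample : samples) {
            String xsdType = JAVA_TO_XSD.get(sample.getClass());
            Class<?> decoded = decodedClass(xsdType, sample.longValue());
            System.out.println(sample.getClass().getSimpleName() + " -> " + xsdType
                    + " -> " + decoded.getSimpleName()
                    + " (round trips: " + roundTrips(sample) + ")");
        }
    }
}
```

In this model a Short is encoded as xsd:short but comes back as an Integer, which matches the output Tim reported.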

Dave



Re: typed literals to java classes, problem with short and byte.

Posted by Tim Harsch <ha...@gmail.com>.
Thanks Dave. Makes sense. Why, though, does RDFDatatype say the class
would be Byte or Short? I guess there is no code that consults
RDFDatatype to ask what the type should be before creating the value. Is
this just an inconsistency in the API, or a bug in the code?

Thanks,
Tim



Re: typed literals to java classes, problem with short and byte.

Posted by Dave Reynolds <da...@gmail.com>.
On 12/05/14 18:26, Tim Harsch wrote:
> According to the docs:
> http://jena.apache.org/documentation/notes/typed-literals.html
>
> These are all available as static member variables from
> com.hp.hpl.jena.datatypes.xsd.XSDDatatype<http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/datatypes/xsd/XSDDatatype.html>
> .
>
> Of these types, the following are registered as the default type to use to
> represent certain Java classes:
>    Java class xsd type   Float float  Double double  Integer int  Long long
> Short short  Byte byte  BigInteger integer  BigDecimal decimal  Boolean
> Boolean  String string
>
> This is what I am seeing for xsd:short and xsd:byte.  I'm puzzled by the
> type from getValue.
>
> CODE:
>
> System.out.println( "RDFDatatype: " + literal.getDatatype().toString() );
> System.out.println( "Datatype URI: " + literal.getDatatypeURI() );
> System.out.println( "getValue java class: " +
> ((Literal)literal).getValue().getClass()
> );
>
> OUTPUT:
>
> RDFDatatype: Datatype[http://www.w3.org/2001/XMLSchema#byte -> class
> java.lang.Byte]
> Datatype URI: http://www.w3.org/2001/XMLSchema#byte
> getValue java class: class java.lang.Integer
> RDFDatatype: Datatype[http://www.w3.org/2001/XMLSchema#short -> class
> java.lang.Short]
> Datatype URI: http://www.w3.org/2001/XMLSchema#short
> getValue java class: class java.lang.Integer
>
> So, is the expected behavior?

Yes, or at least that's the implemented behaviour and has been for some 
time.

The getValue() code picks a Java datatype big enough for the actual 
value out of Integer, Long and BigInteger.
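
As a rough illustration of that selection rule (not the actual Jena implementation), the sketch below parses an integer lexical form and returns the smallest of Integer, Long and BigInteger that can hold it. The class name IntegerNarrowingSketch and the method narrow are hypothetical, chosen only for this example.

```java
import java.math.BigInteger;

// Illustration only: a hypothetical helper, not Jena's code.
final class IntegerNarrowingSketch {

    // Parse the lexical form and return an Integer if the value fits in
    // 32 bits, a Long if it fits in 64 bits, and a BigInteger otherwise.
    static Number narrow(String lexicalForm) {
        BigInteger value = new BigInteger(lexicalForm.trim());
        // bitLength() excludes the sign bit, so 31 bits covers the full int
        // range and 63 bits covers the full long range.
        if (value.bitLength() <= 31) {
            return Integer.valueOf(value.intValue());
        }
        if (value.bitLength() <= 63) {
            return Long.valueOf(value.longValue());
        }
        return value;
    }

    public static void main(String[] args) {
        String[] samples = { "12", "3000000000", "99999999999999999999" };
        for (String sample : samples) {
            System.out.println(sample + " -> " + narrow(sample).getClass().getName());
        }
    }
}
```

Note that the declared datatype plays no part in the choice here: under this rule "12" typed as xsd:byte, xsd:short or xsd:integer would all come back as an Integer.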

Arguably it would be better if it round tripped so that a java short 
would become an xsd:short and would return a Short from getValue.

The issue is largely historical. Partly it's that the code was developed 
while the RDF datatype handling was still in flux. Partly it's 
convenience - a lot of people use xsd:integer (i.e. arbitrary size) in 
their RDF (because that's what you get in Turtle if you use number 
syntax) but expect them to be Integers in java "unless they are too 
big". Round-tripping from java was never a requirement. Having once 
implemented it that way we created a backward compatibility issue if we 
wanted to change it.

I suspect that changing so that short and byte round tripped would be 
OK. But equally I suspect that dropping the truncation of smaller 
BigIntegers to Integers would cause problems.

This might be something to revisit in any future Jena 3, though it doesn't 
seem like much of a priority - xsd:byte and xsd:short don't seem to be 
used very much in RDF in the wild.

Dave