Posted to dev@jena.apache.org by Paolo Castagna <ca...@googlemail.com> on 2011/10/26 14:38:38 UTC

xsd:int, xsd:integer, tdbloader and tdbdump...

Hi,
I have this data:

----
<foo:bar1> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .
<foo:bar2> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#integer> .
<foo:bar3> <foo:p> "6"^^<http://www.w3.org/2001/XMLSchema#int> .
<foo:bar4> <foo:p> "6"^^<http://www.w3.org/2001/XMLSchema#integer> .
----

I load it:

tdbloader --loc /tmp/tdb data.nt

Then I dump it out:

tdbdump --loc /tmp/tdb

----
<foo:bar1> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .
<foo:bar2> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#integer> .
<foo:bar3> <foo:p> "6"^^<http://www.w3.org/2001/XMLSchema#integer> .
<foo:bar4> <foo:p> "6"^^<http://www.w3.org/2001/XMLSchema#integer> .
----

I am not sure if these two triples in my data are both "correct", are they?

----
<foo:bar1> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .
<foo:bar2> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#integer> .
----

They appear to be correct: I tried http://sparql.org/data-validator.html
and the Jena rdfcat and rdfparse commands, and got no errors.

Also:

<foo:bar3> <foo:p> "6"^^<http://www.w3.org/2001/XMLSchema#int> .

has been dumped out as:

<foo:bar3> <foo:p> "6"^^<http://www.w3.org/2001/XMLSchema#integer> .

This is because of TDB value canonicalization:
http://openjena.org/wiki/TDB/ValueCanonicalization

However, I am not sure I understand why this does not happen for:

<foo:bar1> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .

To further clarify my doubts on the difference between xsd:int and xsd:integer
I went to double check here:

  - http://www.w3.org/TR/xmlschema-2/#int
  - http://www.w3.org/TR/xmlschema-2/#integer

I think everything is fine and working as it is supposed to work,
but a confirmation would be good!

Thanks,
Paolo
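
For readers comparing the two datatypes linked above, here is a minimal Python sketch of the rules as I read them in XML Schema Part 2: xsd:integer allows an optional sign followed by decimal digits, and xsd:int is the same syntax restricted to the 32-bit range. The function names are made up for illustration, and whitespace normalization is ignored, so treat this as an approximation rather than a validator.

```python
import re

# Lexical space of xsd:integer: an optional sign followed by one or more
# decimal digits. No decimal point and no exponent are allowed.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

# xsd:int is derived from xsd:integer (via xsd:long), restricted to 32 bits.
INT_MIN, INT_MAX = -2147483648, 2147483647


def is_valid_integer(lexical: str) -> bool:
    """True if `lexical` is in the lexical space of xsd:integer."""
    return INTEGER_LEXICAL.fullmatch(lexical) is not None


def is_valid_int(lexical: str) -> bool:
    """True if `lexical` has integer syntax and fits the xsd:int range."""
    return is_valid_integer(lexical) and INT_MIN <= int(lexical) <= INT_MAX


if __name__ == "__main__":
    # Hypothetical sample values, including the two forms from the data above.
    for lexical in ["6", "6.0", "+6", "2147483648"]:
        print(lexical, is_valid_integer(lexical), is_valid_int(lexical))
```

Under these rules "6" is fine for both datatypes, "6.0" is valid for neither, and "2147483648" is a valid xsd:integer but out of range for xsd:int.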

Re: xsd:int, xsd:integer, tdbloader and tdbdump...

Posted by Paolo Castagna <ca...@googlemail.com>.
Damian Steer wrote:
> On 26/10/11 15:44, Dave Reynolds wrote:
> 
>>> I have a large N-Triples or N-Quads file, what's the best way
>>> (i.e. the more strict the better for me) to validate the data in
>>> it, before ingestion?
>> Use Eyeball?
>>
>> Dave
> 
> The riot command line in ARQ has options for checking:
> 
> riot --validate <file>
> 
> or
> 
> riot --check=true <file> | more_processing
> 
> <http://openjena.org/wiki/RIOT>
> 
> Damian

Hi Damian

I should have RTFM ;-)

riot --validate data.nt

WARN  [line: 1, col: 20] Lexical form '6.0' not valid for datatype http://www.w3.org/2001/XMLSchema#int
WARN  [line: 2, col: 20] Lexical form '6.0' not valid for datatype http://www.w3.org/2001/XMLSchema#integer

riot --check=true data.nt

WARN  [line: 1, col: 20] Lexical form '6.0' not valid for datatype http://www.w3.org/2001/XMLSchema#int
<foo:bar1> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .
WARN  [line: 2, col: 20] Lexical form '6.0' not valid for datatype http://www.w3.org/2001/XMLSchema#integer
<foo:bar2> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#integer> .
<foo:bar3> <foo:p> "6"^^<http://www.w3.org/2001/XMLSchema#int> .
<foo:bar4> <foo:p> "6"^^<http://www.w3.org/2001/XMLSchema#integer> .

Thanks,
Paolo




Re: xsd:int, xsd:integer, tdbloader and tdbdump...

Posted by Damian Steer <d....@bristol.ac.uk>.
On 26/10/11 15:44, Dave Reynolds wrote:

>> I have a large N-Triples or N-Quads file, what's the best way
>> (i.e. the more strict the better for me) to validate the data in
>> it, before ingestion?
> 
> Use Eyeball?
> 
> Dave

The riot command line in ARQ has options for checking:

riot --validate <file>

or

riot --check=true <file> | more_processing

<http://openjena.org/wiki/RIOT>

Damian

Re: xsd:int, xsd:integer, tdbloader and tdbdump...

Posted by Dave Reynolds <da...@gmail.com>.
On Wed, 2011-10-26 at 15:27 +0100, Paolo Castagna wrote: 
> Dave Reynolds wrote:
> > Hi Paolo,
> > 
> > On Wed, 2011-10-26 at 14:34 +0100, Paolo Castagna wrote: 
> >> Dave Reynolds wrote:
> >>> On Wed, 2011-10-26 at 13:38 +0100, Paolo Castagna wrote:
> >>>
> >>>> I am not sure if these two triples in my data are both "correct", are they?
> >>>>
> >>>> ----
> >>>> <foo:bar1> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .
> >>>> <foo:bar2> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#integer> .
> >>>> ----
> >>> No. The lexical forms for int and integer do not allow ".". See:
> >>> http://www.w3.org/TR/xmlschema-2/#integer etc and
> >>> http://www.w3.org/TR/xmlschema11-2/#integer etc
> >>>
> >>> Perhaps that's why they aren't canonicalized by TDB.
> >> Thank you Dave.
> >>
> >> I still do not understand why I do not see errors or warnings when
> >> I validate my data with http://sparql.org/data-validator.html [1]
> > 
> > Humm. When I go to [1] and type in:
> > 
> >     <foo:bar1> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .
> > 
> > Select "Turtle" (default) and press the validate button then I see the
> > error:
> > 
> > """
> > [line: 10, col: 20] Lexical form '6.0' not valid for datatype
> > http://www.w3.org/2001/XMLSchema#int
> > <foo:bar1>  <foo:p>  "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .
> > """
> > 
> > Not sure what might be different in your case.
> 
> Interesting...
> 
> Sorry, I assumed I would get the same answer regardless of the input format.
> But the validator (in Joseki) gives different answers for the same data:
> when N-Triples is selected as the input format, there is no error.

N-Triples was originally (and technically still is) just a format for
writing down RDF/XML test cases so it has to be able to represent all
syntactically well-formed data even if it isn't legal by other criteria.
So it would be incorrect for an N-Triples reader to raise an error on
that data.

In fact, in RDF an ill-formed typed literal is not only not a syntax error,
it's not even a semantic inconsistency; it "just" represents a value
which is not in the space of literals. Also, of course, the set of
datatypes (other than rdfs:XMLLiteral) is open-ended in RDF, so there is
no guarantee a given processor recognizes xsd:int.

Though in practice it's best to do eager checking of such things :)
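
As a rough illustration of what "eager checking" could look like as a separate pass before loading, here is a small Python sketch. It scans N-Triples lines with a deliberately naive regular expression and reports typed literals whose lexical form does not fit xsd:int or xsd:integer. The function name, the regex and the message wording (borrowed from the warnings quoted in this thread) are all assumptions for illustration; it checks syntax only, not the xsd:int range, and it is no substitute for a real parser.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Optional sign followed by digits: the lexical form of xsd:integer.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

# Naive match for a typed literal such as "6.0"^^<http://...#int>.
# Assumption: this regex is good enough for a sketch, not a full grammar.
TYPED_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"\^\^<([^>]*)>')

# Only these two datatypes are checked here; all others pass through.
CHECKED = {XSD + "int", XSD + "integer"}


def check_lines(lines):
    """Yield (line_number, message) for each ill-formed integer literal."""
    for number, line in enumerate(lines, start=1):
        for match in TYPED_LITERAL.finditer(line):
            lexical, datatype = match.groups()
            if datatype in CHECKED and not INTEGER_LEXICAL.fullmatch(lexical):
                yield number, (
                    f"Lexical form '{lexical}' not valid for datatype {datatype}"
                )


if __name__ == "__main__":
    sample = [
        '<foo:bar1> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .',
        '<foo:bar3> <foo:p> "6"^^<http://www.w3.org/2001/XMLSchema#int> .',
    ]
    for number, message in check_lines(sample):
        print(f"WARN  [line: {number}] {message}")
```

Because check_lines only iterates over its argument, an open file handle can be passed in place of the list, so a large file is read one line at a time.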

> I have a large N-Triples or N-Quads file, what's the best way (i.e. the more
> strict the better for me) to validate the data in it, before ingestion?

Use Eyeball?

Dave



Re: xsd:int, xsd:integer, tdbloader and tdbdump...

Posted by Paolo Castagna <ca...@googlemail.com>.
Dave Reynolds wrote:
> Hi Paolo,
> 
> On Wed, 2011-10-26 at 14:34 +0100, Paolo Castagna wrote: 
>> Dave Reynolds wrote:
>>> On Wed, 2011-10-26 at 13:38 +0100, Paolo Castagna wrote:
>>>
>>>> I am not sure if these two triples in my data are both "correct", are they?
>>>>
>>>> ----
>>>> <foo:bar1> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .
>>>> <foo:bar2> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#integer> .
>>>> ----
>>> No. The lexical forms for int and integer do not allow ".". See:
>>> http://www.w3.org/TR/xmlschema-2/#integer etc and
>>> http://www.w3.org/TR/xmlschema11-2/#integer etc
>>>
>>> Perhaps that's why they aren't canonicalized by TDB.
>> Thank you Dave.
>>
>> I still do not understand why I do not see errors or warnings when
>> I validate my data with http://sparql.org/data-validator.html [1]
> 
> Humm. When I go to [1] and type in:
> 
>     <foo:bar1> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .
> 
> Select "Turtle" (default) and press the validate button then I see the
> error:
> 
> """
> [line: 10, col: 20] Lexical form '6.0' not valid for datatype
> http://www.w3.org/2001/XMLSchema#int
> <foo:bar1>  <foo:p>  "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .
> """
> 
> Not sure what might be different in your case.

Interesting...

Sorry, I assumed I would get the same answer regardless of the input format.
But the validator (in Joseki) gives different answers for the same data:
when N-Triples is selected as the input format, there is no error.

I've just tried with Fuseki and it's fine, good! :-)

I have a large N-Triples or N-Quads file, what's the best way (i.e. the more
strict the better for me) to validate the data in it, before ingestion?

Thanks a lot.

Paolo

> 
> Dave
> 
> 


Re: xsd:int, xsd:integer, tdbloader and tdbdump...

Posted by Dave Reynolds <da...@gmail.com>.
Hi Paolo,

On Wed, 2011-10-26 at 14:34 +0100, Paolo Castagna wrote: 
> Dave Reynolds wrote:
> > On Wed, 2011-10-26 at 13:38 +0100, Paolo Castagna wrote:
> > 
> >> I am not sure if these two triples in my data are both "correct", are they?
> >>
> >> ----
> >> <foo:bar1> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .
> >> <foo:bar2> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#integer> .
> >> ----
> > 
> > No. The lexical forms for int and integer do not allow ".". See:
> > http://www.w3.org/TR/xmlschema-2/#integer etc and
> > http://www.w3.org/TR/xmlschema11-2/#integer etc
> > 
> > Perhaps that's why they aren't canonicalized by TDB.
> 
> Thank you Dave.
> 
> I still do not understand why I do not see errors or warnings when
> I validate my data with http://sparql.org/data-validator.html [1]

Humm. When I go to [1] and type in:

    <foo:bar1> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .

Select "Turtle" (default) and press the validate button then I see the
error:

"""
[line: 10, col: 20] Lexical form '6.0' not valid for datatype
http://www.w3.org/2001/XMLSchema#int
<foo:bar1>  <foo:p>  "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .
"""

Not sure what might be different in your case.

Dave



Re: xsd:int, xsd:integer, tdbloader and tdbdump...

Posted by Paolo Castagna <ca...@googlemail.com>.
Dave Reynolds wrote:
> On Wed, 2011-10-26 at 13:38 +0100, Paolo Castagna wrote:
> 
>> I am not sure if these two triples in my data are both "correct", are they?
>>
>> ----
>> <foo:bar1> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .
>> <foo:bar2> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> ----
> 
> No. The lexical forms for int and integer do not allow ".". See:
> http://www.w3.org/TR/xmlschema-2/#integer etc and
> http://www.w3.org/TR/xmlschema11-2/#integer etc
> 
> Perhaps that's why they aren't canonicalized by TDB.

Thank you Dave.

I still do not understand why I do not see errors or warnings when
I validate my data with http://sparql.org/data-validator.html [1]
or Jena rdfcat or rdfparse commands. No errors.

I also tried the -s (i.e. strict) option for rdfparse.

Cheers,
Paolo

  [1] http://bit.ly/tnKaT5

> 
> Dave
> 
> 


Re: xsd:int, xsd:integer, tdbloader and tdbdump...

Posted by Dave Reynolds <da...@gmail.com>.
On Wed, 2011-10-26 at 13:38 +0100, Paolo Castagna wrote:

> I am not sure if these two triples in my data are both "correct", are they?
> 
> ----
> <foo:bar1> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .
> <foo:bar2> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#integer> .
> ----

No. The lexical forms for int and integer do not allow ".". See:
http://www.w3.org/TR/xmlschema-2/#integer etc and
http://www.w3.org/TR/xmlschema11-2/#integer etc

Perhaps that's why they aren't canonicalized by TDB.

Dave
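
To make the hypothesis above concrete, here is a toy Python model of the behaviour seen in the dump: a literal with a valid integer lexical form comes back as a canonical xsd:integer, while anything else is passed through untouched. This is only a sketch that reproduces the observed output. The function name and the table of ranges are assumptions for illustration, and it is not a description of how TDB is actually implemented.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Optional sign followed by digits: the lexical form of xsd:integer.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

# Value ranges for the two datatypes in this thread (None means unbounded).
RANGES = {
    XSD + "integer": None,
    XSD + "int": (-2**31, 2**31 - 1),
}


def canonicalize(lexical, datatype):
    """Return (lexical, datatype), canonicalized only when the literal is valid."""
    if datatype not in RANGES or not INTEGER_LEXICAL.fullmatch(lexical):
        return lexical, datatype  # unknown datatype or invalid lexical form
    value = int(lexical)
    bounds = RANGES[datatype]
    if bounds is not None and not (bounds[0] <= value <= bounds[1]):
        return lexical, datatype  # out of range for the datatype: leave as is
    # Canonical form: no leading zeros, no "+" sign, datatype xsd:integer.
    return str(value), XSD + "integer"


if __name__ == "__main__":
    for lexical, local_name in [("6.0", "int"), ("6.0", "integer"),
                                ("6", "int"), ("6", "integer")]:
        print(canonicalize(lexical, XSD + local_name))
```

Applied to the four literals from the original message, this model leaves both "6.0" literals unchanged and turns "6"^^xsd:int into "6"^^xsd:integer, which matches the tdbdump output shown there.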