Posted to users@jena.apache.org by Andy Seaborne <an...@apache.org> on 2013/11/06 12:07:33 UTC
RDF 1.1 -- changes to plain literals -- impact assessment

This email covers the changes in RDF 1.1 around plain literals.

If you think you are affected by this, please let us know as soon as
possible.

(If you read dev@jena you'll have seen this already; it's being sent to
users@jena to reach a wider audience.)

== Summary

In RDF 1.1, all literals have a datatype.

* simple literals (e.g. "foo") have datatype xsd:string.

* literals with a language tag (e.g. "foo"@en)
   have a datatype rdf:langString.

This change may have an impact on databases.

== RDF 1.1

The current situation for RDF (known as RDF-2004) is that "plain
literals" are literals which have no datatype.  They are either "simple
literals" (no datatype, no language tag) or have a language tag.  A
literal never has both a language tag and a datatype in RDF-2004.

In RDF 1.1, every literal has a datatype.

* simple literals have datatype xsd:string;
   a simple literal and the equivalent xsd:string literal are the same
   RDF term.

* literals with a language tag have datatype rdf:langString.
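As a rough illustration of the two rules above, here is a Python sketch. It assumes a toy literal representation: the Literal class and make_literal function are hypothetical, not Jena API; only the two datatype IRIs are real.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    """Hypothetical RDF 1.1 literal: the datatype is always present."""
    lexical: str
    datatype: str
    lang: Optional[str] = None


def make_literal(lexical: str, datatype: Optional[str] = None,
                 lang: Optional[str] = None) -> Literal:
    """Assign a datatype following the RDF 1.1 rules (sketch only)."""
    if lang is not None:
        # A language tag implies rdf:langString; no other datatype is allowed.
        if datatype not in (None, RDF_LANG_STRING):
            raise ValueError("language tag with a non-rdf:langString datatype")
        return Literal(lexical, RDF_LANG_STRING, lang)
    if datatype == RDF_LANG_STRING:
        raise ValueError("rdf:langString requires a language tag")
    # A simple literal (no datatype, no language tag) gets xsd:string.
    return Literal(lexical, datatype if datatype is not None else XSD_STRING)


# "foo" and "foo"^^xsd:string build equal values, i.e. the same term.
print(make_literal("foo") == make_literal("foo", XSD_STRING))  # True
```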

This is a change, but the working group believes it is a small one. Mixed
data, with both simple literals and xsd:strings, is assumed to be rare.

The first one, simple literal/xsd:string, is the more significant change.

== Example

Previously:

:s :p "foo" .
:s :p "foo"^^xsd:string .

was two triples.  In RDF 1.1 it is a graph of one triple, because a
graph is a set of triples and "foo" and "foo"^^xsd:string are two ways
of writing the same term, much as the following shows two ways to write
the same triple:

---------
@prefix : <http://example/> .

:x :p 123 .
<http://example/x> :p 123 .
---------
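The same collapse can be sketched in Python, with a set of tuples standing in for a graph. This assumes the object of each triple has already been reduced to a (lexical form, datatype) pair; rdf2004_object and rdf11_object are hypothetical helpers, and language tags are left out.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def rdf2004_object(lexical, datatype=None):
    """RDF-2004 view: a simple literal has no datatype at all."""
    return (lexical, datatype)


def rdf11_object(lexical, datatype=None):
    """RDF 1.1 view: a simple literal has datatype xsd:string."""
    return (lexical, datatype if datatype is not None else XSD_STRING)


# :s :p "foo" .   and   :s :p "foo"^^xsd:string .
written = [("foo", None), ("foo", XSD_STRING)]

old_graph = {(":s", ":p", rdf2004_object(*obj)) for obj in written}
new_graph = {(":s", ":p", rdf11_object(*obj)) for obj in written}

print(len(old_graph), len(new_graph))  # 2 1
```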

== Syntax

This change comes about through the way syntax is treated on input and output:

On input, a simple literal and the equivalent xsd:string literal create
the same RDF term, with datatype xsd:string. A language tag causes a
literal with datatype rdf:langString, and that language tag, to be created.

On output, the plain literal forms are used: xsd:string and
rdf:langString do not appear in the output.
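A minimal Python sketch of this input/output behaviour, under stated assumptions: read_literal and write_literal are hypothetical names, literals arrive already tokenized as (lexical form, datatype, language) rather than as Turtle text, and the string escaping is deliberately minimal.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


class Literal(NamedTuple):
    lexical: str
    datatype: str
    lang: Optional[str] = None


def read_literal(lexical: str, datatype: Optional[str] = None,
                 lang: Optional[str] = None) -> Literal:
    """Input side: simple and xsd:string literals become one term."""
    if lang is not None:
        return Literal(lexical, RDF_LANG_STRING, lang)
    return Literal(lexical, datatype if datatype is not None else XSD_STRING)


def _quote(s: str) -> str:
    # Minimal Turtle short-string escaping: backslash, quote, newline, CR.
    s = s.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + s.replace("\n", "\\n").replace("\r", "\\r") + '"'


def write_literal(lit: Literal) -> str:
    """Output side: use the plain forms; never write xsd:string or rdf:langString."""
    if lit.datatype == RDF_LANG_STRING:
        return f"{_quote(lit.lexical)}@{lit.lang}"
    if lit.datatype == XSD_STRING:
        return _quote(lit.lexical)
    return f"{_quote(lit.lexical)}^^<{lit.datatype}>"


print(write_literal(read_literal("foo", XSD_STRING)))  # "foo"
print(write_literal(read_literal("foo", lang="en")))   # "foo"@en
```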

(Aside: rdf:PlainLiteral should never appear in RDF data, but we could
apply the same transformations to map it to the canonical form.)

== Effects
(due to xsd:string)

Systems that use xsd:string, and are sensitive to an explicit datatype,
are affected.  At a guess, this includes OWL systems, maybe Protégé (but
I have no evidence one way or the other). They seem to have xsd:strings
in the data and, until converted, may see data without an explicit
xsd:string and get confused.

The number of triples changes IF the same subject and predicate are used
with the same lexical form both as a simple literal and as an xsd:string.

Generally, I see data that either uses xsd:string or uses simple 
literals.  Mixing seems quite rare.
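One way to assess this on existing data is to look for exactly those collisions. A sketch in Python, assuming the data has already been read into tuples in RDF-2004 terms; find_merging_triples is a hypothetical name, not part of Jena.

```python
from collections import defaultdict

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def find_merging_triples(rows):
    """Return the (s, p, lexical) keys written both as a simple literal and as xsd:string.

    rows holds RDF-2004 style (subject, predicate, lexical, datatype) tuples,
    with datatype None for a simple literal. Language-tagged literals are
    assumed to have been filtered out already.
    """
    forms = defaultdict(set)
    for s, p, lexical, datatype in rows:
        if datatype is None or datatype == XSD_STRING:
            forms[(s, p, lexical)].add(datatype)
    # Both spellings present: RDF 1.1 sees one triple where there were two.
    return sorted(key for key, seen in forms.items() if len(seen) == 2)


rows = [
    (":s", ":p", "foo", None),
    (":s", ":p", "foo", XSD_STRING),
    (":s", ":p", "bar", None),
    (":t", ":p", "foo", XSD_STRING),
]
print(find_merging_triples(rows))  # [(':s', ':p', 'foo')]
```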

== Jena
(xsd:string)

Jena in-memory graphs already equate simple literals and xsd:strings for
searching (i.e. Graph.find), so while the number of results can change,
data that was found before should still be found.

The worst case is producing data for other systems that are not RDF 1.1
and do expect an explicit xsd:string datatype on literals.

== RDF API users
(rdf:langString)

The key is "test language before datatype": if tested that way round,
the appearance of rdf:langString will not matter.  If the test is
"datatype first, null meaning plain literal", it will matter.

I doubt much code outside Jena does this sort of thing. It's something
the writers do, so they need checking thoroughly, but it's just a case of
finding all the calls of getLiteralLanguage().

This is the most significant rdf:langString related change as far as I 
can see.
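The difference between the two test orders can be sketched in Python. This is a toy model, not Jena's Node API: the Literal tuple and both classifier functions are hypothetical, and an empty string stands for "no language tag".

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


class Literal(NamedTuple):
    lexical: str
    datatype: Optional[str]  # RDF-2004: None for plain literals; RDF 1.1: always set
    lang: str = ""           # "" means no language tag


def kind_datatype_first(lit: Literal) -> str:
    """Fragile: assumes 'no datatype' means plain literal."""
    if lit.datatype is None:
        return "lang" if lit.lang else "simple"
    return "typed"


def kind_language_first(lit: Literal) -> str:
    """Robust: checks the language tag before looking at the datatype."""
    if lit.lang:
        return "lang"
    if lit.datatype is None or lit.datatype == XSD_STRING:
        return "simple"
    return "typed"


old = Literal("chat", None, "fr")             # RDF-2004 form
new = Literal("chat", RDF_LANG_STRING, "fr")  # RDF 1.1 form
print(kind_datatype_first(old), kind_datatype_first(new))  # lang typed
print(kind_language_first(old), kind_language_first(new))  # lang lang
```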

== SPARQL
(xsd:string)

SPARQL already has some adaptation:
    datatype("x") = xsd:string           (SPARQL 1.0)
    datatype("x"@en) = rdf:langString    (SPARQL 1.1)

Due to the xsd:string change, matching basic graph patterns may produce
results that it did not before:

{ ?x :p "foo"^^xsd:string }  will match data  :x :p "foo"
{ ?x :p "foo" }              will match data  :x :p "foo"^^xsd:string

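It also makes it easier to optimize FILTER(?x = "foo"), since there is
now a single RDF term for "foo" and the filter can more readily be turned
into a direct match on that term.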
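A small Python sketch of why both patterns match: if pattern terms and data terms go through the same RDF 1.1 input rule, plain equality on terms is enough. The term and match functions are hypothetical, handle only literals without a language tag, and match a single triple pattern rather than a full basic graph pattern.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def term(lexical, datatype=None):
    """RDF 1.1 input rule for literals without a language tag."""
    return ("literal", lexical, datatype if datatype is not None else XSD_STRING)


def match(graph, pattern):
    """Match one triple pattern; strings starting with '?' are variables."""
    results = []
    for triple in graph:
        binding = {}
        for pat, value in zip(pattern, triple):
            if isinstance(pat, str) and pat.startswith("?"):
                # Repeated variables must bind to the same value.
                if binding.setdefault(pat, value) != value:
                    break
            elif pat != value:
                break
        else:
            results.append(binding)
    return results


data_plain = {(":x", ":p", term("foo"))}              # :x :p "foo"
data_typed = {(":x", ":p", term("foo", XSD_STRING))}  # :x :p "foo"^^xsd:string

print(match(data_plain, ("?x", ":p", term("foo", XSD_STRING))))  # [{'?x': ':x'}]
print(match(data_typed, ("?x", ":p", term("foo"))))              # [{'?x': ':x'}]
```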

== Databases
(xsd:string)

Anything that relies on a hash of a literal, in a system that uses
xsd:string, will need to reload.  If simple literals and xsd:strings are
currently kept apart by hashing them differently, then this change is
significant.

This does affect TDB and SDB.
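Why a reload is needed can be sketched with two hypothetical key functions in Python. The key scheme (SHA-256 over a length-prefixed lexical form plus datatype) is an assumption for illustration, not how TDB or SDB actually compute their hashes; the point is only that, in this sketch, the key for a simple literal changes, so an index built the old way no longer lines up.

```python
import hashlib
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def _digest(lexical: str, datatype: str) -> str:
    # Length-prefix the lexical form so distinct (lexical, datatype) pairs encode differently.
    encoded = f"{len(lexical)}:{lexical}{datatype}".encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def key_rdf2004(lexical: str, datatype: Optional[str]) -> str:
    """Old scheme: a simple literal (datatype None) hashes apart from xsd:string."""
    return _digest(lexical, datatype if datatype is not None else "")


def key_rdf11(lexical: str, datatype: Optional[str]) -> str:
    """New scheme: no datatype means xsd:string, so both spellings share a key."""
    return _digest(lexical, datatype if datatype is not None else XSD_STRING)


# An index built with old keys has no entry under the new key for a simple literal.
old_index = {key_rdf2004("foo", None): "node-1"}
print(key_rdf11("foo", None) in old_index)  # False
```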

== Compatibility

We could provide some compatibility:

1/ The ability to write data with explicit xsd:string
2/ Hide rdf:langString from Node.getLiteralDatatype()
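A sketch of what those two options could look like, in Python rather than as Jena code. Everything here is hypothetical: the explicit_xsd_string flag and the legacy_datatype accessor are illustrative names, not existing Jena options or methods, and the string escaping is minimal.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


class Literal(NamedTuple):
    lexical: str
    datatype: str  # always set, as in RDF 1.1
    lang: str = ""  # "" means no language tag


def _quote(s: str) -> str:
    # Minimal escaping for a Turtle short string.
    s = s.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + s.replace("\n", "\\n").replace("\r", "\\r") + '"'


def write_literal(lit: Literal, explicit_xsd_string: bool = False) -> str:
    """Option 1: optionally keep ^^xsd:string for consumers that expect it."""
    if lit.datatype == RDF_LANG_STRING:
        return f"{_quote(lit.lexical)}@{lit.lang}"
    if lit.datatype == XSD_STRING and not explicit_xsd_string:
        return _quote(lit.lexical)
    return f"{_quote(lit.lexical)}^^<{lit.datatype}>"


def legacy_datatype(lit: Literal) -> Optional[str]:
    """Option 2: report no datatype for language-tagged literals, as RDF-2004 code expects."""
    return None if lit.datatype == RDF_LANG_STRING else lit.datatype


foo = Literal("foo", XSD_STRING)
print(write_literal(foo))                            # "foo"
print(write_literal(foo, explicit_xsd_string=True))  # "foo"^^<http://www.w3.org/2001/XMLSchema#string>
print(legacy_datatype(Literal("chat", RDF_LANG_STRING, "fr")))  # None
```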

What does not work is recording whether an RDF term was originally
written as an xsd:string or as a simple literal.  That could end up with
two different Nodes that represent the same RDF term, or with
non-determinism depending on which form is seen first.
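The non-determinism can be seen in a small Python sketch of a hypothetical node table that tries to remember the first spelling of each term. SpellingRecorder is an illustrative name, and only simple and xsd:string literals are handled.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class SpellingRecorder:
    """Hypothetical node table that also records how a term was first written."""

    def __init__(self):
        self._first_spelling = {}

    def intern(self, lexical, datatype=None):
        if datatype not in (None, XSD_STRING):
            raise ValueError("sketch only handles simple and xsd:string literals")
        term = (lexical, XSD_STRING)  # RDF 1.1: both spellings are one term
        spelling = "simple" if datatype is None else "xsd:string"
        # Only one spelling fits per term, so whichever arrives first wins.
        self._first_spelling.setdefault(term, spelling)
        return term

    def spelling(self, term):
        return self._first_spelling[term]


a = SpellingRecorder()
a.intern("foo")
t = a.intern("foo", XSD_STRING)

b = SpellingRecorder()
b.intern("foo", XSD_STRING)
b.intern("foo")

print(a.spelling(t), b.spelling(t))  # simple xsd:string
```

The same data loaded in a different order gives a different recorded spelling, which is the non-determinism described above.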

     Andy