Posted to dev@jena.apache.org by "Andy Seaborne (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2011/11/27 17:32:40 UTC

[jira] [Issue Comment Edited] (JENA-170) hexBinary whitespace issue

    [ https://issues.apache.org/jira/browse/JENA-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157925#comment-13157925 ] 

Andy Seaborne edited comment on JENA-170 at 11/27/11 4:30 PM:
--------------------------------------------------------------

Henry - 

 Node n1 = SSE.parseNode("'AA'^^xsd:hexBinary") ;
 Node n2 = SSE.parseNode("' AA '^^xsd:hexBinary") ;
        
 System.out.println(n1.equals(n2)) ;             // ==> false
 System.out.println(n1.sameValueAs(n2)) ;  // ==> true

The same would be true for Literal.sameValueAs.

You are right that xsd:hexBinary has the whitespace facet enabled (oddly, so does xsd:anyURI).

Jena keeps the lexical form of the literal as given, and in creating nodes it does not modify the presented lexical form (one exception is rdf:XMLLiteral, because parseType="Literal" requires XC14N to be applied).  RDF/XML makes the lexical form of a literal the text of the XML element, which does not apply XSD rules.  The RDF abstract syntax is agnostic to value processing (i.e. D-entailment).

There is no XML Schema processing when parsing RDF/XML, so the whitespace facet is not applied.
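
As a rough illustration of what applying the whitespace facet would involve, here is a minimal sketch of XSD "collapse" normalization in plain Java. The class and method names are hypothetical and this is not Jena code; it only shows why the element text from c1.rdf differs from "AAAA" until the facet is applied.

```java
// Hypothetical helper, not part of Jena: XSD whiteSpace="collapse" normalization.
final class WhitespaceFacet {
    // Replace tab, LF and CR with a space, collapse runs of spaces to one,
    // then drop a leading and a trailing space if present.
    static String collapse(String lexical) {
        String replaced = lexical.replaceAll("[\\t\\n\\r]", " ");
        String collapsed = replaced.replaceAll(" +", " ");
        // After collapsing, at most one space can remain at each end.
        int start = 0;
        int end = collapsed.length();
        if (start < end && collapsed.charAt(start) == ' ') start++;
        if (end > start && collapsed.charAt(end - 1) == ' ') end--;
        return collapsed.substring(start, end);
    }

    public static void main(String[] args) {
        String fromXml = "\nAAAA\n";  // element text as in the c1.rdf example
        System.out.println(fromXml.equals("AAAA"));            // false
        System.out.println(collapse(fromXml).equals("AAAA"));  // true
    }
}
```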

This case is the same as integers 0001 and 1.  Same value but different lexical forms so different RDF literals.  I guess Clerezza uses .equals not .sameValueAs.
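
The integer case can be sketched the same way. This is an illustration in plain Java, not Jena's implementation: the class and method names are made up, with sameTerm standing in for .equals on the lexical form and sameValue for a value comparison in the style of .sameValueAs.

```java
import java.math.BigInteger;

// Hypothetical illustration, not Jena code: a literal's identity is its
// lexical form (plus datatype); its value is what the datatype maps it to.
final class IntegerLexicalVsValue {
    // Term equality: the lexical forms must be identical strings.
    static boolean sameTerm(String lex1, String lex2) {
        return lex1.equals(lex2);
    }

    // Value equality: parse both lexical forms and compare the integers.
    static boolean sameValue(String lex1, String lex2) {
        return new BigInteger(lex1).equals(new BigInteger(lex2));
    }

    public static void main(String[] args) {
        System.out.println(sameTerm("0001", "1"));   // false
        System.out.println(sameValue("0001", "1"));  // true
    }
}
```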

With the emergence of Turtle, this situation will be messier.

We have talked about canonicalization of all input (see org.openjena.riot.pipeline.normalize.CanonicalizeLiteral).  Whitespace processing could be included.  Canonicalization is not free though.

But losing the layout on large xsd:hexBinary/xsd:base64Binary literals might mean needing to teach the writers to lay out these literals.

The situation in ARQ is that in basic graph pattern matching, matching is by exact node equality (simple entailment unless you are using a reasoner).

Filters, however, do value testing for certain well-known datatypes.  ARQ adds various types over the minimum required by SPARQL - it adds the Gregorian dates (gYear, gMonthDay etc), xsd:date, and XSD durations.  It does not include xsd:hexBinary though.

If the input data is canonicalized, node equality will be the same as value equality.
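
A minimal sketch of that last point, assuming canonicalization is applied to every hexBinary lexical form on input. The class below is hypothetical and is not CanonicalizeLiteral: it strips leading and trailing XML whitespace and upper-cases the hex digits (XSD's canonical hexBinary form does not use lower-case digits), after which exact string matching finds the value from the original report.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not Jena's CanonicalizeLiteral: canonicalize hexBinary
// lexical forms on input so that exact matching agrees with value matching.
final class CanonicalHexBinary {
    static String canonical(String lexical) {
        // Strip XML whitespace (space, tab, LF, CR) at both ends.
        String s = lexical.replaceAll("\\A[ \\t\\n\\r]+|[ \\t\\n\\r]+\\z", "");
        // hexBinary is an even number of hex digits, with no inner whitespace.
        if (!s.matches("([0-9A-Fa-f]{2})*")) {
            throw new IllegalArgumentException("Not a hexBinary lexical form");
        }
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Set<String> asGiven = new HashSet<>();
        asGiven.add("\nAAAA\n");  // lexical form as read from c1.rdf

        Set<String> canonicalized = new HashSet<>();
        for (String lex : asGiven) {
            canonicalized.add(canonical(lex));
        }

        // Exact (term) matching against the query constant "AAAA".
        System.out.println(asGiven.contains("AAAA"));        // false
        System.out.println(canonicalized.contains("AAAA"));  // true
    }
}
```

The cost mentioned above shows up here as an extra pass over every literal on input, and the original layout of the lexical form is lost.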


                
                  
> hexBinary whitespace issue
> --------------------------
>
>                 Key: JENA-170
>                 URL: https://issues.apache.org/jira/browse/JENA-170
>             Project: Jena
>          Issue Type: Bug
>          Components: ARQ, Jena, RDF/XML
>         Environment: 2.6.4
>            Reporter: Henry Story
>            Assignee: Andy Seaborne
>            Priority: Minor
>
> As I understand, initial and final white spaces in xsd:hexBinary in xml should be ignored
>    http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#hexBinary
>  
> because of the whitespace facet.
> With Jena 2.6.4 this is not the case, as shown by the test below. 
> I found that in Clerezza when using the graph api, so this is a problem even when one does not use SPARQL.
> Removing the white space solves the problem.
> xsd:hexBinary is already a very fragile encoding. Making it this fragile is bound to lead to issues in communication.
> The same is true with the N3 encoding.
> -----------------------------------------------------------------
> hjs@bblfish[0]$ cat q1.sparql 
> PREFIX : <http://me.example/p#> 
> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
> SELECT ?S WHERE {
>   ?S :related "AAAA"^^xsd:hexBinary .
> }
> hjs@bblfish[0]$ cat c1.rdf 
> <rdf:RDF xmlns="http://me.example/p#"
>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
>     <rdf:Description rdf:about="http://me.example/p#me">
>         <related rdf:datatype="http://www.w3.org/2001/XMLSchema#hexBinary">
> AAAA
> </related>
>     </rdf:Description>
> </rdf:RDF>
> hjs@bblfish[0]$ arq --query=q1.sparql --data=c1.rdf
> -----
> | S |
> =====
> -----
