Posted to dev@jena.apache.org by afs <gi...@git.apache.org> on 2017/11/14 16:41:24 UTC

[GitHub] jena pull request #308: JENA-1384: Canonical literals: lexical form and lang...

GitHub user afs opened a pull request:

    https://github.com/apache/jena/pull/308

    JENA-1384: Canonical literals: lexical form and langTags

    RDFParser options to handle the canonical lexical form of values ("+0123" becomes "123" for an xsd:integer) and options for handling language tags (convert to RFC-canonical form; convert to lowercase).
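
For illustration only, here is a minimal sketch of the two kinds of normalization the description mentions, written with plain JDK classes. It does not use or reproduce the RDFParser options added by this pull request; the class and method names (`CanonicalFormsSketch`, `canonicalInteger`, `canonicalLangTag`, `lowercaseLangTag`) are hypothetical, and the JDK's `Locale` case normalization is only assumed to approximate what "RFC-canonical form" means here.

```java
import java.math.BigInteger;
import java.util.Locale;

public class CanonicalFormsSketch {
    /** Canonical lexical form of an integer: no leading "+" and no leading zeros. */
    static String canonicalInteger(String lexical) {
        // BigInteger accepts an optional sign and leading zeros,
        // and prints the value without them.
        return new BigInteger(lexical).toString();
    }

    /** Language tag with conventional casing (language lowercase, region uppercase). */
    static String canonicalLangTag(String tag) {
        // Assumption: JDK normalization stands in for the RFC-canonical form.
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    /** Language tag converted entirely to lowercase. */
    static String lowercaseLangTag(String tag) {
        return tag.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(canonicalInteger("+0123"));  // 123
        System.out.println(canonicalLangTag("EN-gb"));  // en-GB
        System.out.println(lowercaseLangTag("en-GB"));  // en-gb
    }
}
```

The two language tag variants give different results for the same input, which is presumably why the pull request offers them as separate options.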

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/afs/jena canonical-langtags

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/jena/pull/308.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #308
    
----
commit 3b59d0b6db85b5d36cec78f5c02aae21cc8aff6f
Author: Andy Seaborne <an...@apache.org>
Date:   2017-11-14T16:31:54Z

    JENA-1384: Canonical literals: lexical form and langTags

----


---

[GitHub] jena issue #308: JENA-1384: Canonical literals: lexical form and langTags

Posted by eroux <gi...@git.apache.org>.
Github user eroux commented on the issue:

    https://github.com/apache/jena/pull/308
  
    Thanks a lot for the pull request!


---

[GitHub] jena pull request #308: JENA-1384: Canonical literals: lexical form and lang...

Posted by rvesse <gi...@git.apache.org>.
Github user rvesse commented on a diff in the pull request:

    https://github.com/apache/jena/pull/308#discussion_r152523288
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/riot/process/normalize/CanonicalizeLiteral.java ---
    @@ -73,6 +76,36 @@ public Node apply(Node node) {
             return n2 ;
         }
         
    +    /** Convert the lexical form to a canonical form if one of the known datatypes,
    +     * otherwise return the node argument. (same object :: {@code ==})  
    +     */
    +    public static Node canonicalValue(Node node) {
    +        if ( ! node.isLiteral() )
    +            return node ;
    +        // Fast-track
    +        if ( NodeUtils.isLangString(node) )
    +            return node;
    +        if ( NodeUtils.isSimpleString(node) )
    +            return node;
    +
    +        if ( ! node.getLiteralDatatype().isValid(node.getLiteralLexicalForm()) )
    +            // Invalid lexical form for the datatype - do nothing.
    +            return node;
    +            
    +        RDFDatatype dt = node.getLiteralDatatype() ;
    +        // Datatype, not rdf:langString (RDF 1.1). 
    +        DatatypeHandler handler = dispatch.get(dt) ;
    +        if ( handler == null )
    +            return node ;
    +        Node n2 = handler.handle(node, node.getLiteralLexicalForm(), dt) ;
    +        if ( n2 == null )
    +            return node ;
    +        return n2 ;
    +    }
    +    
    +    /** Convert the language tag of a lexical form to a canonical form if one of the known datatypes,
    +     * otherwise return the node argument. (same object; compare by {@code ==})  
    +     */
         private static Node canonicalLangtag(String lexicalForm, String langTag) {
             String langTag2 = LangTag.canonical(langTag);
             if ( langTag2.equals(langTag) )
    --- End diff --
    
    Shouldn't we be returning `node` not `null` in the subsequent line?


---

[GitHub] jena pull request #308: JENA-1384: Canonical literals: lexical form and lang...

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/308#discussion_r152531097
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/riot/process/normalize/CanonicalizeLiteral.java ---
    @@ -73,6 +76,36 @@ public Node apply(Node node) {
             return n2 ;
         }
         
    +    /** Convert the lexical form to a canonical form if one of the known datatypes,
    +     * otherwise return the node argument. (same object :: {@code ==})  
    +     */
    +    public static Node canonicalValue(Node node) {
    +        if ( ! node.isLiteral() )
    +            return node ;
    +        // Fast-track
    +        if ( NodeUtils.isLangString(node) )
    +            return node;
    +        if ( NodeUtils.isSimpleString(node) )
    +            return node;
    +
    +        if ( ! node.getLiteralDatatype().isValid(node.getLiteralLexicalForm()) )
    +            // Invalid lexical form for the datatype - do nothing.
    +            return node;
    +            
    +        RDFDatatype dt = node.getLiteralDatatype() ;
    +        // Datatype, not rdf:langString (RDF 1.1). 
    +        DatatypeHandler handler = dispatch.get(dt) ;
    +        if ( handler == null )
    +            return node ;
    +        Node n2 = handler.handle(node, node.getLiteralLexicalForm(), dt) ;
    +        if ( n2 == null )
    +            return node ;
    +        return n2 ;
    +    }
    +    
    +    /** Convert the language tag of a lexical form to a canonical form if one of the known datatypes,
    +     * otherwise return the node argument. (same object; compare by {@code ==})  
    +     */
         private static Node canonicalLangtag(String lexicalForm, String langTag) {
             String langTag2 = LangTag.canonical(langTag);
             if ( langTag2.equals(langTag) )
    --- End diff --
    
    Here, node isn't passed in, so it can't be returned. It's a style thing: the node is already known to have a language tag, so I don't like passing in a Node, which can be wrong, e.g. through a mis-call from somewhere else. Passing lex+lang forces it to be the information for a language-tagged literal.
    
    It's tested at line 74
    ```
            if ( n2 == null )
                return node ;
    ```
    and elsewhere conversion also sometimes returns `null` for "no conversion", which means no new node is needed; that is more efficient (measurably).
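
The convention described above can be sketched as follows. This is a simplified illustration, not the code from the pull request: `LangLiteral` is a hypothetical stand-in for a language-tagged literal node, and lowercasing stands in for the real language tag canonicalization. The helper takes only the lexical form and the language tag, returns `null` when nothing changes, and the caller falls back to the original object so that no new object is allocated.

```java
import java.util.Locale;

public class NullMeansUnchangedSketch {
    /** Hypothetical stand-in for a language-tagged literal node. */
    static final class LangLiteral {
        final String lexicalForm;
        final String langTag;

        LangLiteral(String lexicalForm, String langTag) {
            this.lexicalForm = lexicalForm;
            this.langTag = langTag;
        }
    }

    /**
     * Returns a new literal with a lowercase language tag,
     * or null if the tag is already lowercase ("no conversion").
     */
    static LangLiteral lowercaseLangTag(String lexicalForm, String langTag) {
        String langTag2 = langTag.toLowerCase(Locale.ROOT);
        if (langTag2.equals(langTag))
            return null;
        return new LangLiteral(lexicalForm, langTag2);
    }

    /** Caller: maps the helper's null back to the original object. */
    static LangLiteral canonical(LangLiteral lit) {
        LangLiteral lit2 = lowercaseLangTag(lit.lexicalForm, lit.langTag);
        if (lit2 == null)
            return lit;
        return lit2;
    }

    public static void main(String[] args) {
        LangLiteral unchanged = new LangLiteral("hello", "en");
        LangLiteral changed = new LangLiteral("hello", "EN");
        System.out.println(canonical(unchanged) == unchanged); // true: same object
        System.out.println(canonical(changed).langTag);        // en
    }
}
```

Because the unchanged case returns the very same object, a caller can detect "no conversion" with a reference comparison (`==`).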



---

[GitHub] jena pull request #308: JENA-1384: Canonical literals: lexical form and lang...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/jena/pull/308


---