Posted to dev@jena.apache.org by afs <gi...@git.apache.org> on 2017/11/14 16:41:24 UTC
[GitHub] jena pull request #308: JENA-1384: Canonical literals: lexical form and lang...
GitHub user afs opened a pull request:
https://github.com/apache/jena/pull/308
JENA-1384: Canonical literals: lexical form and langTags
RDFParser options to convert values to their canonical lexical form ("+0123" becomes "123" for an xsd:integer) and options for handling language tags (to RFC-canonical form; to lowercase).
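As a rough illustration of the two transformations described above, here is a minimal sketch in plain Java. It does not use Jena and is not the code in this pull request: the class `CanonicalFormSketch` and both method names are hypothetical, and it covers only the xsd:integer example and the lowercase option, not the RFC-canonical form.

```java
import java.math.BigInteger;
import java.util.Locale;

// Hypothetical sketch; not Jena's implementation.
class CanonicalFormSketch {
    /** Canonical lexical form for an integer: no leading '+', no leading zeros. */
    static String canonicalInteger(String lexicalForm) {
        // BigInteger accepts an optional sign and leading zeros, and
        // toString() writes the value back without them.
        return new BigInteger(lexicalForm).toString();
    }

    /** The "to lowercase" option for language tags. */
    static String lowercaseLangTag(String langTag) {
        // Locale.ROOT avoids locale-specific case mappings.
        return langTag.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(canonicalInteger("+0123"));   // prints 123
        System.out.println(lowercaseLangTag("en-GB"));   // prints en-gb
    }
}
```

The sketch assumes the lexical form is already valid for the datatype; an invalid form makes `BigInteger` throw `NumberFormatException`.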
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/afs/jena canonical-langtags
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/jena/pull/308.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #308
----
commit 3b59d0b6db85b5d36cec78f5c02aae21cc8aff6f
Author: Andy Seaborne <an...@apache.org>
Date: 2017-11-14T16:31:54Z
JENA-1384: Canonical literals: lexical form and langTags
----
---
[GitHub] jena issue #308: JENA-1384: Canonical literals: lexical form and langTags
Posted by eroux <gi...@git.apache.org>.
Github user eroux commented on the issue:
https://github.com/apache/jena/pull/308
Thanks a lot for the pull request!
---
[GitHub] jena pull request #308: JENA-1384: Canonical literals: lexical form and lang...
Posted by rvesse <gi...@git.apache.org>.
Github user rvesse commented on a diff in the pull request:
https://github.com/apache/jena/pull/308#discussion_r152523288
--- Diff: jena-arq/src/main/java/org/apache/jena/riot/process/normalize/CanonicalizeLiteral.java ---
@@ -73,6 +76,36 @@ public Node apply(Node node) {
return n2 ;
}
+ /** Convert the lexical form to a canonical form if one of the known datatypes,
+ * otherwise return the node argument. (same object :: {@code ==})
+ */
+ public static Node canonicalValue(Node node) {
+ if ( ! node.isLiteral() )
+ return node ;
+ // Fast-track
+ if ( NodeUtils.isLangString(node) )
+ return node;
+ if ( NodeUtils.isSimpleString(node) )
+ return node;
+
+ if ( ! node.getLiteralDatatype().isValid(node.getLiteralLexicalForm()) )
+ // Invalid lexical form for the datatype - do nothing.
+ return node;
+
+ RDFDatatype dt = node.getLiteralDatatype() ;
+ // Datatype, not rdf:langString (RDF 1.1).
+ DatatypeHandler handler = dispatch.get(dt) ;
+ if ( handler == null )
+ return node ;
+ Node n2 = handler.handle(node, node.getLiteralLexicalForm(), dt) ;
+ if ( n2 == null )
+ return node ;
+ return n2 ;
+ }
+
+ /** Convert the language tag of a lexical form to a canonical form if one of the known datatypes,
+ * otherwise return the node argument. (same object; compare by {@code ==})
+ */
private static Node canonicalLangtag(String lexicalForm, String langTag) {
String langTag2 = LangTag.canonical(langTag);
if ( langTag2.equals(langTag) )
--- End diff --
Shouldn't we be returning `node` not `null` in the subsequent line?
---
[GitHub] jena pull request #308: JENA-1384: Canonical literals: lexical form and lang...
Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:
https://github.com/apache/jena/pull/308#discussion_r152531097
Here, `node` isn't passed in, so it can't be returned. It's a style thing: the node is already known to have a language tag, so I don't like passing in a Node, which can be the wrong kind, e.g. through a mis-call from somewhere else. Passing lex+lang forces the arguments to be the information for a language-tagged literal.
It's tested at line 74
```
if ( n2 == null )
return node ;
```
and elsewhere conversion also sometimes returns `null` for "no conversion", which means no new node needs to be created. That is measurably more efficient.
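The convention described in this comment can be sketched in plain Java as follows. This is an illustration only, not the Jena code: `Literal`, `lowercaseLangTag` and `normalize` are hypothetical names, and `Literal` is a stand-in for a language-tagged literal node.

```java
import java.util.Locale;

// Hypothetical sketch of the "null means no conversion" convention.
class NullForNoChangeSketch {
    /** Hypothetical stand-in for a language-tagged literal; not Jena's Node. */
    static final class Literal {
        final String lexicalForm;
        final String langTag;
        Literal(String lexicalForm, String langTag) {
            this.lexicalForm = lexicalForm;
            this.langTag = langTag;
        }
    }

    /** Takes lex + lang only; returns a new Literal, or null for "no conversion". */
    static Literal lowercaseLangTag(String lexicalForm, String langTag) {
        String lower = langTag.toLowerCase(Locale.ROOT);
        if (lower.equals(langTag))
            return null;               // nothing to change, so nothing is allocated
        return new Literal(lexicalForm, lower);
    }

    /** The caller holds the original object and substitutes it when the helper returns null. */
    static Literal normalize(Literal lit) {
        Literal converted = lowercaseLangTag(lit.lexicalForm, lit.langTag);
        return converted == null ? lit : converted;
    }

    public static void main(String[] args) {
        Literal same = new Literal("colour", "en-gb");
        System.out.println(normalize(same) == same);                           // prints true
        System.out.println(normalize(new Literal("colour", "EN-GB")).langTag); // prints en-gb
    }
}
```

Because the helper receives only the lexical form and the tag, it cannot be handed the wrong kind of node, and the unchanged case returns the caller's original object, so a `==` comparison tells the caller whether anything changed.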
---
[GitHub] jena pull request #308: JENA-1384: Canonical literals: lexical form and lang...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/jena/pull/308
---