Posted to dev@jena.apache.org by "Andy Seaborne (JIRA)" <ji...@apache.org> on 2017/07/08 20:27:00 UTC

[jira] [Commented] (JENA-1313) Language-specific collation in ARQ

    [ https://issues.apache.org/jira/browse/JENA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16079303#comment-16079303 ] 

Andy Seaborne commented on JENA-1313:
-------------------------------------

A note for the record:

"XPath and XQuery Functions and Operators" 3.1 adds {{fn:collation-key}}, which returns an {{xs:base64Binary}}. The collation argument is similar to a language tag but not the same: it is a string in the format of an {{xs:anyURI}}.

https://www.w3.org/TR/xpath-functions-3/#func-collation-key

https://www.w3.org/TR/xpath-functions-3/#collations
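
As a rough Java illustration of the collation-key idea (not the F&O function itself and not an ARQ API), the sketch below derives a key with the JDK's {{java.text.Collator}} and Base64-encodes it. The class name {{CollationKeySketch}} and the helper {{collationKey}} are hypothetical, and it assumes a language tag is used to pick the collation, whereas F&O identifies collations by URI.

```java
import java.text.Collator;
import java.util.Base64;
import java.util.Locale;

// Hypothetical sketch; none of these names exist in Jena/ARQ.
final class CollationKeySketch {

    // Returns a Base64 rendering of the JDK collation key for `text`,
    // using the collator the JDK associates with `langTag`.
    // Assumption: a language tag selects the collation (F&O uses a URI instead).
    static String collationKey(String text, String langTag) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag(langTag));
        byte[] keyBytes = collator.getCollationKey(text).toByteArray();
        return Base64.getEncoder().encodeToString(keyBytes);
    }

    public static void main(String[] args) {
        // The exact bytes depend on the JDK's collation data, so no output is shown.
        System.out.println(collationKey("apple", "en"));
    }
}
```

Keys produced this way are only meaningful when compared with other keys from the same collator.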


> Language-specific collation in ARQ
> ----------------------------------
>
>                 Key: JENA-1313
>                 URL: https://issues.apache.org/jira/browse/JENA-1313
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.2.0
>            Reporter: Osma Suominen
>
> As [discussed|http://markmail.org/message/v2bvsnsza5ksl2cv] on the users mailing list in October 2016, I would like to change ARQ collation of literal values to be language-aware and respect language-specific collation rules.
> This would probably involve changing at least the [NodeUtils.compareLiteralsBySyntax|https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/sparql/util/NodeUtils.java#L199] method.
> It currently sorts by lexical value first, then by language tag. Since the collation order needs to be stable across all possible literal values, I think the safest way would be to sort by language tag first, then by lexical value according to the collation rules for that language.
> But what about subtags like {{@en-US}} or {{@pt-BR}}? Can they have different collation rules than the main language? It would be a bit strange if all {{@en-US}} literals sorted after {{@en}} literals...
> It would be good to check how Dydra does this and possibly take the same approach. See the message linked above for further background.
> I've been talking with [~kinow] about this and he may be interested in implementing it.
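
The ordering proposed in the description (language tag first, then lexical value under that language's collation rules) could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: {{LangLiteral}} is a hypothetical stand-in for a language-tagged literal rather than a Jena class, the comparator is not the actual {{NodeUtils}} code, and it relies on whatever collation rules the JDK's {{java.text.Collator}} provides for each tag.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; not the ARQ implementation.
final class LangLiteralOrderSketch {

    // Stand-in for a language-tagged literal (lexical form plus language tag).
    static final class LangLiteral {
        final String lexical;
        final String lang;

        LangLiteral(String lexical, String lang) {
            this.lexical = lexical;
            this.lang = lang;
        }

        @Override
        public String toString() {
            return "\"" + lexical + "\"@" + lang;
        }
    }

    // Sort by language tag first (case-insensitively), then by lexical value
    // using the collator for that language.
    static final Comparator<LangLiteral> BY_LANG_THEN_COLLATION = (a, b) -> {
        int byLang = a.lang.compareToIgnoreCase(b.lang);
        if (byLang != 0) {
            return byLang;
        }
        // A real implementation would likely cache collators per language tag.
        Collator collator = Collator.getInstance(Locale.forLanguageTag(a.lang));
        int byText = collator.compare(a.lexical, b.lexical);
        // Fall back to plain string order so collation-equal strings still
        // get a deterministic relative order.
        return byText != 0 ? byText : a.lexical.compareTo(b.lexical);
    };

    public static void main(String[] args) {
        List<LangLiteral> literals = new ArrayList<>();
        literals.add(new LangLiteral("b", "en"));
        literals.add(new LangLiteral("a", "fi"));
        literals.add(new LangLiteral("a", "en"));
        literals.sort(BY_LANG_THEN_COLLATION);
        System.out.println(literals);
    }
}
```

With a plain string comparison on the tag, every {{@en-US}} literal sorts after every {{@en}} literal, which is exactly the subtag question raised in the description; the sketch does not try to resolve it.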



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)