Posted to dev@jena.apache.org by "Osma Suominen (JIRA)" <ji...@apache.org> on 2017/04/03 11:18:41 UTC
[jira] [Comment Edited] (JENA-1313) Language-specific collation in ARQ

    [ https://issues.apache.org/jira/browse/JENA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953277#comment-15953277 ] 

Osma Suominen edited comment on JENA-1313 at 4/3/17 11:17 AM:
--------------------------------------------------------------

[~andy.seaborne] The discussion lately has been around implementing collation as an ARQ extension function and leaving the default ARQ sorting (and comparison) operations unchanged. Other than that, your summary looks accurate to me.

Wearing my application developer hat, I don't really care which way this is implemented (changing the defaults vs. extension function) as long as it is possible to do locale-aware collation in SPARQL ORDER BY clauses. Changing the ARQ defaults would be slightly better since then I wouldn't have to change my SPARQL queries at all, but I understand that this could create problems. Maybe the ARQ locale-mode switch would help too, since I'd presumably only have to change the Fuseki configuration but not my SPARQL queries.



was (Author: osma):
[~andy.seaborne] The discussion lately has been around implementing collation as an ARQ extension function and leaving the default ARQ sorting (and comparison) operations unchanged. Other than that, your summary looks accurate to me.

Wearing my application developer hat, I don't really care which way this is implemented (changing the defaults vs. extension function) as long as it is possible to do locale-aware collation.

> Language-specific collation in ARQ
> ----------------------------------
>
>                 Key: JENA-1313
>                 URL: https://issues.apache.org/jira/browse/JENA-1313
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.2.0
>            Reporter: Osma Suominen
>
> As [discussed|http://markmail.org/message/v2bvsnsza5ksl2cv] on the users mailing list in October 2016, I would like to change ARQ collation of literal values to be language-aware and respect language-specific collation rules.
> This would probably involve changing at least the [NodeUtils.compareLiteralsBySyntax|https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/sparql/util/NodeUtils.java#L199] method.
> It currently sorts by lexical value first, then by language tag. Since the collation order needs to be stable across all possible literal values, I think the safest way would be to sort by language tag first, then by lexical value according to the collation rules for that language.
> But what about subtags like {{@en-US}} or {{@pt-BR}}? Can they have different collation rules than the main language? It would be a bit strange if all {{@en-US}} literals sorted after {{@en}} literals...
> It would be good to check how Dydra does this and possibly take the same approach. See the message linked above for further background.
> I've been talking with [~kinow] about this and he may be interested in implementing it.
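
For illustration, here is a minimal sketch of the ordering proposed in the description above: language tag first, then lexical value under that language's collation rules. It assumes java.text.Collator as the source of collation rules. The LangLiteral class and the comparator name are hypothetical and are not part of Jena, and this is not how NodeUtils.compareLiteralsBySyntax is actually written.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class LangCollationSketch {

    /** Hypothetical stand-in for a language-tagged literal; not a Jena class. */
    static final class LangLiteral {
        final String lexical;
        final String lang;

        LangLiteral(String lexical, String lang) {
            this.lexical = lexical;
            this.lang = lang;
        }

        @Override
        public String toString() {
            return "\"" + lexical + "\"@" + lang;
        }
    }

    /**
     * Assumed ordering: language tag first (case-insensitively), then the
     * lexical value under the collation rules for that tag's locale.
     */
    static final Comparator<LangLiteral> BY_LANG_THEN_COLLATION = (a, b) -> {
        // Plain string comparison of tags: "en" sorts before "en-US", so
        // subtagged literals form their own group after the main language.
        int byTag = a.lang.compareToIgnoreCase(b.lang);
        if (byTag != 0) {
            return byTag;
        }
        // Both tags name the same locale here. A real implementation would
        // probably cache collators instead of creating one per comparison.
        Collator collator = Collator.getInstance(Locale.forLanguageTag(a.lang));
        int byRules = collator.compare(a.lexical, b.lexical);
        if (byRules != 0) {
            return byRules;
        }
        // Tie-break on code points so the order stays deterministic when the
        // collator treats two different strings as equal.
        return a.lexical.compareTo(b.lexical);
    };

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<LangLiteral> sorted(List<LangLiteral> literals) {
        List<LangLiteral> copy = new ArrayList<>(literals);
        copy.sort(BY_LANG_THEN_COLLATION);
        return copy;
    }

    public static void main(String[] args) {
        List<LangLiteral> input = new ArrayList<>();
        input.add(new LangLiteral("Banana", "en"));
        input.add(new LangLiteral("zebra", "fi"));
        input.add(new LangLiteral("apple", "en"));
        sorted(input).forEach(System.out::println);
    }
}
```

With this comparator, "apple"@en sorts before "Banana"@en, whereas a plain code-point comparison would put the capitalised string first. It also makes the subtag question concrete: because tags are compared as strings, every {{@en-US}} literal lands after every {{@en}} literal.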



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)