Posted to dev@commonsrdf.apache.org by Reto Gmür <re...@apache.org> on 2015/04/11 13:04:49 UTC

Why have dedicated type for Language rather than using Strings (was Re: Moderate use of classes or only interfaces and strings)

On Mon, Mar 30, 2015 at 9:40 PM, Peter Ansell <an...@gmail.com>
wrote:

> The use of interfaces is a deliberate step to improve the usefulness
> of the API by not mandating any particular implementation. We are not
> going to change that.
>
> I don't understand what makes a simple String a bad choice for
> representing language tags. There are no other attributes that could
> be attached to the string, per the BCP47 design to have all of the
> information required in a simple string. In addition that BCP47 string
> literal is all that is referred to in RDF-1.1, which is our core
> reference, not the JVM or other libraries' design choices. I.e., the
> RDF-1.1 specs do not dissect the language tag, so we see no need to do
> so here. The equality rules (lower case comparison with any casing for
> the tag literal itself) are all defined at the total level, so in that
> case it also doesn't make sense to decompile the string.
>

By that reasoning you could also use Strings for IRIs.

It's about object orientation: to represent an IRI we have a type IRI, to
represent a Language we have a type Language, and to represent an immutable
sequence of characters, and only for that, we have String.

So if our API were focused on the concrete syntax, then for the
serializations "Hello"@en and "Hello"@EN it might be justified to have two
different language tags and

literal1.getLanguageTag().equals(literal2.getLanguageTag()) evaluating to
false.

But our API models the abstract syntax, and in the RDF data model there is
no difference between "Hello"@en and "Hello"@EN. The string representation
of the language in some serialization is irrelevant for what we model, so
we should only care about the language (which can be represented as a
string but which is not a string).

literal1.getLanguage().equals(literal2.getLanguage()) must evaluate to
true, because the literals have the same language.

The Language type is similar to java.util.Locale but modelling only BCP-47.
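
To make this concrete, here is a minimal sketch of what such a type could
look like. The class name Language and everything in it are illustrative
assumptions, not part of the current API; the sketch simply lower-cases the
tag on construction so that equality no longer depends on casing.

```java
import java.util.Locale;

// Hypothetical value type; the name and methods are illustrative assumptions,
// not part of the API under discussion.
final class Language {
    private final String tag; // always stored in lower case

    Language(String tag) {
        if (tag == null || tag.isEmpty()) {
            throw new IllegalArgumentException("language tag must not be empty");
        }
        // Locale.ROOT keeps the lower-casing independent of the default locale.
        this.tag = tag.toLowerCase(Locale.ROOT);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Language && tag.equals(((Language) other).tag);
    }

    @Override
    public int hashCode() {
        return tag.hashCode();
    }

    @Override
    public String toString() {
        return tag;
    }
}

class LanguageDemo {
    public static void main(String[] args) {
        // Two tags that differ only in casing denote the same Language.
        System.out.println(new Language("en").equals(new Language("EN"))); // true
        // The same comparison on raw Strings fails.
        System.out.println("en".equals("EN")); // false
    }
}
```

No syntax validation against BCP-47 is attempted here; the sketch only
covers the identity criterion.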

Cheers,
Reto



>
> On 31 March 2015 at 00:21, Reto Gmür <re...@apache.org> wrote:
> > Hi Andy,
> >
> >>>
> >>> and you have evolved to something for Clerezza that is not interface
> >>> based, which, as already commented (no response from you BTW) is a
> >>> roadblock for some.  There was a point about scalability as well.
> >
> > I was waiting for jira, I will create an issue to address this. I think
> > IRIs and languages should be classes. The current code uses an interface
> > IRI (different from URL and URI in the Java core library, for which I fail
> > to understand the justifying use cases) and a String to express the
> > language tag (poor OO and a wrong identity criterion, as the casing of the
> > language tag is irrelevant).
> >
> > As for scalability I don't know what you are referring to.
> >
> > I will create issues and answer your other points when I'm back on a better
> > connection.
> >
> > Cheers,
> > Reto
>

Re: Why have dedicated type for Language rather than using Strings (was Re: Moderate use of classes or only interfaces and strings)

Posted by Reto Gmür <re...@apache.org>.
On Sat, Apr 11, 2015 at 1:15 PM, Peter Ansell <an...@gmail.com>
wrote:

> On 11 April 2015 at 21:04, Reto Gmür <re...@apache.org> wrote:
> > On Mon, Mar 30, 2015 at 9:40 PM, Peter Ansell <an...@gmail.com>
> > wrote:
> >
> >> The use of interfaces is a deliberate step to improve the usefulness
> >> of the API by not mandating any particular implementation. We are not
> >> going to change that.
> >>
> >> I don't understand what makes a simple String a bad choice for
> >> representing language tags. There are no other attributes that could
> >> be attached to the string, per the BCP47 design to have all of the
> >> information required in a simple string. In addition that BCP47 string
> >> literal is all that is referred to in RDF-1.1, which is our core
> >> reference, not the JVM or other libraries' design choices. I.e., the
> >> RDF-1.1 specs do not dissect the language tag, so we see no need to do
> >> so here. The equality rules (lower case comparison with any casing for
> >> the tag literal itself) are all defined at the total level, so in that
> >> case it also doesn't make sense to decompile the string.
> >>
> >
> > By that reasoning you could also use Strings for IRIs.
>
> There are limits to the amount of object orientation that is required.
>
Why should we not be fully object oriented?


> If there is agreement that there is benefit from adding a type for
> LanguageTags then it could be added.

I think it should be Language and not LanguageTag, as a BCP-47 tag
identifies a Language (or a dialect, writing system or orthographic language
variant).


> However, there is no reason to
> generalise to IRIs, which are generally agreed to need to be
> represented by a type due to their complexity.
>
I obviously agree that IRI should be a type, but given that the identity of
an IRI depends only on its Unicode string, I don't see why the complexity is
greater than for language tags.


>
> > It's about object orientation: to represent an IRI we have a type IRI, to
> > represent a Language we have a type Language, and to represent an immutable
> > sequence of characters, and only for that, we have String.
> >
> > So if our APIs would be focused on the concrete syntax then for the
> > serializations "Hello"@en and "Hello"@EN it might be justified to have two
> > different language tags and
> >
> > literal1.getLanguageTag().equals(literal2.getLanguageTag()) evaluating to
> > false.
>
> The RDF-1.1 specification sets up specific rules based solely on lower
> case string comparison by stating that the lower case language tags
> are the value space. There is no ambiguity about the fact that a plain
> .equals is not suitable for language tag strings.
>

RDF 1.1 defines literal equality based on the equality of the string
representation of their elements. Exactly because the value space contains
only lower-case strings, we need something that matches this better than
Java Strings, which can also be upper case.

Rereading the spec,
literal1.getLanguageTag().equals(literal2.getLanguageTag()) would have to
return true because "EN" is not in the value space for language tags.
literal1.getLanguageTagLexicalRepresentation().equals(literal2.getLanguageTagLexicalRepresentation())
might return false.
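
The distinction between the two accessors can be sketched as follows. The
class SketchLiteral is a hypothetical stand-in for a literal implementation
(an assumption for illustration only); the two method names are the ones
used above.

```java
import java.util.Locale;

// Hypothetical literal; the class is an illustrative assumption, only the
// two language-tag accessor names are taken from the discussion.
final class SketchLiteral {
    private final String lexicalForm;
    private final String languageTagAsWritten;

    SketchLiteral(String lexicalForm, String languageTagAsWritten) {
        this.lexicalForm = lexicalForm;
        this.languageTagAsWritten = languageTagAsWritten;
    }

    String getLexicalForm() {
        return lexicalForm;
    }

    // Value-space view: always lower case, so "EN" and "en" coincide.
    String getLanguageTag() {
        return languageTagAsWritten.toLowerCase(Locale.ROOT);
    }

    // Serialization view: the tag exactly as it was written.
    String getLanguageTagLexicalRepresentation() {
        return languageTagAsWritten;
    }
}

class SketchLiteralDemo {
    public static void main(String[] args) {
        SketchLiteral literal1 = new SketchLiteral("Hello", "en");
        SketchLiteral literal2 = new SketchLiteral("Hello", "EN");
        System.out.println(
            literal1.getLanguageTag().equals(literal2.getLanguageTag())); // true
        System.out.println(
            literal1.getLanguageTagLexicalRepresentation()
                .equals(literal2.getLanguageTagLexicalRepresentation())); // false
    }
}
```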


>
> > But as our API models the abstract syntax and as in the RDF data model
> > there is no difference between "Hello"@en and "Hello"@EN and the string
> > representation of the language in some serialization is irrelevant for what
> > we model, we should only care about the language (which can be represented
> > as a string but which is not a string)
> >
> > literal1.getLanguage().equals(literal2.getLanguage()) must evaluate to
> > true, because the literals have the same language.
> >
> > The Language type is similar to java.util.Locale but modelling only BCP-47.
>
> As long as the Language type allowed for the preservation of the
> original casing for systems that wish to support that, it would not be
> any different to String, apart from formalising the case-insensitive
> comparison.
>
Why do you think preserving the original casing is a requirement?

Cheers,
Reto

>
> Feel free to submit a pull request which adds this functionality (or
> any other patch submission method that you prefer) and that may make
> it simpler to discuss the specifics, including whether it should be
> added or not.
>
> Thanks,
>
> Peter
>

Re: Why have dedicated type for Language rather than using Strings (was Re: Moderate use of classes or only interfaces and strings)

Posted by Peter Ansell <an...@gmail.com>.
On 11 April 2015 at 21:04, Reto Gmür <re...@apache.org> wrote:
> On Mon, Mar 30, 2015 at 9:40 PM, Peter Ansell <an...@gmail.com>
> wrote:
>
>> The use of interfaces is a deliberate step to improve the usefulness
>> of the API by not mandating any particular implementation. We are not
>> going to change that.
>>
>> I don't understand what makes a simple String a bad choice for
>> representing language tags. There are no other attributes that could
>> be attached to the string, per the BCP47 design to have all of the
>> information required in a simple string. In addition that BCP47 string
>> literal is all that is referred to in RDF-1.1, which is our core
>> reference, not the JVM or other libraries' design choices. I.e., the
>> RDF-1.1 specs do not dissect the language tag, so we see no need to do
>> so here. The equality rules (lower case comparison with any casing for
>> the tag literal itself) are all defined at the total level, so in that
>> case it also doesn't make sense to decompile the string.
>>
>
> By that reasoning you could also use Strings for IRIs.

There are limits to the amount of object orientation that is required.
If there is agreement that there is benefit from adding a type for
LanguageTags then it could be added. However, there is no reason to
generalise to IRIs, which are generally agreed to need to be
represented by a type due to their complexity.

> It's about object orientation: to represent an IRI we have a type IRI, to
> represent a Language we have a type Language, and to represent an immutable
> sequence of characters, and only for that, we have String.
>
> So if our APIs would be focused on the concrete syntax then for the
> serializations "Hello"@en and "Hello"@EN it might be justified to have two
> different language tags and
>
> literal1.getLanguageTag().equals(literal2.getLanguageTag()) evaluating to
> false.

The RDF-1.1 specification sets up specific rules based solely on lower
case string comparison by stating that the lower case language tags
are the value space. There is no ambiguity about the fact that a plain
.equals is not suitable for language tag strings.
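
For illustration, a small sketch of what comparing tags as plain Strings
involves. The helper name sameTag is a hypothetical one chosen for this
example; it lower-cases both sides with Locale.ROOT, since the default
locale can change the result of toLowerCase().

```java
import java.util.Locale;

class TagComparison {
    // Hypothetical helper: compares two tags by their lower-case forms.
    static boolean sameTag(String a, String b) {
        return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Plain equals treats differently cased tags as different.
        System.out.println("en".equals("EN"));   // false
        // Lower-case comparison treats them as the same tag.
        System.out.println(sameTag("en", "EN")); // true

        // Locale-sensitive lower-casing is a pitfall: under Turkish rules
        // 'I' becomes the dotless 'ı', so "IT" does not lower-case to "it".
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println("IT".toLowerCase(turkish).equals("it")); // false
    }
}
```

Every caller that holds a raw tag String has to remember to go through such
a helper rather than calling .equals directly.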

> But as our API models the abstract syntax and as in the RDF data model
> there is no difference between "Hello"@en and "Hello"@EN and the string
> representation of the language in some serialization is irrelevant for what
> we model, we should only care about the language (which can be represented
> as a string but which is not a string)
>
> literal1.getLanguage().equals(literal2.getLanguage()) must evaluate to
> true, because the literals have the same language.
>
> The Language type is similar to java.util.Locale but modelling only BCP-47.

As long as the Language type allowed for the preservation of the
original casing for systems that wish to support that, it would not be
any different to String, apart from formalising the case-insensitive
comparison.
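
A rough sketch of such a casing-preserving type, under the assumption of a
hypothetical class named CasePreservingLanguage (nothing like it exists in
the API yet): the tag is kept as written, while equals and hashCode work on
a lower-cased copy.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical type; name and methods are assumptions for illustration.
final class CasePreservingLanguage {
    private final String original;   // tag exactly as supplied
    private final String normalized; // lower-cased copy used for identity

    CasePreservingLanguage(String tag) {
        this.original = Objects.requireNonNull(tag, "tag");
        this.normalized = tag.toLowerCase(Locale.ROOT);
    }

    // Returns the tag with its original casing, for systems that want it.
    String getOriginal() {
        return original;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CasePreservingLanguage
            && normalized.equals(((CasePreservingLanguage) other).normalized);
    }

    @Override
    public int hashCode() {
        // Must agree with equals, so it is based on the normalized form.
        return normalized.hashCode();
    }
}

class CasePreservingLanguageDemo {
    public static void main(String[] args) {
        CasePreservingLanguage a = new CasePreservingLanguage("en-GB");
        CasePreservingLanguage b = new CasePreservingLanguage("EN-gb");
        System.out.println(a.equals(b));      // true
        System.out.println(a.getOriginal());  // en-GB
        System.out.println(b.getOriginal());  // EN-gb
    }
}
```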

Feel free to submit a pull request which adds this functionality (or
any other patch submission method that you prefer) and that may make
it simpler to discuss the specifics, including whether it should be
added or not.

Thanks,

Peter