Posted to dev@commonsrdf.apache.org by Andy Seaborne <an...@apache.org> on 2015/03/30 20:25:56 UTC

On URL and URIs

On 30/03/15 14:21, Reto Gmür wrote:
> The current code uses an interface
> IRI (different from URL and URI in the java core library, for which I fail
> to understand the justifying use cases)

java.net.URL has inappropriate operations (e.g. openConnection(), openStream()).

The stumbling block is the desire for a typed interface
and the fact that subjects are "BlankNodeOrIRI".

java.net.URI is a class and is final.

That can be worked around although IMO it's not pretty to have, say, a 
union for BlankNodeOrIRI + variations on all method calls mentioning 
BlankNodeOrIRI as arguments.
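
To make the workaround concrete, here is a minimal sketch of what a hand-written union around the final java.net.URI class might look like. The names BlankNodeLabel and BlankNodeOrUri are hypothetical and are not part of commons-rdf; the point is only that every caller has to ask which arm of the union it holds.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical stand-in for a blank node; not a commons-rdf type.
final class BlankNodeLabel {
    final String label;

    BlankNodeLabel(String label) {
        this.label = Objects.requireNonNull(label);
    }
}

// Hand-written union: exactly one of the two fields is non-null.
final class BlankNodeOrUri {
    private final URI uri;
    private final BlankNodeLabel blankNode;

    private BlankNodeOrUri(URI uri, BlankNodeLabel blankNode) {
        this.uri = uri;
        this.blankNode = blankNode;
    }

    static BlankNodeOrUri of(URI uri) {
        return new BlankNodeOrUri(Objects.requireNonNull(uri), null);
    }

    static BlankNodeOrUri of(BlankNodeLabel blankNode) {
        return new BlankNodeOrUri(null, Objects.requireNonNull(blankNode));
    }

    boolean isUri() {
        return uri != null;
    }

    // Callers must check isUri() first; the compiler cannot help them.
    URI asUri() {
        if (uri == null) throw new IllegalStateException("not a URI");
        return uri;
    }

    BlankNodeLabel asBlankNode() {
        if (blankNode == null) throw new IllegalStateException("not a blank node");
        return blankNode;
    }
}
```

Any method that takes a subject would then need either this wrapper type or one overload per arm, which is the "variations on all method calls" cost mentioned above.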

java.net.URI is not bad: it has UTF-8 and IPv6 additions and is not strictly 
US-ASCII.

The constructor has an implicit parser behind it so it is not a simple 
wrapper of little cost. The parser is good for the syntax. It does not 
do punycode in toASCIIString().
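
A small sketch of these three points, using only java.net classes. The class name UriParsingDemo and the example.org / bücher.example strings are illustrative assumptions. It shows that construction runs the parser (a raw space is rejected), that toASCIIString() percent-encodes non-ASCII characters as UTF-8, and that punycode for a host name needs a separate call to java.net.IDN.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

class UriParsingDemo {
    // True if the java.net.URI constructor accepts the string.
    static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Percent-encodes non-ASCII characters as UTF-8 octets; no punycode involved.
    static String asciiForm(String s) {
        return URI.create(s).toASCIIString();
    }

    // Punycode conversion of a host name is a separate step.
    static String punycodeHost(String host) {
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        System.out.println(parses("http://example.org/a b"));            // false
        System.out.println(asciiForm("http://example.org/caf\u00e9"));   // http://example.org/caf%C3%A9
        System.out.println(punycodeHost("b\u00fccher.example"));         // xn--bcher-kva.example
    }
}
```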


There is a question of how to treat bad data - whether to allow bad IRIs 
at all so that an application can use the API and then clean the data up 
by processing the RDF or whether it needs to clean it beforehand.

That's a style issue, not a purely technical one.

   Check early vs be as permissive as possible.

(example: other people's data is often fit for their purpose but may be 
strictly "bad". An ETL pipeline might want to just get stuff in and then 
fix it working in RDFTerms, e.g. rewrite " " as "%20" or apply NFC rules.)
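
As a rough illustration of that kind of after-the-fact cleanup, the sketch below applies the two fixes named in the example to an IRI string that has already been loaded. IriCleanup is a hypothetical helper, not part of commons-rdf, and a real pipeline would likely need more rules than these two.

```java
import java.text.Normalizer;

class IriCleanup {
    // Escapes raw spaces and applies Unicode NFC normalization.
    static String clean(String raw) {
        String escaped = raw.replace(" ", "%20");
        return Normalizer.normalize(escaped, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // "e" followed by a combining acute accent becomes the single character U+00E9.
        System.out.println(clean("http://example.org/my file/e\u0301"));
    }
}
```

A permissive API lets this run on terms that are already in the graph; a check-early API would have rejected the input before this code ever saw it.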

As commons-rdf is supposed to be neutral to underlying providers, it is 
not right for it to make that call.

	Andy

Re: On URL and URIs

Posted by Stian Soiland-Reyes <st...@apache.org>.
I think the current approach is quite good - we have interfaces so the
implementation can have whatever class-hierarchy you want, e.g. a
disk-based RDF store might have a common RDFTermImpl superclass
containing 64-bit identifiers and on-demand loading of the long
strings.

On the other hand, the implementations in commons-rdf-simple can be
used as-is when they are good enough - so say you are happy with
simple.LiteralImpl, then you should just use that directly, and still
have your own IRIOnDiskImpl. That is - your RDFTermFactory-created
instances are not required to all be from the same Java package.
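
A minimal sketch of that mix, under stated assumptions: SketchIRI is a simplified stand-in for the commons-rdf IRI interface, InMemoryIRI plays the role of a simple string-backed implementation, and IdBackedIRI is a hypothetical store-specific class that holds a 64-bit identifier and loads the string on demand (a Map stands in for the disk).

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the IRI interface.
interface SketchIRI {
    String getIRIString();
}

// Plain string-backed implementation.
final class InMemoryIRI implements SketchIRI {
    private final String iri;

    InMemoryIRI(String iri) {
        this.iri = iri;
    }

    @Override
    public String getIRIString() {
        return iri;
    }
}

// Store-specific implementation: keeps only an id until the string is needed.
final class IdBackedIRI implements SketchIRI {
    private final long id;
    private final Map<Long, String> store; // stands in for an on-disk dictionary
    private String cached;

    IdBackedIRI(long id, Map<Long, String> store) {
        this.id = id;
        this.store = store;
    }

    @Override
    public String getIRIString() {
        if (cached == null) {
            cached = store.get(id); // on-demand load
        }
        return cached;
    }
}

class MixedImplementations {
    // Code written against the interface does not care which class it gets.
    static boolean sameIRI(SketchIRI a, SketchIRI b) {
        return a.getIRIString().equals(b.getIRIString());
    }

    public static void main(String[] args) {
        Map<Long, String> store = new HashMap<>();
        store.put(42L, "http://example.org/a");
        System.out.println(sameIRI(new InMemoryIRI("http://example.org/a"),
                                   new IdBackedIRI(42L, store)));
    }
}
```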




On 31 March 2015 at 03:55, Peter Ansell <an...@gmail.com> wrote:
> On 31 March 2015 at 05:49, Gary Gregory <ga...@gmail.com> wrote:
>> I was surprised to see an interface called IRI. When I see IRI, I expect a
>> class like the JRE's URI (or URL), not an interface.
>>
>> This interface looks more like an IRIProvider to me.
>
> If it was implemented as a class, then every implementation would need
> to extend it to make it suit their system, or we create it as a final
> class and hope it suits everyone, which doesn't work.
>
> It is not within our scope to dictate the internal workings of the
> systems we are trying to provide interoperability for. In particular,
> libraries written in non-Java JVM languages will have different
> assumptions to the Java based implementations. Although they mostly
> provide interoperability with Java, the interfaces should be
> applicable to them in the same way as the Java libraries. If the
> entire API is written using interfaces, they have less Java-specific
> details to work around.
>
> Cheers,
>
> Peter



-- 
Stian Soiland-Reyes
Apache Taverna (incubating), Apache Commons RDF (incubating)
http://orcid.org/0000-0001-9842-9718

Re: On URL and URIs

Posted by Reto Gmür <re...@apache.org>.
On Tue, Mar 31, 2015 at 2:55 AM, Peter Ansell <an...@gmail.com>
wrote:

> On 31 March 2015 at 05:49, Gary Gregory <ga...@gmail.com> wrote:
> > I was surprised to see an interface called IRI. When I see IRI, I expect a
> > class like the JRE's URI (or URL), not an interface.
> >
> > This interface looks more like an IRIProvider to me.
>
> If it was implemented as a class, then every implementation would need
> to extend it to make it suit their system, or we create it as a final
> class and hope it suits everyone, which doesn't work.
>
I wasn't proposing it should be final.


>
> It is not within our scope to dictate the internal workings of the
> systems we are trying to provide interoperability for. In particular,
> libraries written in non-Java JVM languages will have different
> assumptions to the Java based implementations. Although they mostly
> provide interoperability with Java, the interfaces should be
> applicable to them in the same way as the Java libraries. If the
> entire API is written using interfaces, they have less Java-specific
> details to work around.
>

Could you explain this in more detail, or show a simple scenario where
having interfaces makes things easier for such an implementation?
Keep in mind that such an implementation also has to deal with other IRI
implementations, which would then have to be replaced with its own
versions, so things might actually get more complicated as implementations
have to handle two cases.

I see a clear increase in complexity on the other side: with interfaces,
merely having a constant value of type IRI (as in classes generated by
something like schemagen) requires either implementing the interface or
having a runtime dependency on the RDFTermFactory.
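
To illustrate the first of those two options, here is a sketch of a generated-style vocabulary class that carries its own tiny implementation of the interface. VocabIRI is a simplified stand-in for the IRI interface, and ConstantIRI and FoafSketch are hypothetical names; the equality rule (compare by IRI string) is an assumption made for the sketch.

```java
// Simplified stand-in for the IRI interface.
interface VocabIRI {
    String getIRIString();
}

// The vocabulary has to ship its own implementation just to declare constants.
final class ConstantIRI implements VocabIRI {
    private final String iri;

    ConstantIRI(String iri) {
        this.iri = iri;
    }

    @Override
    public String getIRIString() {
        return iri;
    }

    // Assumed equality rule: two IRIs are equal when their strings are equal.
    @Override
    public boolean equals(Object o) {
        return o instanceof VocabIRI && iri.equals(((VocabIRI) o).getIRIString());
    }

    @Override
    public int hashCode() {
        return iri.hashCode();
    }
}

// Shape of a class that a schemagen-like tool might emit.
final class FoafSketch {
    static final String NS = "http://xmlns.com/foaf/0.1/";
    static final VocabIRI NAME = new ConstantIRI(NS + "name");

    private FoafSketch() {
    }
}
```

The alternative is to obtain NAME from a factory at class-initialization time, which is the runtime dependency mentioned above.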

Cheers,
Reto

Re: On URL and URIs

Posted by Peter Ansell <an...@gmail.com>.
On 31 March 2015 at 05:49, Gary Gregory <ga...@gmail.com> wrote:
> I was surprised to see an interface called IRI. When I see IRI, I expect a
> class like the JRE's URI (or URL), not an interface.
>
> This interface looks more like an IRIProvider to me.

If it was implemented as a class, then every implementation would need
to extend it to make it suit their system, or we create it as a final
class and hope it suits everyone, which doesn't work.

It is not within our scope to dictate the internal workings of the
systems we are trying to provide interoperability for. In particular,
libraries written in non-Java JVM languages will have different
assumptions to the Java based implementations. Although they mostly
provide interoperability with Java, the interfaces should be
applicable to them in the same way as the Java libraries. If the
entire API is written using interfaces, they have less Java-specific
details to work around.

Cheers,

Peter

Re: On URL and URIs

Posted by Andy Seaborne <an...@apache.org>.
On 30/03/15 19:49, Gary Gregory wrote:
> I was surprised to see an interface called IRI. When I see IRI, I expect a
> class like the JRE's URI (or URL), not an interface.
>
> This interface looks more like an IRIProvider to me.

Gary - How does that help with BlankNodeOrIRI?

Triple.getSubject() -> BlankNodeOrIRI
Triple.getPredicate() -> IRI
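
Spelled out as a sketch, the typing constraint behind those two signatures looks roughly like this. All names ending in Sketch, and the Simple* classes, are hypothetical simplifications and not the actual commons-rdf declarations. The predicate type must also be usable as a subject, which an interface hierarchy gives for free and a final class such as java.net.URI cannot.

```java
// Simplified stand-ins for the term hierarchy.
interface TermSketch {
}

// Plays the role of BlankNodeOrIRI: anything allowed in the subject position.
interface SubjectSketch extends TermSketch {
}

interface IriSketch extends SubjectSketch {
    String getIRIString();
}

interface BlankNodeSketch extends SubjectSketch {
    String label();
}

interface TripleSketch {
    SubjectSketch getSubject(); // blank node or IRI
    IriSketch getPredicate();   // IRI only
    TermSketch getObject();     // any term
}

final class SimpleIri implements IriSketch {
    private final String iri;

    SimpleIri(String iri) {
        this.iri = iri;
    }

    @Override
    public String getIRIString() {
        return iri;
    }
}

final class SimpleBlankNode implements BlankNodeSketch {
    private final String label;

    SimpleBlankNode(String label) {
        this.label = label;
    }

    @Override
    public String label() {
        return label;
    }
}

final class SimpleTriple implements TripleSketch {
    private final SubjectSketch subject;
    private final IriSketch predicate;
    private final TermSketch object;

    SimpleTriple(SubjectSketch subject, IriSketch predicate, TermSketch object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    @Override
    public SubjectSketch getSubject() {
        return subject;
    }

    @Override
    public IriSketch getPredicate() {
        return predicate;
    }

    @Override
    public TermSketch getObject() {
        return object;
    }
}
```

Passing a SimpleBlankNode as the second constructor argument is rejected at compile time, so the "predicate is always an IRI" rule needs no runtime check.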

	Andy



Re: On URL and URIs

Posted by Gary Gregory <ga...@gmail.com>.
I was surprised to see an interface called IRI. When I see IRI, I expect a
class like the JRE's URI (or URL), not an interface.

This interface looks more like an IRIProvider to me.

Gary



