Posted to dev@jena.apache.org by Maxime Lefrançois <ma...@emse.fr> on 2018/06/15 13:49:36 UTC

Re: Contribution proposal for Jena: support of a datatype for quantity values

Dear all,

Regarding our contribution proposal to enable extensions to override SPARQL
operators in Jena:

We finally got the agreement from our institution to contribute to the
Apache Software Foundation.
Question 1: what is the procedure to upload the form?

As for the how, I would first like to discuss it with you.

In a nutshell, this is what I was thinking about:

Use the standard Java Service Provider API to automatically load things
found in the classpath:

- in TypeMapper --> a method that uses the Service Provider API to find
more datatypes
- Datatype subclasses need not be for just one URI, but could be for a set
of URIs
- ValueSpaceClassification should no longer be an enum --> maybe use a
class ValueSpace ...
- add some interface like NodeValueComparator, with methods like:
  - canCompare(ValueSpace vs1, ValueSpace vs2)
  - sameAs(NodeValue nv1, NodeValue nv2)
  - compare(NodeValue nv1, NodeValue nv2)
  - add(NodeValue nv1, NodeValue nv2)
  - subtract(NodeValue nv1, NodeValue nv2)

- in the NodeValue class, methods sameAs(NodeValue nv1, NodeValue nv2) and
compare(...) should use the Service Provider API to find
NodeValueComparators in the classpath
- in the NodeValueOps class, methods divisionNV(NodeValue nv1, NodeValue nv2),
multiplicationNV(...), additionNV(...) and subtractionNV(...) should use
the Service Provider API to find more NodeValueComparators in the classpath
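A minimal sketch of the registry idea, assuming a hypothetical `OperatorHandler` service interface (not an existing Jena API) and a stand-in `Val` type in place of ARQ's `NodeValue`. Handlers are kept in an ordered list, pre-filled with a default handler standing in for the XSD rules, and extended with whatever `java.util.ServiceLoader` finds through `META-INF/services` entries on the classpath; the first handler that accepts both operands wins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical stand-in for an ARQ NodeValue: a number with an optional unit.
final class Val {
    final double number;
    final String unit; // null for a plain XSD number

    Val(double number, String unit) {
        this.number = number;
        this.unit = unit;
    }
}

// Hypothetical service interface; not part of the Jena API.
interface OperatorHandler {
    boolean handles(Val a, Val b);

    Val multiply(Val a, Val b);
}

// Default handler standing in for the built-in XSD arithmetic.
final class XsdHandler implements OperatorHandler {
    public boolean handles(Val a, Val b) {
        return a.unit == null && b.unit == null;
    }

    public Val multiply(Val a, Val b) {
        return new Val(a.number * b.number, null);
    }
}

final class OperatorRegistry {
    private final List<OperatorHandler> handlers = new ArrayList<>();

    OperatorRegistry() {
        handlers.add(new XsdHandler()); // XSD rules are consulted first
        // Providers listed in META-INF/services/<fully qualified OperatorHandler name>.
        for (OperatorHandler h : ServiceLoader.load(OperatorHandler.class)) {
            handlers.add(h);
        }
    }

    void register(OperatorHandler h) {
        handlers.add(h);
    }

    // First matching handler wins; no match corresponds to a SPARQL evaluation error.
    Val multiply(Val a, Val b) {
        for (OperatorHandler h : handlers) {
            if (h.handles(a, b)) {
                return h.multiply(a, b);
            }
        }
        throw new IllegalArgumentException("no handler for operands: evaluation error");
    }
}
```

Because the XSD handler is consulted first, an extension cannot change what `2 * 3` means for plain numbers; it can only fill in operand combinations that would otherwise be evaluation errors.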


Any thoughts about this?

Best regards,
Maxime Lefrançois



On Sat, 7 Apr 2018 at 15:13, ajs6f <aj...@apache.org> wrote:

> We're (well, Andy is) working on 3.7.0 now. We've been trying to maintain
> a 6-month or so release cadence, so you've hit a really good time to begin
> this work. That having been said, I don't think anyone would say that we
> are especially stringent about it, so I wouldn't worry too much about the
> timing myself.
>
> ajs6f
>
> > On Apr 6, 2018, at 9:36 AM, Maxime Lefrançois <ma...@emse.fr> wrote:
> >
> > Well,
> >
> > I think I have a pretty clear idea how I would do this. We would end up
> > using a registry like the ones for custom functions or datatypes.
> > That registry would contain an ordered list of SPARQL operator handlers,
> > pre-filled with one for handling XSD datatypes.
> >
> > I am currently requesting the right to fill in the Apache individual
> > contributor license agreement.
> >
> > What would be the timeline if we wanted this shipped in the next release?
> >
> > Best,
> > Maxime
> >
> > On Tue, 3 Apr 2018 at 15:30, ajs6f <aj...@apache.org> wrote:
> >
> >> I agree. I can imagine plenty of use cases for such a powerful pair of
> >> extension points.
> >>
> >> Maxime, how can we help you attack that work? Is there a design that is
> >> already clear to you? Are there any blockers we can help remove?
> >>
> >> ajs6f
> >>
> >>> On Mar 28, 2018, at 5:08 AM, Rob Vesse <rv...@dotnetrdf.org> wrote:
> >>>
> >>> I think work towards Option 2 would be the most valuable to the community.
> >>>
> >>> The SPARQL specification allows for the overloading of any
> >>> operator/expression where the spec currently defines the evaluation to be
> >>> an error, so extending operators is a natural and valid extension point to
> >>> provide.
> >>>
> >>> The Terms of Use for UCUM would probably need us to obtain a licensing
> >>> assessment from Apache Legal, as it is a non-standard OSS license, even if
> >>> the code that implements it is under BSD (which is fine from an Apache
> >>> perspective). Therefore, having a well-defined extension mechanism and then
> >>> having UCUM support live outside Apache Jena as an extension
> >>> implementation maintained by yourself would be the easiest approach.
> >>>
> >>> Rob
> >>>
> >>> From: Maxime Lefrançois <ma...@emse.fr>
> >>> Reply-To: <de...@jena.apache.org>
> >>> Date: Wednesday, 28 March 2018 at 09:29
> >>> To: <de...@jena.apache.org>
> >>> Subject: Re: Contribution proposal for Jena: support of a datatype for quantity values
> >>>
> >>> Dear all,
> >>>
> >>> Happy to see you are interested in the UCUM datatypes!
> >>>
> >>> OK, so let's dive into the technical details.
> >>>
> >>> # Compare Jena 3.6.0 and Jena 3.6.0-ucum
> >>>
> >>> https://github.com/apache/jena/compare/master...OpenSensingCity:jena-3.6.0-ucum
> >>>
> >>> # Modules, dependencies, licences
> >>>
> >>> Two modules forked so far: jena-core and jena-arq.
> >>>
> >>> One dependency added to jena-core (after a minor change I made today):
> >>>
> >>> systems.uom:systems-ucum-java8:0.7.2
> >>> -> BSD license of systems-uom,
> >>>    and license of UCUM http://unitsofmeasure.org/trac/wiki/TermsOfUse
> >>>
> >>> --> this indeed uses an implementation of JSR 363 - Units of Measurement API
> >>> (see attached for the transitive dependencies, all from
> >>> https://github.com/unitsofmeasurement )
> >>>
> >>> # External module?
> >>>
> >>> I would have been happy to develop a separate extension of Jena for the
> >>> UCUM datatypes. One of the main reasons why this is not possible was
> >>> pointed out by Andy: I had to add a new value space VSPACE_QUANTITY to
> >>> overload the SPARQL operators '<>=' and arithmetic functions '+-*/'.
> >>>
> >>> Indeed, there are two parts: the necessary extensions for operators, and
> >>> the units themselves.
> >>>
> >>> We could choose some other unit system than UCUM, but UCUM is very
> >>> comprehensive and has implementations in different programming
> >>> languages, so UCUM datatypes could also be implemented in other
> >>> RDF/SPARQL engines.
> >>>
> >>> # Possible directions
> >>>
> >>> I see three main possible directions of work:
> >>>
> >>> 1. work on the proposal as is, and potentially integrate it completely;
> >>> 2. work on jena-core and jena-arq to make the definition of new datatypes
> >>>    and the overloading of operators as easy as the definition of new
> >>>    custom functions --> so that I can easily implement UCUM datatypes as
> >>>    an extension (and not a fork);
> >>> 3. add the VSPACE_QUANTITY value space and NodeValueQuantity in jena-arq,
> >>>    and externalize the support for the UCUM system of units in an
> >>>    external module.
> >>>
> >>> Best,
> >>> Maxime
> >>>
> >>> On Tue, 27 Mar 2018 at 17:16, Andy Seaborne <an...@apache.org> wrote:
> >>>
> >>> Extending the operators for SPARQL is done via a new value space, VSPACE_QUANTITY.
> >>>
> >>> See (comparison):
> >>>
> >>> https://github.com/OpenSensingCity/jena-ucum/blob/jena-3.6.0-ucum/jena-arq/src/main/java/org/apache/jena/sparql/expr/NodeValue.java#L566
> >>>
> >>> and (multiply):
> >>>
> >>> https://github.com/OpenSensingCity/jena-ucum/blob/jena-3.6.0-ucum/jena-arq/src/main/java/org/apache/jena/sparql/expr/nodevalue/NodeValueOps.java#L283
> >>>
> >>> with a new NodeValueQuantity for javax.measure.Quantity.
> >>>
> >>> I'm seeing this as "one-dimensional units": a quantity and a unit.
> >>>
> >>> Even then, there are two parts: the necessary extensions for operators,
> >>> and the units themselves, to allow for other unit systems (?).
> >>>
> >>> There are new dependencies in jena-arq and jena-core.
> >>>
> >>> http://unitsofmeasurement.github.io/
> >>> JSR 363 - Units of Measurement API
> >>> BSD license
> >>>
> >>> and an old version of something is on Central:
> >>>
> >>> http://central.maven.org/maven2/javax/measure/unit-api/1.0
> >>>
> >>> if that's the right thing.
> >>>
> >>> ---
> >>>
> >>> Maxime - what are the dependencies for this contribution, and for which
> >>> pieces are they needed?
> >>>
> >>>    Andy
> >>>
> >>> On 27/03/18 15:49, ajs6f wrote:
> >>>> Bruno raises an interesting question -- would this contribution have any
> >>>> effect (or should it) on jena-spatial? Would it be either necessary or, if
> >>>> not, appropriate to integrate there? (I'm particularly interested in this
> >>>> because it might help decide between core and an extension.)
> >>>>
> >>>>
> >>>> ajs6f
> >>>>
> >>>>> On Mar 26, 2018, at 5:40 PM, Bruno P. Kinoshita <ki...@apache.org> wrote:
> >>>>>
> >>>>> Hi Maxime,
> >>>>> I don't know whether it would be best as part of Jena core or in an
> >>>>> extension, but it sounds very interesting! I'll let others comment on this.
> >>>>> At work, one item in my backlog is to replace JScience with JSR 363 -
> >>>>> Units of Measurement, which provides a set of APIs and services for
> >>>>> handling units and quantities.
> >>>>>
> >>>>> We use it for weather forecasts and GIS, with things like wind speed,
> >>>>> rain amount, etc.
> >>>>> I think another GIS library that we use made the switch as well (some
> >>>>> OGC lib, I think).
> >>>>> Perhaps it would be nice to take a look at their API for
> >>>>> compatibility with other systems.
> >>>>> Cheers,
> >>>>> Bruno
> >>>>>
> >>>>> On Tue, 27 Mar 2018 at 2:07, Maxime Lefrançois <maxime.lefrancois@emse.fr> wrote:
> >>>>>
> >>>>> Dear all,
> >>>>>
> >>>>> I am an Associate Professor at MINES Saint-Étienne, France, working on
> >>>>> Semantic Web and Linked Data. I'd like to let you know about our
> >>>>> project *Custom Datatypes for Quantity Values* [1], which leverages the
> >>>>> Unified Code for Units of Measure (UCUM), a code system intended to
> >>>>> include all units of measure in contemporary use in international
> >>>>> science, engineering, and business.
> >>>>> Using our UCUM datatypes, one can encode and query quantity values in a
> >>>>> lightweight manner:
> >>>>>
> >>>>> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
> >>>>> PREFIX ex: <http://example.org/>
> >>>>>
> >>>>> SELECT ?value1 ?value2 ?result
> >>>>> WHERE {
> >>>>>   VALUES ( ?value1 ?value2 ) {
> >>>>>     ( "1.0 m/s"^^cdt:speed "2 s"^^cdt:time )
> >>>>>   }
> >>>>>   BIND( ?value1 * ?value2 AS ?result )
> >>>>> }
> >>>>>
> >>>>> Results in
> >>>>>
> >>>>> ----------------------------------------------------------------
> >>>>> | value1               | value2          | result              |
> >>>>> ================================================================
> >>>>> | "1.0 m/s"^^cdt:speed | "2 s"^^cdt:time | "2.0 m"^^cdt:length |
> >>>>> ----------------------------------------------------------------
> >>>>>
> >>>>> See our demonstration online [2].
> >>>>> It uses *a fork of Jena where we implemented UCUM datatypes* [3] (in
> >>>>> jena-core and jena-arq, with several unit tests). Our implementation
> >>>>> uses the recent JSR 385, Units of Measurement API 2.0, and the UCUM
> >>>>> extension [4].
> >>>>>
> >>>>> This is not the first project I have developed in or with Jena:
> >>>>> - I forked it to support arbitrary custom datatypes in RDF and SPARQL,
> >>>>>   fetching some JavaScript definition at the URI of the datatype [5]
> >>>>> - I develop SPARQL-Generate, an extension of SPARQL implemented on ARQ
> >>>>>   to generate RDF from web documents in XML, JSON, CSV, HTML, CBOR, and
> >>>>>   plain text with regular expressions [6]
> >>>>>
> >>>>> If you agree with me that supporting UCUM datatypes would be a nice
> >>>>> addition to Apache Jena and a nice contribution to the Semantic Web
> >>>>> community, I would be willing to help integrate our contribution into
> >>>>> other modules (jena-tdb, ...), and help maintain it in the future.
> >>>>>
> >>>>> Best regards,
> >>>>> Maxime Lefrançois,
> >>>>> Associate Professor, MINES Saint-Étienne
> >>>>>
> >>>>> [1] - http://w3id.org/lindt/custom_datatypes#
> >>>>> [2] - http://w3id.org/lindt/playground.html?example=05-Multiply
> >>>>> [3] - http://w3id.org/lindt/custom_datatypes#implementation
> >>>>> [4] - https://github.com/unitsofmeasurement/uom-systems/tree/master/ucum-java8
> >>>>> [5] - https://ci.mines-stetienne.fr/lindt/spec.html
> >>>>> [6] - https://ci.mines-stetienne.fr/sparql-generate/
> >>>>
> >>>
> >>>
> >>>
> >>
> >>
>
>

Re: Contribution proposal for Jena: support of a datatype for quantity values

Posted by Andy Seaborne <an...@apache.org>.

On 15/06/18 14:49, Maxime Lefrançois wrote:
> Dear all,
> 
> Regarding our contribution proposal to enable extensions to override SPARQL
> operators in Jena
> 
> We finally got the agreement from our institution to contribute to the
> Apache foundation.
> Question 1: what is the procedure to upload the form?

Normally, the act of providing a contribution like a patch or small
feature is enough. The individual is making the contribution and it is
their responsibility to know this is open source. This also covers code
in email to Apache mailing lists (but not elsewhere, and not Stack Overflow).

It sounds like your institution would prefer an explicit declaration.
These are the Individual Contributor License Agreement (ICLA) and 
Corporate Contributor License Agreement (CCLA).

Ideally, you should provide an ICLA as well as the institution
submitting a CCLA, as you are also an individual here; the ICLA will
cover working on the contribution after submission.

The CCLA has a section to name the contribution and an optional section
for naming people; it isn't an open-ended commitment.

On the ICLA, you don't need to request an Apache id. It will cover all 
Apache projects.

The forms are available at:
https://www.apache.org/licenses/

For completeness, there is also the Apache Software Grant for an existing
body of work that may have existed as open source (e.g., this is used
when a project joins Apache).

     Andy


Re: Contribution proposal for Jena: support of a datatype for quantity values

Posted by Maxime Lefrançois <ma...@emse.fr>.
Some comments:



> > Add use of the standard Java Service Provider API to load things
> > automatically found in the classpath:
> >
> > - In TypeMapper --> a method that uses the Service Provider API to find
> > more Datatypes
>
> Should this be a method, or rather additional behavior for getTypeByName,
> etc.? Are you thinking of something like "void getMoreMappings()" which
> would check for more available datatypes?
>

I don't know yet; what is your opinion? Either way, functionality like
this would be coded somewhere in the TypeMapper class.
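One possible shape, as a sketch only: `getTypeByName` stays the entry point and, on a cache miss, consults providers discovered with `java.util.ServiceLoader`, each of which can declare a whole set of datatype URIs. `DatatypeProvider`, `Datatype` and `LazyTypeMapper` are hypothetical names, not Jena classes; only the role of `getTypeByName` (return the datatype for a URI, or null) is borrowed from TypeMapper.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.Set;

// Hypothetical stand-in for an RDF datatype.
final class Datatype {
    final String uri;

    Datatype(String uri) {
        this.uri = uri;
    }
}

// Hypothetical SPI: one provider may cover a whole family of datatype URIs.
interface DatatypeProvider {
    Set<String> uris();

    Datatype create(String uri);
}

final class LazyTypeMapper {
    private final Map<String, Datatype> cache = new HashMap<>();
    private final Map<String, DatatypeProvider> providers = new HashMap<>();

    LazyTypeMapper() {
        for (DatatypeProvider p : ServiceLoader.load(DatatypeProvider.class)) {
            addProvider(p);
        }
    }

    // The first provider registered for a URI keeps it.
    void addProvider(DatatypeProvider p) {
        for (String uri : p.uris()) {
            providers.putIfAbsent(uri, p);
        }
    }

    // Same role as getTypeByName: returns null when no provider knows the URI.
    Datatype getTypeByName(String uri) {
        Datatype dt = cache.get(uri);
        if (dt == null) {
            DatatypeProvider p = providers.get(uri);
            if (p != null) {
                dt = p.create(uri);
                cache.put(uri, dt);
            }
        }
        return dt;
    }
}
```

With the cache in front, a provider is asked to create a given datatype at most once, whether the lookup ends up in `getTypeByName` itself or in a separate method.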


>
> > - Datatype subclasses are not for just one URI, but could be for a set
> > of URIs
>
> Would that be true of Java types, as well?
>

I think it would be better to avoid this being true for Java types.


>
> > - ValueSpaceClassification should not be an enum any more --> maybe use
> > a class ValueSpace ...
> > - should add some interface like NodeValueComparator, with some methods
> > like:
> >  - canCompare(ValueSpace vs, ValueSpace vs)
> >  - sameAs(NodeValue nv, NodeValue nv)
> >  - compare(NodeValue nv, NodeValue nv)
>
> Should this return a Comparator<NodeValue> instead? (Thinking of sorting.)
>

Could be, but I tried to mimic and externalize what's already there in the
NodeValue class.


> >  - add(NodeValue nv, NodeValue nv)
> >  - subtract(NodeValue nv, NodeValue nv)
> > - in NodeValue class, method sameAs(NodeValue nv1, NodeValue nv2) and
> > compare(...) should use the Service Provider API to find
> > NodeValueComparators in the classpath
> > - in class NodeValueOps, method divisionNV(NodeValue nv1, NodeValue
> > nv2), multiplicationNV(...), additionNV(...), subtractionNV(...) should
> > use the Service Provider API to find more NodeValueComparators in the
> > classpath
>
> Hm. Is there some way this could happen via a lookup in TypeMapper? I'd
> rather not see too many paths to the same service impls...
>

I don't think so, as this would mix things between jena-core and
jena-arq.

Best,
Max


> > Any thoughts about this?
>
> Yes: thank you so much for doing this excellent work!



Re: Contribution proposal for Jena: support of a datatype for quantity values

Posted by Andy Seaborne <an...@apache.org>.

On 15/06/18 15:00, ajs6f wrote:
>> On Jun 15, 2018, at 9:49 AM, Maxime Lefrançois <ma...@emse.fr> wrote:

>> In a nutshell this is what I was thinking about:
>>
>> Add use of the standard Java Service Provider API to load things automatically found in the classpath:
>>
>> - In TypeMapper --> a method that uses the Service Provider API to find more Datatypes
> 
> Should this be a method, or rather additional behavior for getTypeByName, etc.? Are you thinking of something like "void getMoreMappings()" which would check for more available datatypes?

Jena already uses ServiceLoader for initialization; can this be used?
Or, alternatively, does a separate one have some specific advantage?

https://jena.apache.org/documentation/notes/system-initialization.html

and code in the custom initializer calls

TypeMapper.getInstance().registerDatatype(....)

>> - Datatype subclasses are not for just one URI, but could be for a set of URIs
> 
> Would that be true of Java types, as well?

NodeValue follows the rules for XSD arithmetic.


TypeMapper and NodeValue are not very connected. Types in jena-core 
don't have arithmetic or comparison.

Maybe there are two different contributions here: one for
TypeMapper/jena-core, and one for NodeValue/jena-arq.

> 
>> - ValueSpaceClassification should not be an enum any more --> maybe use a class ValueSpace ...
>> - should add some interface like NodeValueComparator, with some methods like:
>>   - canCompare(ValueSpace vs, ValueSpace vs)
>>   - sameAs(NodeValue nv, NodeValue nv)
>>   - compare(NodeValue nv, NodeValue nv)

My goal here is to make sure that extensions can be done and that the
additional flexibility does not impact the performance of the
XPath/XQuery F&O evaluations.

What is the relation to QUDT? http://www.qudt.org

Before we get into the Java details, I'd like to be sure what exactly is
being supported. It can get a bit weird!

Is arithmetic involving numbers also going to be supported? (The
playground says "no" for plus and "yes" for multiply; is that right?
But it does not follow XSD, so "1m*2" becomes 2.0m: integer -> decimal.)
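A sketch of what following XSD could mean here, under assumptions: the magnitude keeps its own XSD numeric type, and quantity-times-number applies the usual numeric type promotion (integer, then decimal, then float, then double), so an integer magnitude times an integer stays an integer. Subtypes of xsd:integer are ignored, `TypedQuantity` is a hypothetical class, and float/double magnitudes are still held in a `BigDecimal` for brevity.

```java
import java.math.BigDecimal;

// Simplified XSD numeric types, ordered from narrowest to widest.
enum XsdNumeric {
    INTEGER, DECIMAL, FLOAT, DOUBLE;

    // Promotion: the result takes the wider of the two operand types.
    static XsdNumeric promote(XsdNumeric a, XsdNumeric b) {
        return a.ordinal() >= b.ordinal() ? a : b;
    }
}

// Hypothetical quantity whose magnitude remembers its XSD numeric type.
final class TypedQuantity {
    final BigDecimal magnitude;
    final XsdNumeric type;
    final String unit;

    TypedQuantity(BigDecimal magnitude, XsdNumeric type, String unit) {
        this.magnitude = magnitude;
        this.type = type;
        this.unit = unit;
    }

    // quantity * plain number: the unit is kept, the numeric type is promoted.
    TypedQuantity times(BigDecimal n, XsdNumeric nType) {
        return new TypedQuantity(magnitude.multiply(n), XsdNumeric.promote(type, nType), unit);
    }

    String lexicalForm() {
        String number = type == XsdNumeric.INTEGER
                ? magnitude.toBigIntegerExact().toString()
                : magnitude.toPlainString();
        return number + " " + unit;
    }
}
```

With this rule "1 m" times 2 gives "2 m", while "1 m" times 2.0 gives "2.0 m".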

Is cdt:length * cdt:length a cdt:area?

(we'll accept answers for Euclidean space :-)
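Dimensional analysis gives one answer, sketched here under assumptions: each quantity carries a vector of exponents over base dimensions (only length, mass and time here), multiplication adds the vectors and division subtracts them. Mapping the resulting dimension back to a cdt datatype such as cdt:area would need a separate lookup table; the `Dim` class and its constants are hypothetical.

```java
import java.util.Arrays;

// Hypothetical dimension: exponents of the base dimensions (length, mass, time).
final class Dim {
    static final Dim LENGTH = new Dim(1, 0, 0);
    static final Dim TIME = new Dim(0, 0, 1);
    static final Dim SPEED = new Dim(1, 0, -1);
    static final Dim AREA = new Dim(2, 0, 0);

    private final int[] exponents;

    Dim(int length, int mass, int time) {
        this.exponents = new int[] {length, mass, time};
    }

    private Dim(int[] exponents) {
        this.exponents = exponents;
    }

    // Multiplying quantities adds exponents: (m/s) * s = m.
    Dim times(Dim other) {
        int[] e = new int[exponents.length];
        for (int i = 0; i < e.length; i++) {
            e[i] = exponents[i] + other.exponents[i];
        }
        return new Dim(e);
    }

    // Dividing quantities subtracts exponents: m / s = m/s.
    Dim dividedBy(Dim other) {
        int[] e = new int[exponents.length];
        for (int i = 0; i < e.length; i++) {
            e[i] = exponents[i] - other.exponents[i];
        }
        return new Dim(e);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Dim && Arrays.equals(exponents, ((Dim) o).exponents);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(exponents);
    }
}
```

The exponent vector cannot tell apart quantities that share a dimension, such as torque and energy (both mass x length^2 / time^2), so a dimension alone does not always pick a single datatype.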

Can the query cast? If so, how does it set the measurement scale?

A "cdt:cast(quantity, unit, datatype)" would be nice.

> 
> Should this return a Comparator<NodeValue> instead? (Thinking of sorting.)

In sorting, two NodeValues are always comparable: it falls back to
lexical form and datatype. Comparing by implicit value can lead to
unstable sorting. The playground says it is:

SELECT ?value {
   VALUES ?value { 2 4 "1 m/s2 "^^cdt:acceleration "3 m/s2 "^^cdt:acceleration }
} ORDER BY ?value

NB: if "1m/s" is changed to "1 km/s", the accelerations don't sort.

I think this is because of instability:

   1.5 < 2ft
   2ft < 1m

but

   1m < 1.5

There are two forms of comparison: for the "<" operation and sorting.
Comparison must agree with sorting when comparison isn't an eval exception.
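The 1.5 / 2ft / 1m cycle can be reproduced with a small sketch. It assumes one hypothetical way the cycle could arise: two quantities are compared after unit conversion, but a quantity and a plain number are compared by raw magnitude. The sketch also shows one way to meet the last requirement: ORDER BY ranks by value space first, and "<" raises an error across value spaces, so "<" agrees with the sort whenever it does not fail. All names are illustrative, not Jena or playground code.

```java
import java.util.Comparator;

final class MixedOrder {
    // A plain number (unit == null) or a length quantity in "m" or "ft".
    static final class Val {
        final double magnitude;
        final String unit;
        Val(double magnitude, String unit) { this.magnitude = magnitude; this.unit = unit; }

        boolean isQuantity() { return unit != null; }

        double metres() {
            if ("m".equals(unit)) return magnitude;
            if ("ft".equals(unit)) return magnitude * 0.3048;
            throw new IllegalStateException("not a supported length: " + this);
        }

        @Override public String toString() {
            return isQuantity() ? magnitude + unit : Double.toString(magnitude);
        }
    }

    // Hypothetical naive "<": converts units between quantities but compares a
    // number with a quantity by raw magnitude, which makes it intransitive.
    static boolean naiveLess(Val a, Val b) {
        if (a.isQuantity() && b.isQuantity()) return a.metres() < b.metres();
        return a.magnitude < b.magnitude;
    }

    // "<" that only compares within one value space; the exception stands in for a
    // SPARQL evaluation error when a number meets a quantity.
    static boolean consistentLess(Val a, Val b) {
        if (a.isQuantity() != b.isQuantity()) {
            throw new IllegalArgumentException("not comparable: " + a + " and " + b);
        }
        return a.isQuantity() ? a.metres() < b.metres() : a.magnitude < b.magnitude;
    }

    // ORDER BY: numbers first, then lengths by converted value, then lexical form as a
    // tie-breaker so that equal values in different units still sort deterministically.
    static final Comparator<Val> SORT_ORDER =
        Comparator.<Val>comparingInt(v -> v.isQuantity() ? 1 : 0)
            .thenComparingDouble(v -> v.isQuantity() ? v.metres() : v.magnitude)
            .thenComparing(v -> v.toString());
}
```

Ranking by value space first is what keeps the sort total and transitive; the price is that numbers and quantities are never interleaved by value.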


>>   - add(NodeValue nv, NodeValue nv)
>>   - subtract(NodeValue nv, NodeValue nv)
>> - in NodeValue class, method sameAs(NodeValue nv1, NodeValue nv2) and compare(...) should use the Service Provider API to find NodeValueComparators in the classpath
>> - in class NodeValueOps, methods divisionNV(NodeValue nv1, NodeValue nv2), multiplicationNV(...), additionNV(...), subtractionNV(...) should use the Service Provider API to find more NodeValueComparators in the classpath

One way is to have a new value space "VSPACE_EXT".

This is a NodeValueExt in ARQ with a method to return a handler for
operations.

The extension provides NodeValueCDT and the code for the handler.

There is code at the end of NodeValue._setByValue to do a
datatype->factory lookup, and the factory returns a NodeValueExt.

If either argument of a binary operator is "VSPACE_EXT", then the
NodeValueExt is used to get the custom evaluation operation.

The existing code remains as-is. The existing VSPACEs aren't converted
to providers.

This is the extension mechanism: if an extension is seen, then
extension code is called and it has to deal with the arguments;
otherwise, existing code is used as it is at the moment.
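That dispatch rule can be sketched with hypothetical stand-ins: ExtValue for NodeValueExt, FACTORIES for the datatype->factory lookup at the end of NodeValue._setByValue, multiply for one NodeValueOps operation, and an example Metres extension registered under a made-up datatype URI. Built-in values keep the existing path untouched. Only when an argument is in the extension value space is the extension's handler called, and it must reject combinations it does not support.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class ExtDispatch {
    enum ValueSpace { NUM, EXT }

    interface Value { ValueSpace space(); }

    // Stand-in for an existing built-in numeric NodeValue.
    static final class NumValue implements Value {
        final double v;
        NumValue(double v) { this.v = v; }
        @Override public ValueSpace space() { return ValueSpace.NUM; }
    }

    // Operations an extension supplies; it has to deal with whatever arguments it gets.
    interface ExtHandler { Value multiply(Value a, Value b); }

    // Stand-in for NodeValueExt: every extension value can return its handler.
    interface ExtValue extends Value {
        ExtHandler handler();
        @Override default ValueSpace space() { return ValueSpace.EXT; }
    }

    // Example extension (hypothetical): a length in metres.
    static final class Metres implements ExtValue {
        final double m;
        Metres(double m) { this.m = m; }
        @Override public ExtHandler handler() { return METRES_OPS; }
    }

    static final ExtHandler METRES_OPS = (a, b) -> {
        if (a instanceof Metres && b instanceof NumValue) return new Metres(((Metres) a).m * ((NumValue) b).v);
        if (a instanceof NumValue && b instanceof Metres) return new Metres(((NumValue) a).v * ((Metres) b).m);
        throw new IllegalArgumentException("unsupported operands"); // an eval error in SPARQL terms
    };

    // Stand-in for the datatype -> factory lookup at the end of value construction.
    static final Map<String, Function<String, ExtValue>> FACTORIES = new HashMap<>();
    static {
        FACTORIES.put("http://example.org/dt#metres", lex -> new Metres(Double.parseDouble(lex)));
    }

    static Value makeValue(String lexical, String datatypeUri) {
        Function<String, ExtValue> factory = FACTORIES.get(datatypeUri);
        if (factory != null) return factory.apply(lexical);
        return new NumValue(Double.parseDouble(lexical)); // existing code path
    }

    // If either argument is in the extension value space, its handler decides;
    // otherwise the existing built-in code runs unchanged.
    static Value multiply(Value a, Value b) {
        if (a.space() == ValueSpace.EXT) return ((ExtValue) a).handler().multiply(a, b);
        if (b.space() == ValueSpace.EXT) return ((ExtValue) b).handler().multiply(a, b);
        return new NumValue(((NumValue) a).v * ((NumValue) b).v);
    }
}
```

The built-in branch costs one extra value-space check per operation, which is in line with the goal of not slowing down the F&O evaluations.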

> Hm. Is there some way this could happen via a lookup in TypeMapper? I'd rather not see too many paths to the same service impls...
> 
>> Any thoughts about this?
> 
> Yes: thank you so much for doing this excellent work!

+1

	Andy

> 
>> Best regards,
>> Maxime Lefrançois
>>
>>
>>
>> On Sat, 7 Apr 2018 at 15:13, ajs6f <aj...@apache.org> wrote:
>>
>>> We're (well, Andy is) working on 3.7.0 now. We've been trying to maintain
>>> a 6-month or so release cadence, so you've hit a really good time to begin
>>> this work. That having been said, I don't think anyone would say that we
>>> are especially stringent about it, so I wouldn't worry too much about the
>>> timing myself.
>>>
>>> ajs6f
>>>
>>>> On Apr 6, 2018, at 9:36 AM, Maxime Lefrançois <ma...@emse.fr> wrote:
>>>>
>>>> Well,
>>>>
>>>> I think I have a pretty clear idea how I would do this. We would end up
>>>> using a registry like the one for custom functions or datatypes.
>>>> That registry would contain an ordered list of SPARQL operator handlers,
>>>> pre-filled with one for handling XSD datatypes.
>>>>
>>>> I am currently requesting permission to fill in the Apache individual
>>>> contributor license agreement.
>>>>
>>>> What would be the timeline if we wanted this shipped in the next release?
>>>>
>>>> Best,
>>>> Maxime
>>>>
>>>> On Tue, 3 Apr 2018 at 15:30, ajs6f <aj...@apache.org> wrote:
>>>>
>>>>> I agree. I can imagine plenty of use cases for such a powerful pair of
>>>>> extension points.
>>>>>
>>>>> Maxime, how can we help you attack that work? Is there a design that is
>>>>> already clear to you? Are there any blockers we can help remove?
>>>>>
>>>>> ajs6f
>>>>>
>>>>>> On Mar 28, 2018, at 5:08 AM, Rob Vesse <rv...@dotnetrdf.org> wrote:
>>>>>>
>>>>>> I think work towards Option 2 would be the most valuable to the
>>>>>> community.
>>>>>>
>>>>>> The SPARQL specification allows for the overloading of any
>>>>>> operator/expression where the spec currently defines the evaluation to
>>>>>> be an error, so extending operators is a natural and valid extension
>>>>>> point to provide.
>>>>>>
>>>>>> The Terms of Use for UCUM would probably need us to obtain a licensing
>>>>>> assessment from Apache Legal, as it is a non-standard OSS license even
>>>>>> if the code that implements it is under BSD (which is fine from an
>>>>>> Apache perspective). Therefore, having a well-defined extension
>>>>>> mechanism and then having UCUM support live outside Apache Jena as an
>>>>>> extension implementation maintained by yourself would be the easiest
>>>>>> approach.
>>>>>>
>>>>>> Rob
>>>>>>
>>>>>>
>>>>>>
>>>>>> From: Maxime Lefrançois <ma...@emse.fr>
>>>>>> Reply-To: <de...@jena.apache.org>
>>>>>> Date: Wednesday, 28 March 2018 at 09:29
>>>>>> To: <de...@jena.apache.org>
>>>>>> Subject: Re: Contribution proposal for Jena: support of a datatype for quantity values
>>>>>>
>>>>>>
>>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Happy to see you are interested in the UCUM datatypes!
>>>>>>
>>>>>>
>>>>>>
>>>>>> OK, so let's dive into the technical details.
>>>>>>
>>>>>>
>>>>>>
>>>>>> # Compare Jena 3.6.0 and Jena 3.6.0-ucum
>>>>>>
>>>>>> https://github.com/apache/jena/compare/master...OpenSensingCity:jena-3.6.0-ucum
>>>>>>
>>>>>>
>>>>>>
>>>>>> # Modules, dependencies, licences
>>>>>>
>>>>>>
>>>>>>
>>>>>> Two modules forked so far: jena-core and jena-arq.
>>>>>>
>>>>>> One dependency added to jena-core (after a minor change I made today):
>>>>>>
>>>>>>
>>>>>>
>>>>>> systems.uom:systems-ucum-java8:0.7.2
>>>>>>
>>>>>> -> BSD license of systems-uom,
>>>>>>    and license of UCUM http://unitsofmeasure.org/trac/wiki/TermsOfUse
>>>>>>
>>>>>> --> this indeed uses an implementation of JSR 363 - Units of Measurement API
>>>>>>
>>>>>> (see attached for the transitive dependencies, all from
>>>>>> https://github.com/unitsofmeasurement )
>>>>>>
>>>>>>
>>>>>>
>>>>>> # External module?
>>>>>>
>>>>>> I would have been happy to develop a separate extension of Jena for the
>>>>>> UCUM datatypes.
>>>>>>
>>>>>> One of the main reasons why this is not possible was pointed out by Andy:
>>>>>>
>>>>>> I had to add a new value space VSPACE_QUANTITY to overload the SPARQL
>>>>>> operators '<>=' and arithmetic functions '+-*/'.
>>>>>>
>>>>>> Indeed, there are two parts: the necessary extensions for operators, and
>>>>>> the units themselves.
>>>>>>
>>>>>> We could choose some other unit system than UCUM, but UCUM is very
>>>>>> comprehensive and has different implementations in different programming
>>>>>> languages. It would be possible to implement UCUM datatypes in other
>>>>>> RDF-SPARQL engines.
>>>>>>
>>>>>>
>>>>>>
>>>>>> # Possible directions
>>>>>>
>>>>>> I see three main possible directions of work there:
>>>>>>
>>>>>> 1. work on the proposal as is and potentially integrate it completely
>>>>>>
>>>>>> 2. work on jena-core and jena-arq to make the definition of new
>>>>>> datatypes and the overloading of operators as easy as the definition of
>>>>>> new custom functions --> so that I can easily implement UCUM datatypes
>>>>>> as an extension (and not a fork)
>>>>>>
>>>>>> 3. add VSPACE_QUANTITY value space and NodeValueQuantity in jena-arq,
>>>>>> and externalize the support for the UCUM systems of unit in an external
>>>>>> module
>>>>>>
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Maxime
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, 27 Mar 2018 at 17:16, Andy Seaborne <an...@apache.org> wrote:
>>>>>>
>>>>>> Extending the operators for SPARQL is done with a new value space
>>>>>> VSPACE_QUANTITY.
>>>>>>
>>>>>> See (comparison):
>>>>>>
>>>>>> https://github.com/OpenSensingCity/jena-ucum/blob/jena-3.6.0-ucum/jena-arq/src/main/java/org/apache/jena/sparql/expr/NodeValue.java#L566
>>>>>>
>>>>>> and (multiply)
>>>>>>
>>>>>> https://github.com/OpenSensingCity/jena-ucum/blob/jena-3.6.0-ucum/jena-arq/src/main/java/org/apache/jena/sparql/expr/nodevalue/NodeValueOps.java#L283
>>>>>>
>>>>>> with a new NodeValueQuantity for javax.measure.Quantity
>>>>>>
>>>>>> I'm seeing this as "one-dimensional units": a quantity and a unit.
>>>>>>
>>>>>> Even then, there are two parts: the necessary extensions for operators,
>>>>>> and the units themselves, to allow for other unit systems (?).
>>>>>>
>>>>>> There are new dependencies in jena-arq and jena-core.
>>>>>>
>>>>>> http://unitsofmeasurement.github.io/
>>>>>> JSR 363 - Units of Measurement API
>>>>>> BSD-license
>>>>>>
>>>>>> and an old version of something is on central:
>>>>>>
>>>>>> http://central.maven.org/maven2/javax/measure/unit-api/1.0
>>>>>>
>>>>>> if that's the right thing.
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> Maxime - what are the dependencies for this contribution and for which
>>>>>> pieces are they needed?
>>>>>>
>>>>>>    Andy
>>>>>>
>>>>>> On 27/03/18 15:49, ajs6f wrote:
>>>>>>> Bruno raises an interesting question: would this contribution have any
>>>>>>> effect (or should it) on jena-spatial? Would it be either necessary or,
>>>>>>> if not, appropriate to integrate there? (I'm particularly interested in
>>>>>>> this because it might help decide between core and an extension.)
>>>>>>>
>>>>>>>
>>>>>>> ajs6f
>>>>>>>
>>>>>>>> On Mar 26, 2018, at 5:40 PM, Bruno P. Kinoshita <ki...@apache.org> wrote:
>>>>>>>>
>>>>>>>> Hi Maxime,
>>>>>>>> Don't know whether it would be best as part of jena core or in an
>>>>>>>> extension, but it sounds very interesting! Will let others comment on this.
>>>>>>>> At work, one item in my backlog is to replace jscience by jsr363 -
>>>>>>>> Units of Measurement.
>>>>>>>>
>>>>>>>> We use it for weather forecasts and GIS, with things like wind speed,
>>>>>>>> rain amount, etc.
>>>>>>>> I think another GIS library that we use did the switch as well (some
>>>>>>>> OGC lib, I think).
>>>>>>>> Perhaps it would be nice to consider taking a look at their API for
>>>>>>>> compatibility with other systems.
>>>>>>>> Cheers
>>>>>>>> Bruno
>>>>>>>>
>>>>>>>> Sent from Yahoo Mail on Android
>>>>>>>>
>>>>>>>> On Tue, 27 Mar 2018 at 2:07, Maxime Lefrançois <maxime.lefrancois@emse.fr> wrote:
>>>>>>>>
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> I am Associate Professor at MINES Saint-Étienne, France, working on
>>>>>>>> Semantic Web and Linked Data. I'd like to let you know about our
>>>>>>>> project *Custom Datatypes for Quantity Values* [1], which leverages the
>>>>>>>> Unified Code for Units of Measure (UCUM), a code system intended to
>>>>>>>> include all units of measure contemporarily used in international
>>>>>>>> science, engineering, and business.
>>>>>>>> Using our UCUM Datatypes, one can encode and query quantity values in a
>>>>>>>> lightweight manner:
>>>>>>>>
>>>>>>>> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
>>>>>>>> PREFIX ex: <http://example.org/>
>>>>>>>>
>>>>>>>> SELECT ?value1 ?value2 ?result
>>>>>>>> WHERE{
>>>>>>>> VALUES ( ?value1 ?value2 ) {
>>>>>>>>    ( "1.0 m/s"^^cdt:speed "2 s"^^cdt:time )
>>>>>>>> }
>>>>>>>> BIND( ?value1 * ?value2 AS ?result )
>>>>>>>> }
>>>>>>>>
>>>>>>>> Results in:
>>>>>>>>
>>>>>>>> ----------------------------------------------------------------
>>>>>>>> | value1               | value2          | result              |
>>>>>>>> ================================================================
>>>>>>>> | "1.0 m/s"^^cdt:speed | "2 s"^^cdt:time | "2.0 m"^^cdt:length |
>>>>>>>>
>>>>>>>> See our demonstration online [2].
>>>>>>>> It uses *a fork of Jena where we implemented UCUM datatypes* [3] (in
>>>>>>>> jena-core and jena-arq, with several unit tests). Our implementation
>>>>>>>> uses the recent JSR 385, Units of Measurement API 2.0, and the UCUM
>>>>>>>> extension [4].
>>>>>>>>
>>>>>>>> This is not the first project I have developed in or on top of Jena:
>>>>>>>> - I forked it for "Supporting Arbitrary Custom Datatypes in RDF and
>>>>>>>>   SPARQL", fetching some JavaScript definition at the URI of the
>>>>>>>>   datatype [5]
>>>>>>>> - I develop SPARQL-Generate, an extension of SPARQL implemented on ARQ
>>>>>>>>   to generate RDF from web documents in XML, JSON, CSV, HTML, CBOR, and
>>>>>>>>   plain text with regular expressions [6]
>>>>>>>>
>>>>>>>>
>>>>>>>> If you agree with me that supporting UCUM datatypes would be a nice
>>>>>>>> addition to Apache Jena and a nice contribution to the Semantic Web
>>>>>>>> community, I would be willing to help integrate our contribution into
>>>>>>>> other modules (jena-tdb, ...), and help maintain it in the future.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Maxime Lefrançois,
>>>>>>>> Associate Professor, MINES Saint-Étienne
>>>>>>>>
>>>>>>>> [1] - http://w3id.org/lindt/custom_datatypes#
>>>>>>>> [2] - http://w3id.org/lindt/playground.html?example=05-Multiply
>>>>>>>> [3] - http://w3id.org/lindt/custom_datatypes#implementation
>>>>>>>> [4] - https://github.com/unitsofmeasurement/uom-systems/tree/master/ucum-java8
>>>>>>>> [5] - https://ci.mines-stetienne.fr/lindt/spec.html
>>>>>>>> [6] - https://ci.mines-stetienne.fr/sparql-generate/
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>>
> 

Re: Contribution proposal for Jena: support of a datatype for quantity values

Posted by ajs6f <aj...@apache.org>.
See comments in-line...

ajs6f

> On Jun 15, 2018, at 9:49 AM, Maxime Lefrançois <ma...@emse.fr> wrote:
> 
> Dear all,
> 
> Regarding our contribution proposal to enable extensions to override SPARQL operators in Jena
> 
> We finally got the agreement from our institution to contribute to the Apache foundation.
> Question 1: what is the procedure to upload the form?

If we're talking about:

https://www.apache.org/licenses/cla-corporate.txt

then you can just scan and email a PDF to secretary@apache.org. There are other means of submission at that URL.

> As for the how, I would like to discuss it with you first.
> 
> In a nutshell this is what I was thinking about:
> 
> Add use of the standard Java Service Provider API to load things automatically found in the classpath:
> 
> - In TypeMapper --> a method that uses the Service Provider API to find more Datatypes

Should this be a method, or rather additional behavior for getTypeByName, etc.? Are you thinking of something like "void getMoreMappings()" which would check for more available datatypes?
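One possible shape for "additional behavior for getTypeByName" is lazy discovery on a miss, so callers never invoke a separate method. The sketch below assumes a hypothetical DatatypeSource SPI whose providers can each answer for a set of URIs. That also covers the "one Datatype class for several URIs" idea. LazyTypeRegistry is a stand-in, not TypeMapper's real code. Discovery runs at most once, through java.util.ServiceLoader by default, and a second constructor accepts another source list so the sketch can be exercised without a META-INF/services file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical SPI: one provider may answer for a whole family of datatype URIs.
interface DatatypeSource {
    /** Returns a datatype object for the URI, or null if this source does not handle it. */
    Object datatypeFor(String uri);
}

final class LazyTypeRegistry {
    private final Map<String, Object> byUri = new ConcurrentHashMap<>();
    private final Supplier<? extends Iterable<DatatypeSource>> discovery;
    private volatile List<DatatypeSource> sources; // filled at most once

    LazyTypeRegistry() { this(() -> ServiceLoader.load(DatatypeSource.class)); }

    LazyTypeRegistry(Supplier<? extends Iterable<DatatypeSource>> discovery) { this.discovery = discovery; }

    void registerDatatype(String uri, Object datatype) { byUri.put(uri, datatype); }

    // Known URIs are answered from the table; unknown ones trigger (one-time)
    // discovery, and the result is cached so later lookups stay cheap.
    Object getTypeByName(String uri) {
        Object known = byUri.get(uri);
        if (known != null) return known;
        for (DatatypeSource source : sources()) {
            Object created = source.datatypeFor(uri);
            if (created != null) {
                Object previous = byUri.putIfAbsent(uri, created);
                return previous != null ? previous : created;
            }
        }
        return null; // still unknown: treat as an unregistered datatype
    }

    private List<DatatypeSource> sources() {
        List<DatatypeSource> s = sources;
        if (s == null) {
            synchronized (this) {
                s = sources;
                if (s == null) {
                    s = new ArrayList<>();
                    for (DatatypeSource d : discovery.get()) s.add(d);
                    sources = s;
                }
            }
        }
        return s;
    }
}
```

The trade-off is that a datatype jar added at runtime after the first miss would not be seen; an explicit "rescan" method would be needed for that case.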

> - Datatype subclasses are not for just one URI, but could be for a set of URIs

Would that be true of Java types, as well?

> - ValueSpaceClassification should not be an enum any more --> maybe use a class ValueSpace ...
> - should add some interface like NodeValueComparator, with some methods like:
>  - canCompare(ValueSpace vs, ValueSpace vs)
>  - sameAs(NodeValue nv, NodeValue nv)
>  - compare(NodeValue nv, NodeValue nv)

Should this return a Comparator<NodeValue> instead? (Thinking of sorting.)

>  - add(NodeValue nv, NodeValue nv)
>  - subtract(NodeValue nv, NodeValue nv)
> - in NodeValue class, method sameAs(NodeValue nv1, NodeValue nv2) and compare(...) should use the Service Provider API to find NodeValueComparators in the classpath
> - in class NodeValueOps, methods divisionNV(NodeValue nv1, NodeValue nv2), multiplicationNV(...), additionNV(...), subtractionNV(...) should use the Service Provider API to find more NodeValueComparators in the classpath

Hm. Is there some way this could happen via a lookup in TypeMapper? I'd rather not see too many paths to the same service impls...
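One way to keep a single path is sketched below with hypothetical names. The registry entry for a datatype URI carries an optional operator handler, so operator evaluation finds handlers through the same URI lookup as datatypes rather than through a second ServiceLoader scan. UnifiedRegistry and OperatorHandler are illustrative stand-ins, not TypeMapper's API, and only "+" is shown.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class UnifiedRegistry {
    // Hypothetical operator SPI; only "+" is shown.
    interface OperatorHandler {
        Object add(Object a, Object b);
    }

    // One entry per datatype URI: the datatype and, optionally, its operators.
    static final class Entry {
        final String uri;
        final OperatorHandler ops; // null when the datatype has no extended operators
        Entry(String uri, OperatorHandler ops) { this.uri = uri; this.ops = ops; }
    }

    private final Map<String, Entry> byUri = new ConcurrentHashMap<>();

    void register(String uri, OperatorHandler ops) { byUri.put(uri, new Entry(uri, ops)); }

    Entry getTypeByName(String uri) { return byUri.get(uri); }

    // Operator lookup reuses the datatype table: there is no second discovery path.
    Optional<OperatorHandler> operatorsFor(String datatypeUri) {
        Entry e = byUri.get(datatypeUri);
        return e == null ? Optional.empty() : Optional.ofNullable(e.ops);
    }
}
```

Whatever discovers datatypes (an initializer or lazy lookup) then also delivers the operators, so there is only one place to configure.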

> Any thoughts about this?

Yes: thank you so much for doing this excellent work!

> Best regards,
> Maxime Lefrançois
> 
> 
> 
> On Sat, 7 Apr 2018 at 15:13, ajs6f <aj...@apache.org> wrote:
> 
>> We're (well, Andy is) working on 3.7.0 now. We've been trying to maintain
>> a 6-month or so release cadence, so you've hit a really good time to begin
>> this work. That having been said, I don't think anyone would say that we
>> are especially stringent about it, so I wouldn't worry too much about the
>> timing myself.
>> 
>> ajs6f
>> 
>>> On Apr 6, 2018, at 9:36 AM, Maxime Lefrançois <ma...@emse.fr> wrote:
>>> 
>>> Well,
>>> 
>>> I think I have a pretty clear idea how I would do this. We would end up
>>> using a registry like the one for custom functions or datatypes.
>>> That registry would contain an ordered list of SPARQL operator handlers,
>>> pre-filled with one for handling XSD datatypes.
>>>
>>> I am currently requesting permission to fill in the Apache individual
>>> contributor license agreement.
>>> 
>>> What would be the timeline if we wanted this shipped in the next release?
>>> 
>>> Best,
>>> Maxime
>>> 
>>> On Tue, 3 Apr 2018 at 15:30, ajs6f <aj...@apache.org> wrote:
>>> 
>>>> I agree. I can imagine plenty of use cases for such a powerful pair of
>>>> extension points.
>>>> 
>>>> Maxime, how can we help you attack that work? Is there a design that is
>>>> already clear to you? Are there any blockers we can help remove?
>>>> 
>>>> ajs6f
>>>> 
>>>>> On Mar 28, 2018, at 5:08 AM, Rob Vesse <rv...@dotnetrdf.org> wrote:
>>>>> 
>>>>> I think work towards Option 2 would be the most valuable to the
>>>>> community.
>>>>>
>>>>> The SPARQL specification allows for the overloading of any
>>>>> operator/expression where the spec currently defines the evaluation to
>>>>> be an error, so extending operators is a natural and valid extension
>>>>> point to provide.
>>>>>
>>>>> The Terms of Use for UCUM would probably need us to obtain a licensing
>>>>> assessment from Apache Legal, as it is a non-standard OSS license even
>>>>> if the code that implements it is under BSD (which is fine from an
>>>>> Apache perspective). Therefore, having a well-defined extension
>>>>> mechanism and then having UCUM support live outside Apache Jena as an
>>>>> extension implementation maintained by yourself would be the easiest
>>>>> approach.
>>>>>
>>>>> Rob
>>>>> 
>>>>> 
>>>>> 
>>>>> From: Maxime Lefrançois <ma...@emse.fr>
>>>>> Reply-To: <de...@jena.apache.org>
>>>>> Date: Wednesday, 28 March 2018 at 09:29
>>>>> To: <de...@jena.apache.org>
>>>>> Subject: Re: Contribution proposal for Jena: support of a datatype for quantity values
>>>>> 
>>>>> 
>>>>> 
>>>>> Dear all,
>>>>> 
>>>>> 
>>>>> 
>>>>> Happy to see you are interested in the UCUM datatypes!
>>>>> 
>>>>> 
>>>>> 
>>>>> OK, so let's dive into the technical details.
>>>>> 
>>>>> 
>>>>> 
>>>>> # Compare Jena 3.6.0 and Jena 3.6.0-ucum
>>>>>
>>>>> https://github.com/apache/jena/compare/master...OpenSensingCity:jena-3.6.0-ucum
>>>>> 
>>>>> 
>>>>> 
>>>>> # Modules, dependencies, licences
>>>>> 
>>>>> 
>>>>> 
>>>>> Two modules forked so far: jena-core and jena-arq.
>>>>> 
>>>>> One dependency added to jena-core (after a minor change I made today):
>>>>> 
>>>>> 
>>>>> 
>>>>> systems.uom:systems-ucum-java8:0.7.2
>>>>>
>>>>> -> BSD license of systems-uom,
>>>>>    and license of UCUM http://unitsofmeasure.org/trac/wiki/TermsOfUse
>>>>>
>>>>> --> this indeed uses an implementation of JSR 363 - Units of Measurement API
>>>>>
>>>>> (see attached for the transitive dependencies, all from
>>>>> https://github.com/unitsofmeasurement )
>>>>> 
>>>>> 
>>>>> 
>>>>> # External module?
>>>>>
>>>>> I would have been happy to develop a separate extension of Jena for the
>>>>> UCUM datatypes.
>>>>>
>>>>> One of the main reasons why this is not possible was pointed out by Andy:
>>>>>
>>>>> I had to add a new value space VSPACE_QUANTITY to overload the SPARQL
>>>>> operators '<>=' and arithmetic functions '+-*/'.
>>>>>
>>>>> Indeed, there are two parts: the necessary extensions for operators, and
>>>>> the units themselves.
>>>>>
>>>>> We could choose some other unit system than UCUM, but UCUM is very
>>>>> comprehensive and has different implementations in different programming
>>>>> languages. It would be possible to implement UCUM datatypes in other
>>>>> RDF-SPARQL engines.
>>>>> 
>>>>> 
>>>>> 
>>>>> # Possible directions
>>>>>
>>>>> I see three main possible directions of work there:
>>>>>
>>>>> 1. work on the proposal as is and potentially integrate it completely
>>>>>
>>>>> 2. work on jena-core and jena-arq to make the definition of new
>>>>> datatypes and the overloading of operators as easy as the definition of
>>>>> new custom functions --> so that I can easily implement UCUM datatypes
>>>>> as an extension (and not a fork)
>>>>>
>>>>> 3. add VSPACE_QUANTITY value space and NodeValueQuantity in jena-arq,
>>>>> and externalize the support for the UCUM systems of unit in an external
>>>>> module
>>>>> 
>>>>> 
>>>>> 
>>>>> Best,
>>>>> 
>>>>> Maxime
>>>>> 
>>>>> 
>>>>> 
>>>>> On Tue, 27 Mar 2018 at 17:16, Andy Seaborne <an...@apache.org> wrote:
>>>>> 
>>>>> Extending the operators for SPARQL is done with a new value space
>>>>> VSPACE_QUANTITY.
>>>>> 
>>>>> See (comparison):
>>>>>
>>>>> https://github.com/OpenSensingCity/jena-ucum/blob/jena-3.6.0-ucum/jena-arq/src/main/java/org/apache/jena/sparql/expr/NodeValue.java#L566
>>>>>
>>>>> and (multiply)
>>>>>
>>>>> https://github.com/OpenSensingCity/jena-ucum/blob/jena-3.6.0-ucum/jena-arq/src/main/java/org/apache/jena/sparql/expr/nodevalue/NodeValueOps.java#L283
>>>>> 
>>>>> with a new NodeValueQuantity for javax.measure.Quantity
>>>>> 
>>>>> I'm seeing this as "one-dimensional units": a quantity and a unit.
>>>>>
>>>>> Even then, there are two parts: the necessary extensions for operators,
>>>>> and the units themselves, to allow for other unit systems (?).
>>>>> 
>>>>> There are new dependencies in jena-arq and jena-core.
>>>>> 
>>>>> http://unitsofmeasurement.github.io/
>>>>> JSR 363 - Units of Measurement API
>>>>> BSD-license
>>>>> 
>>>>> and an old version of something is on central:
>>>>> 
>>>>> http://central.maven.org/maven2/javax/measure/unit-api/1.0
>>>>> 
>>>>> if that's the right thing.
>>>>> 
>>>>> ---
>>>>> 
>>>>> Maxime - what are the dependencies for this contribution and for which
>>>>> pieces are they needed?
>>>>> 
>>>>>   Andy
>>>>> 
>>>>> On 27/03/18 15:49, ajs6f wrote:
>>>>>> Bruno raises an interesting question: would this contribution have any
>>>>>> effect (or should it) on jena-spatial? Would it be either necessary or,
>>>>>> if not, appropriate to integrate there? (I'm particularly interested in
>>>>>> this because it might help decide between core and an extension.)
>>>>>> 
>>>>>> 
>>>>>> ajs6f
>>>>>> 
>>>>>>> On Mar 26, 2018, at 5:40 PM, Bruno P. Kinoshita <ki...@apache.org> wrote:
>>>>>>> 
>>>>>>> Hi Maxime,
>>>>>>> Don't know whether it would be best as part of jena core or in an
>>>>>>> extension, but it sounds very interesting! Will let others comment on this.
>>>>>>> At work, one item in my backlog is to replace jscience by jsr363 -
>>>>>>> Units of Measurement.
>>>>>>> 
>>>>>>> We use it for weather forecasts and GIS, with things like wind speed,
>>>>>>> rain amount, etc.
>>>>>>> I think another GIS library that we use did the switch as well (some
>>>>>>> OGC lib, I think).
>>>>>>> Perhaps it would be nice to consider taking a look at their API for
>>>>>>> compatibility with other systems.
>>>>>>> Cheers
>>>>>>> Bruno
>>>>>>> 
>>>>>>> Sent from Yahoo Mail on Android
>>>>>>> 
>>>>>>> On Tue, 27 Mar 2018 at 2:07, Maxime Lefrançois <maxime.lefrancois@emse.fr> wrote:
>>>>>>>
>>>>>>> Dear all,
>>>>>>> 
>>>>>>> I am Associate Professor at MINES Saint-Étienne, France, working on
>>>>>>> Semantic Web and Linked Data. I'd like to let you know about our
>>>>>>> project *Custom Datatypes for Quantity Values* [1], which leverages the
>>>>>>> Unified Code for Units of Measure (UCUM), a code system intended to
>>>>>>> include all units of measure contemporarily used in international
>>>>>>> science, engineering, and business.
>>>>>>> Using our UCUM Datatypes, one can encode and query quantity values in a
>>>>>>> lightweight manner:
>>>>>>> 
>>>>>>> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
>>>>>>> PREFIX ex: <http://example.org/>
>>>>>>> 
>>>>>>> SELECT ?value1 ?value2 ?result
>>>>>>> WHERE{
>>>>>>> VALUES ( ?value1 ?value2 ) {
>>>>>>>   ( "1.0 m/s"^^cdt:speed "2 s"^^cdt:time )
>>>>>>> }
>>>>>>> BIND( ?value1 * ?value2 AS ?result )
>>>>>>> }
>>>>>>> 
>>>>>>> Results in:
>>>>>>>
>>>>>>> ----------------------------------------------------------------
>>>>>>> | value1               | value2          | result              |
>>>>>>> ================================================================
>>>>>>> | "1.0 m/s"^^cdt:speed | "2 s"^^cdt:time | "2.0 m"^^cdt:length |
>>>>>>> 
>>>>>>> See our demonstration online [2].
>>>>>>> It uses *a fork of Jena where we implemented UCUM datatypes* [3] (in
>>>>>>> jena-core and jena-arq, with several unit tests). Our implementation
>>>>>>> uses the recent JSR 385, Units of Measurement API 2.0, and the UCUM
>>>>>>> extension [4].
>>>>>>> 
>>>>>>> This is not the first project I have developed in or on top of Jena:
>>>>>>> - I forked it for "Supporting Arbitrary Custom Datatypes in RDF and
>>>>>>>   SPARQL", fetching some JavaScript definition at the URI of the
>>>>>>>   datatype [5]
>>>>>>> - I develop SPARQL-Generate, an extension of SPARQL implemented on ARQ
>>>>>>>   to generate RDF from web documents in XML, JSON, CSV, HTML, CBOR, and
>>>>>>>   plain text with regular expressions [6]
>>>>>>> 
>>>>>>> 
>>>>>>> If you agree with me that supporting UCUM datatypes would be a nice
>>>>>>> addition to Apache Jena and a nice contribution to the Semantic Web
>>>>>>> community, I would be willing to help integrate our contribution into
>>>>>>> other modules (jena-tdb, ...), and help maintain it in the future.
>>>>>>> 
>>>>>>> Best regards,
>>>>>>> Maxime Lefrançois,
>>>>>>> Associate Professor, MINES Saint-Étienne
>>>>>>> 
>>>>>>> [1] - http://w3id.org/lindt/custom_datatypes#
>>>>>>> [2] - http://w3id.org/lindt/playground.html?example=05-Multiply
>>>>>>> [3] - http://w3id.org/lindt/custom_datatypes#implementation
>>>>>>> [4] - https://github.com/unitsofmeasurement/uom-systems/tree/master/ucum-java8
>>>>>>> [5] - https://ci.mines-stetienne.fr/lindt/spec.html
>>>>>>> [6] - https://ci.mines-stetienne.fr/sparql-generate/
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>>