Posted to users@jena.apache.org by Bill Roberts <bi...@swirrl.com> on 2014/02/11 12:05:05 UTC
geosparql and Jena
Hi All
Does anyone have plans to implement GeoSPARQL in Jena? I'm aware of Jena Spatial which obviously has many functional similarities, but just wondering if there are plans for GeoSPARQL itself?
Thanks
Bill
Re: geosparql and Jena
Posted by Bill Roberts <bi...@swirrl.com>.
Hi Andy
Thanks very much for the detailed assessment.
No, I wasn't on the reviewing panel for the W3C/OGC event - I haven't seen your paper for that! (I do plan to attend the event.)
I asked the question because we have been looking at the various approaches, all based on the Lucene geo functions (Jena spatial, ElasticSearch etc), for indexing and searching geographical data in a linked data context.
The Lucene-based approaches should do what we want for now, but I was just thinking that a standardised way of incorporating the geo functions into SPARQL would be attractive for us - if the performance was decent of course, and I realise that is far from easy to achieve!
Cheers
Bill
On 12 Feb 2014, at 10:32, Andy Seaborne <an...@apache.org> wrote:
> On 11/02/14 11:05, Bill Roberts wrote:
>> Hi All
>>
>> Does anyone have plans to implement GeoSPARQL in Jena? I'm aware of Jena Spatial which obviously has many functional similarities, but just wondering if there are plans for GeoSPARQL itself?
>>
>> Thanks
>>
>> Bill
>>
>
> Hi Bill -
>
> Hope you weren't one of the reviewers for the submission to the W3C/OGC workshop that's coming up! The "why not GeoSPARQL" question came up but, as the submission tries to point out, the work needed to do even a partial GeoSPARQL is not insignificant.
>
> There are non-technical issues as well: support and user questions. Suppose a complete, perfect implementation is released in Jena. Or suppose it's a partial implementation - now there is a need to explain what is and isn't implemented.
>
> As a first step, it needs someone to investigate it properly; it does look to me like something that needs a resource with access to a geospatial expert, at least for advice. It's not in the same league as a one-off patch to ARQ.
>
> jena-spatial was driven by the availability of the geo functions in lucene. jena-spatial is a self-contained extension; GeoSPARQL needs deep integration into the query engine just to do the same point-in-bounding-box functionality.
>
> From what I can see, there needs to be a community around geospatial data somehow, not just users learning about geospatial data. That would be good to have wherever it is; Jena community, sub project, independent project on github.
>
> GeoSPARQL is a core and a number of extensions. The core is just some class definitions - Jena already supports all the core requirements, as do all general SPARQL engines, but the core does not do anything by itself. It's the various extensions that give the functionality.
>
> GeoSPARQL covers regions and boundaries - for the Topology Vocabulary Extension (section 7) it needs one or more geo-reasoners to provide the topological relations, e.g. geo:sfDisjoint in relation_family=Simple Features. There is also relation_family=Egenhofer and relation_family=RCC8.
>
> The Geometry Extension (section 8) has the interesting part, "Non-topological Query Functions" (section 8.7).
>
> Take the function "geof:distance":
>
> FILTER ( geof:distance(?geoPoints, SomeFixedGeo, units) < 56 )
>
> which is the within-circle function.
>
> If you simply add that function as a custom function to a general purpose SPARQL engine, then to calculate it you need a full scan of the geo data to find all the ?geoPoints, and then filter them. That's the situation we had pre-jena-spatial. It's slow even on modest data without access to a geospatial index (R-tree, quad-tree, lucene spatial, whatever).
>
> jena-spatial collects the bounded geospatial access together in one property function. That function asks a geo index, which can find a few points of interest very quickly, and the query then adds info from the rest of the RDF data.
>
> To do the GeoSPARQL style, you need to pick out the part of the graph pattern where ?geo came from, being careful that the non-geo access patterns are not made inefficient in the process. It's an optimization problem. If the focus is on a geospatial DB, then it's not too bad, but if the RDF database is some geo and a lot of other data, all the optimization choices get mixed up and compete.
>
> There are various other geof:* functions which work on regions and run into later sections getting more complicated.
>
> The Query Rewrite Extension (section 11) looks fun. It's query rewrite to turn property relationships into primitive data access and custom functions. ARQ can do that but again, what happens in the context of general data as well?
>
> I haven't found geo libraries to use except spatial4j. There are various others under the GPL which I haven't tried, and obviously they would have consequences for the whole of Jena. There would need to be some kind of geo index, working with the optimizer and data loading. The Lucene spatial index is just point data. An R-tree and regions are needed for more general GeoSPARQL extensions.
>
> So - call to geo-experts - is that a fair assessment? Being wrong about the amount of work needed would be very good news.
>
> Andy
>
Re: geosparql and Jena
Posted by Andy Seaborne <an...@apache.org>.
On 11/02/14 11:05, Bill Roberts wrote:
> Hi All
>
> Does anyone have plans to implement GeoSPARQL in Jena? I'm aware of Jena Spatial which obviously has many functional similarities, but just wondering if there are plans for GeoSPARQL itself?
>
> Thanks
>
> Bill
>
Hi Bill -
Hope you weren't one of the reviewers for the submission to the W3C/OGC
workshop that's coming up! The "why not GeoSPARQL" question came up but,
as the submission tries to point out, the work needed to do even a
partial GeoSPARQL is not insignificant.
There are non-technical issues as well: support and user questions.
Suppose a complete, perfect implementation is released in Jena. Or
suppose it's a partial implementation - now there is a need to explain
what is and isn't implemented.
As a first step, it needs someone to investigate it properly; it does
look to me like something that needs a resource with access to a
geospatial expert, at least for advice. It's not in the same league as
a one-off patch to ARQ.
jena-spatial was driven by the availability of the geo functions in
lucene. jena-spatial is a self-contained extension; GeoSPARQL needs
deep integration into the query engine just to do the same
point-in-bounding-box functionality.
From what I can see, there needs to be a community around geospatial
data somehow, not just users learning about geospatial data. That would
be good to have wherever it is; Jena community, sub project, independent
project on github.
GeoSPARQL is a core and a number of extensions. The core is just some
class definitions - Jena already supports all the core requirements, as
do all general SPARQL engines, but the core does not do anything by
itself. It's the various extensions that give the functionality.
GeoSPARQL covers regions and boundaries - for the Topology Vocabulary
Extension (section 7) it needs one or more geo-reasoners to provide the
topological relations, e.g. geo:sfDisjoint in relation_family=Simple
Features. There is also relation_family=Egenhofer and relation_family=RCC8.
The Geometry Extension (section 8) has the interesting part,
"Non-topological Query Functions" (section 8.7).
Take the function "geof:distance":
FILTER ( geof:distance(?geoPoints, SomeFixedGeo, units) < 56 )
which is the within-circle function.
If you simply add that function as a custom function to a general
purpose SPARQL engine, then to calculate it you need a full scan of the
geo data to find all the ?geoPoints, and then filter them. That's the
situation we had pre-jena-spatial. It's slow even on modest data
without access to a geospatial index (R-tree, quad-tree, lucene spatial,
whatever).
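To make the cost concrete, here is a minimal Python sketch of the
full-scan approach. It is only an illustration, not how ARQ evaluates
filters: the place data, the haversine distance, the Earth radius, the
choice of kilometres as the unit and the helper names are all
assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_circle_full_scan(points, centre, radius_km):
    """Test every stored point against the circle: O(number of points)."""
    clat, clon = centre
    return [name for name, (lat, lon) in points.items()
            if haversine_km(lat, lon, clat, clon) < radius_km]


# Hypothetical data standing in for all the ?geoPoints in the store.
points = {
    "bristol": (51.4545, -2.5879),
    "bath": (51.3811, -2.3590),
    "manchester": (53.4808, -2.2426),
}

print(within_circle_full_scan(points, (51.45, -2.59), 56))
```

Every point is visited no matter how small the circle is, which is
where the time goes once the data grows.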
jena-spatial collects the bounded geospatial access together in one
property function. That function asks a geo index, which can find a few
points of interest very quickly, and the query then adds info from the
rest of the RDF data.
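A rough Python sketch of that index-first style, again only an
illustration: jena-spatial uses a Lucene spatial index, whereas this
substitutes a toy fixed-size grid. The cell size, the place data, the
labels and all the function names are assumptions, and the bounding box
is approximate (it ignores the poles and the antimeridian).

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch
CELL_DEG = 1.0            # grid cell size in degrees; arbitrary choice


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cell(lat, lon):
    """Grid cell containing a point."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def build_index(points):
    """Bucket point names by grid cell (done once, at load time)."""
    index = defaultdict(list)
    for name, (lat, lon) in points.items():
        index[cell(lat, lon)].append(name)
    return index


def nearby(index, points, centre, radius_km):
    """Visit only cells overlapping the circle's rough bounding box,
    then apply the exact distance test to the few candidates found."""
    clat, clon = centre
    dlat = radius_km / 111.0  # roughly 111 km per degree of latitude
    widest = min(89.0, abs(clat) + dlat)
    dlon = dlat / math.cos(math.radians(widest))
    lo = cell(clat - dlat, clon - dlon)
    hi = cell(clat + dlat, clon + dlon)
    hits = []
    for i in range(lo[0], hi[0] + 1):
        for j in range(lo[1], hi[1] + 1):
            for name in index.get((i, j), ()):
                lat, lon = points[name]
                if haversine_km(lat, lon, clat, clon) < radius_km:
                    hits.append(name)
    return hits


# Hypothetical geo data plus "the rest of the RDF data" (here, labels).
points = {
    "bristol": (51.4545, -2.5879),
    "bath": (51.3811, -2.3590),
    "manchester": (53.4808, -2.2426),
}
labels = {"bristol": "Bristol", "bath": "Bath", "manchester": "Manchester"}

index = build_index(points)
hits = nearby(index, points, (51.45, -2.59), 56)
print([(name, labels[name]) for name in hits])
```

The geo lookup runs first and returns a handful of names; only those
are then joined with the other data, so the cost follows the number of
nearby points rather than the size of the whole dataset.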
To do the GeoSPARQL style, you need to pick out the part of the graph
pattern where ?geo came from, being careful that the non-geo access
patterns are not made inefficient in the process. It's an optimization
problem. If the focus is on a geospatial DB, then it's not too bad, but
if the RDF database is some geo and a lot of other data, all the
optimization choices get mixed up and compete.
There are various other geof:* functions which work on regions and run
into later sections getting more complicated.
The Query Rewrite Extension (section 11) looks fun. It's query rewrite
to turn property relationships into primitive data access and custom
functions. ARQ can do that but again, what happens in the context of
general data as well?
I haven't found geo libraries to use except spatial4j. There are
various others under the GPL which I haven't tried, and obviously they
would have consequences for the whole of Jena. There would need to be
some kind of geo index, working with the optimizer and data loading. The
Lucene spatial index is just point data. An R-tree and regions are
needed for more general GeoSPARQL extensions.
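For regions, R-tree style indexes work on minimum bounding rectangles:
a cheap rectangle test picks candidates before any exact geometry work.
A minimal Python sketch of just that prefilter follows. It is not an
R-tree (there is no tree, so it still loops over every region), and the
region data and function names are made up for the example.

```python
def mbr(polygon):
    """Minimum bounding rectangle of a polygon given as (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def mbr_intersects(a, b):
    """True if rectangles (min_x, min_y, max_x, max_y) share any point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidates(regions, query_polygon):
    """Names of regions whose rectangle touches the query's rectangle.

    Survivors still need an exact geometry test; the rest can be
    discarded without one.
    """
    q = mbr(query_polygon)
    return [name for name, poly in regions.items()
            if mbr_intersects(mbr(poly), q)]


# Hypothetical regions in an arbitrary planar coordinate system.
regions = {
    "triangle": [(0, 0), (4, 0), (0, 4)],
    "far_square": [(10, 10), (12, 10), (12, 12), (10, 12)],
}
query = [(3, 3), (5, 3), (5, 5), (3, 5)]

print(candidates(regions, query))
```

Here "triangle" survives the prefilter even though the triangle itself
does not touch the query square, which is why an exact test is still
needed afterwards. An R-tree arranges these rectangles in a hierarchy
so that whole groups of regions can be skipped at once.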
So - call to geo-experts - is that a fair assessment? Being wrong about
the amount of work needed would be very good news.
Andy