Posted to users@jena.apache.org by Bill Roberts <bi...@swirrl.com> on 2014/02/11 12:05:05 UTC

geosparql and Jena

Hi All

Does anyone have plans to implement GeoSPARQL in Jena?  I'm aware of Jena Spatial which obviously has many functional similarities, but just wondering if there are plans for GeoSPARQL itself?

Thanks

Bill

Re: geosparql and Jena

Posted by Bill Roberts <bi...@swirrl.com>.
Hi Andy

Thanks very much for the detailed assessment.

No, I wasn't on the reviewing panel for the W3C/OGC event, so I haven't seen your paper for that!  (I do plan to attend the event.)

I asked the question because we have been looking at the various approaches, all based on the Lucene geo functions (Jena Spatial, Elasticsearch, etc.), for indexing and searching geographical data in a linked data context.

The Lucene-based approaches should do what we want for now, but I was just thinking that if there were a standardised way of incorporating the geo functions into SPARQL, that would be attractive for us - if the performance was decent of course, and I realise that is far from easy to achieve!


Cheers

Bill





On 12 Feb 2014, at 10:32, Andy Seaborne <an...@apache.org> wrote:

> On 11/02/14 11:05, Bill Roberts wrote:
>> Hi All
>> 
>> Does anyone have plans to implement GeoSPARQL in Jena?  I'm aware of Jena Spatial which obviously has many functional similarities, but just wondering if there are plans for GeoSPARQL itself?
>> 
>> Thanks
>> 
>> Bill
>> 
> 
> Hi Bill -
> 
> Hope you weren't one of the reviewers for the submission to the W3C/OGC workshop that's coming up!  The "why not GeoSPARQL" question came up but, as the submission tries to point out, the work needed to do even a partial GeoSPARQL is not insignificant.
> 
> There are non-technical issues as well.  Support and user questions - suppose a complete, perfect implementation is released in Jena.  Or suppose it's a partial implementation - now there is a need to explain what is and isn't implemented.
> 
> As a first step, it needs someone to investigate it properly; it does look to me like something that needs resources, with access to a geospatial expert at least for advice.  It's not in the same league as a one-off patch to ARQ.
> 
> jena-spatial was driven by the availability of the geo functions in Lucene.  jena-spatial is a self-contained extension, whereas GeoSPARQL needs deep integration into the query engine just to do the same point-in-bounding-box functionality.
> 
> From what I can see, there needs to be a community around geospatial data somehow, not just users learning about geospatial data.  That would be good to have wherever it is: Jena community, sub-project, or independent project on GitHub.
> 
> GeoSPARQL is a core and a number of extensions.   The core is just some class definitions - Jena already supports all the core requirements, as do all general SPARQL engines, but the core on its own does not do anything.   It's the various extensions that give the functionality.
> 
> GeoSPARQL covers regions and boundaries - for the Topology Vocabulary Extension (section 7) it needs one or more geo-reasoners to provide the topological relations, e.g. geo:sfDisjoint in relation_family=Simple Features; there are also relation_family=Egenhofer and relation_family=RCC8.
> 
> The Geometry Extension (section 8) has the interesting part, "Non-topological Query Functions" (section 8.7).
> 
> Take the function "geof:distance":
> 
>    FILTER ( geof:distance(?geoPoints, SomeFixedGeo, units) < 56 )
> 
> which is the within-circle function.
> 
> If you simply add that function as a custom function to a general purpose SPARQL engine, then to calculate it you need a full scan of the geo data to find all the ?geoPoints, and then filter them.  That's the situation we had pre-jena-spatial.  It's slow even on modest data without access to a geospatial index (R-tree, quad-tree, Lucene spatial, whatever).
> 
> jena-spatial collects the bounded geospatial access together in one property function that asks a geo index, which can find a few points of interest very quickly, then adds info from the rest of the RDF data.
> 
> To do the GeoSPARQL style, you need to pick out the part of the graph pattern where ?geo came from, being careful that the non-geo access patterns are not made inefficient in the process.  It's an optimization problem.  If the focus is on a geospatial DB, then it's not too bad, but if the RDF database is some geo and a lot of other data, all the optimization choices get mixed up and compete.
> 
> There are various other geof:* functions that work on regions, and the later sections get more complicated.
> 
> The Query Rewrite Extension (section 11) looks fun.  It's query rewrite to turn property relationships into primitive data access and custom functions.  ARQ can do that but again, what about when it is in the context of general data as well?
> 
> I haven't found geo libraries to use except spatial4j.  There are various others licensed under the GPL, which I haven't tried, and obviously they would have consequences for the whole of Jena.  There would need to be some kind of geo index, working with the optimizer and data loading.  The Lucene spatial index is just point data.  An R-tree and support for regions are needed for more general GeoSPARQL extensions.
> 
> So - call to geo-experts - is that a fair assessment?  Being wrong about the amount of work needed would be very good news.
> 
> 	Andy
> 


Re: geosparql and Jena

Posted by Andy Seaborne <an...@apache.org>.
On 11/02/14 11:05, Bill Roberts wrote:
> Hi All
>
> Does anyone have plans to implement GeoSPARQL in Jena?  I'm aware of Jena Spatial which obviously has many functional similarities, but just wondering if there are plans for GeoSPARQL itself?
>
> Thanks
>
> Bill
>

Hi Bill -

Hope you weren't one of the reviewers for the submission to the W3C/OGC
workshop that's coming up!  The "why not GeoSPARQL" question came up but,
as the submission tries to point out, the work needed to do even a
partial GeoSPARQL is not insignificant.

There are non-technical issues as well.  Support and user questions -
suppose a complete, perfect implementation is released in Jena.  Or
suppose it's a partial implementation - now there is a need to explain
what is and isn't implemented.

As a first step, it needs someone to investigate it properly; it does
look to me like something that needs resources, with access to a
geospatial expert at least for advice.  It's not in the same league as a
one-off patch to ARQ.

jena-spatial was driven by the availability of the geo functions in
Lucene.  jena-spatial is a self-contained extension, whereas GeoSPARQL
needs deep integration into the query engine just to do the same
point-in-bounding-box functionality.

From what I can see, there needs to be a community around geospatial
data somehow, not just users learning about geospatial data.  That would
be good to have wherever it is: Jena community, sub-project, or
independent project on GitHub.

GeoSPARQL is a core and a number of extensions.   The core is just some
class definitions - Jena already supports all the core requirements, as
do all general SPARQL engines, but the core on its own does not do
anything.   It's the various extensions that give the functionality.

GeoSPARQL covers regions and boundaries - for the Topology Vocabulary
Extension (section 7) it needs one or more geo-reasoners to provide the
topological relations, e.g. geo:sfDisjoint in relation_family=Simple
Features; there are also relation_family=Egenhofer and
relation_family=RCC8.
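
To give a feel for what such a geo-reasoner has to compute, here is a
minimal Python sketch of the Simple Features "disjoint" relation,
restricted to axis-aligned bounding boxes.  The Box type and the
sf_disjoint function are made up for illustration and are not part of
Jena or GeoSPARQL; a real reasoner would have to handle arbitrary
geometries and the other relation families as well.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box (hypothetical helper type)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def sf_disjoint(a: Box, b: Box) -> bool:
    """True when the two closed boxes share no point at all.

    Boxes that merely touch along an edge or at a corner share a
    point, so they are NOT disjoint; hence the strict comparisons.
    """
    return (a.max_x < b.min_x or b.max_x < a.min_x or
            a.max_y < b.min_y or b.max_y < a.min_y)
```

Even this toy case shows why the relations need care: touching boxes
and overlapping boxes both count as "not disjoint", and telling them
apart is the job of further relations.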

The Geometry Extension (section 8) has the interesting part,
"Non-topological Query Functions" (section 8.7).

Take the function "geof:distance":

     FILTER ( geof:distance(?geoPoints, SomeFixedGeo, units) < 56 )

which is the within-circle function.

If you simply add that function as a custom function to a general
purpose SPARQL engine, then to calculate it you need a full scan of the
geo data to find all the ?geoPoints, and then filter them.  That's the
situation we had pre-jena-spatial.  It's slow even on modest data
without access to a geospatial index (R-tree, quad-tree, Lucene spatial,
whatever).
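
As a rough illustration of that full scan, here is a Python sketch of a
within-circle filter evaluated without any index.  The sample points,
the choice of kilometres as the unit, and the function names are
assumptions for the example only; this is not how ARQ evaluates
filters internally.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, good enough for a sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_circle_full_scan(points, centre, radius_km):
    """Visit every point and keep the close ones: O(n) work per query."""
    clat, clon = centre
    return [name for name, (lat, lon) in points.items()
            if haversine_km(lat, lon, clat, clon) < radius_km]


# Hypothetical sample data: name -> (latitude, longitude)
POINTS = {
    "bristol": (51.4545, -2.5879),
    "bath": (51.3811, -2.3590),
    "manchester": (53.4808, -2.2426),
}
```

Every query pays for a distance calculation on every point in the data,
however few of them end up inside the circle.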

jena-spatial collects the bounded geospatial access together in one
property function that asks a geo index, which can find a few points of
interest very quickly, then adds info from the rest of the RDF data.
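
The "index first, then join" shape can be sketched in Python as below.
This uses a simple fixed-size grid rather than Lucene, and GridIndex,
labelled_hits and the sample labels are all hypothetical names for the
example; it only shows the idea of touching a few index cells and then
looking up the remaining data for the hits.

```python
import math
from collections import defaultdict


class GridIndex:
    """Toy point index: buckets points into cells of cell_deg degrees."""

    def __init__(self, cell_deg=1.0):
        self.cell_deg = cell_deg
        self.cells = defaultdict(list)

    def _cell(self, lat, lon):
        return (math.floor(lat / self.cell_deg),
                math.floor(lon / self.cell_deg))

    def add(self, name, lat, lon):
        self.cells[self._cell(lat, lon)].append((name, lat, lon))

    def within_box(self, min_lat, min_lon, max_lat, max_lon):
        """Return names inside the box, visiting only overlapping cells."""
        r0, c0 = self._cell(min_lat, min_lon)
        r1, c1 = self._cell(max_lat, max_lon)
        hits = []
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                for name, lat, lon in self.cells.get((r, c), ()):
                    if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
                        hits.append(name)
        return hits


def labelled_hits(index, labels, box):
    """Geo lookup first, then join each hit with the non-geo data."""
    return sorted(labels[name] for name in index.within_box(*box))
```

For example, after adding a handful of points to a GridIndex, calling
labelled_hits(index, labels, (51.0, -3.0, 52.0, -2.0)) only inspects
the cells that overlap that box and never looks at points elsewhere.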

To do the GeoSPARQL style, you need to pick out the part of the graph
pattern where ?geo came from, being careful that the non-geo access
patterns are not made inefficient in the process.  It's an optimization
problem.  If the focus is on a geospatial DB, then it's not too bad, but
if the RDF database is some geo and a lot of other data, all the
optimization choices get mixed up and compete.

There are various other geof:* functions that work on regions, and the
later sections get more complicated.

The Query Rewrite Extension (section 11) looks fun.  It's query rewrite
to turn property relationships into primitive data access and custom
functions.  ARQ can do that but again, what about when it is in the
context of general data as well?
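
To make the rewrite idea concrete, here is a small Python sketch that
expands one topological triple pattern into geometry lookups plus a
filter function call.  It covers only a simplified feature-to-feature
case with WKT serializations; the function name and the generated
variable names are invented for the example, and the rules in the spec
cover more cases than this.

```python
def rewrite_topological_pattern(subject, relation, obj):
    """Rewrite `subject geo:<relation> obj` into primitive patterns.

    The result fetches each feature's default geometry and its WKT
    serialization, then applies the matching geof: filter function.
    """
    s, o = subject.lstrip("?"), obj.lstrip("?")
    return "\n".join([
        f"?{s} geo:hasDefaultGeometry ?{s}Geom .",
        f"?{o} geo:hasDefaultGeometry ?{o}Geom .",
        f"?{s}Geom geo:asWKT ?{s}WKT .",
        f"?{o}Geom geo:asWKT ?{o}WKT .",
        f"FILTER ( geof:{relation}(?{s}WKT, ?{o}WKT) )",
    ])
```

Calling rewrite_topological_pattern("?a", "sfWithin", "?b") yields four
triple patterns and one FILTER, which is where the optimizer questions
start: the filter is only cheap if something can narrow down the
candidate geometries first.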

I haven't found geo libraries to use except spatial4j.  There are
various others licensed under the GPL, which I haven't tried, and
obviously they would have consequences for the whole of Jena.  There
would need to be some kind of geo index, working with the optimizer and
data loading.  The Lucene spatial index is just point data.  An R-tree
and support for regions are needed for more general GeoSPARQL extensions.

So - call to geo-experts - is that a fair assessment?  Being wrong about 
the amount of work needed would be very good news.

	Andy