Posted to users@jena.apache.org by Heidi McClure <he...@issinc.com> on 2014/02/25 20:41:07 UTC

Creating Spatial Lucene index from existing TDB data store

I have an existing TDB data store that I would like to create a spatial index for.  Are there utilities or APIs to do this?

I have successfully followed the examples for reading in from .ttl files and creating the TDB and spatial index data at the same time.

My TDB has GeoSPARQL nodes in it like:

<http://issinc.com/events#event_901112240019> <http://www.opengis.net/ont/geosparql#asWKT> "POINT(1.7488388 40.05863)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>

thanks,
-heidi



RE: Creating Spatial Lucene index from existing TDB data store

Posted by Heidi McClure <he...@issinc.com>.
Thanks Andy - I'll look at the jena.spatialindexer for creating an index from existing TDB data.  

In case it helps others, below is the config I used to start a TDB backed store with both text and spatial indexes configured.  I used this command to start:

fuseki-server --config=C:/jena/jena-fuseki-1.0.0/config-text-spatial-myTDBStore.ttl /ds

And I added the JTS classes to my fuseki-server.jar.

config-text-spatial-myTDBStore.ttl is a modified version of the config-text.ttl from the Jena documentation. It contains:

## Example of a TDB dataset with text and spatial indexes published using Fuseki

@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix spatial: <http://jena.apache.org/spatial#> .
@prefix geosparql: <http://www.opengis.net/ont/geosparql#> .

[] rdf:type fuseki:Server ;
   # Timeout - server-wide default: milliseconds.
   # Format 1: "1000" -- 1 second timeout
   # Format 2: "10000,60000" -- 10s timeout to first result, then 60s timeout for the rest of the query.
   # See java doc for ARQ.queryTimeout
   # ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "10000" ] ;
   # ja:loadClass "your.code.Class" ;

   fuseki:services (
     <#service_text_tdb>
     <#service_spatial_tdb>
   ) .

# TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
#text:TextIndexSolr    rdfs:subClassOf   text:TextIndex .
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .

# Spatial
[] ja:loadClass "org.apache.jena.query.spatial.SpatialQuery" .
spatial:SpatialDataset  rdfs:subClassOf  ja:RDFDataset .
#spatial:SpatialIndexSolr  rdfs:subClassOf  spatial:SpatialIndex .
spatial:SpatialIndexLucene  rdfs:subClassOf   spatial:SpatialIndex .

## ---------------------------------------------------------------

<#service_text_tdb> rdf:type fuseki:Service ;
    rdfs:label                      "TDB/text service" ;
    fuseki:name                     "ds" ;
    fuseki:serviceQuery             "query" ;
    fuseki:serviceQuery             "sparql" ;
    fuseki:serviceUpdate            "update" ;
    fuseki:serviceUpload            "upload" ;
    fuseki:serviceReadGraphStore    "get" ;
    fuseki:serviceReadWriteGraphStore    "data" ;
    fuseki:dataset                  <#text_dataset> ;
    .

<#text_dataset> rdf:type     text:TextDataset ;
    text:dataset   <#dataset> ;
    ##text:index   <#indexSolr> ;
    text:index     <#indexTextLucene> ;
    .

<#dataset> rdf:type      tdb:DatasetTDB ;
    tdb:location "Data/myTDBStore" ;
    ##tdb:unionDefaultGraph true ;
    .

<#indexSolr> a text:TextIndexSolr ;
    #text:server <http://localhost:8983/solr/COLLECTION> ;
    text:server <embedded:SolrARQ> ;
    text:entityMap <#entMap> ;
    .

<#indexTextLucene> a text:TextIndexLucene ;
    text:directory <file:Data/myTDBStore_text_index> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .

<#entMap> a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "text" ;        ## Should be defined in the text:map.
    text:map (
         # rdfs:label            
         [ text:field "text" ; text:predicate rdfs:label ]
         ) .

##---------------------------------------------------------------
<#service_spatial_tdb> rdf:type fuseki:Service ;
    rdfs:label                      "TDB/spatial service" ;
    fuseki:name                     "ds" ;
    fuseki:serviceQuery             "query" ;
    fuseki:serviceQuery             "sparql" ;
    fuseki:serviceUpdate            "update" ;
    fuseki:serviceUpload            "upload" ;
    fuseki:serviceReadGraphStore    "get" ;
    fuseki:serviceReadWriteGraphStore    "data" ;
    fuseki:dataset                  :spatial_dataset ;
    .
	
:spatial_dataset rdf:type     spatial:SpatialDataset ;
    spatial:dataset   <#dataset> ;
    ##spatial:index   <#indexSolr> ;
    spatial:index     <#indexSpatialLucene> ;
    .

<#indexSpatialLucene> a spatial:SpatialIndexLucene ;
    spatial:directory <file:Data/myTDBStore_spatial_index> ;
    #spatial:directory "mem" ;
    spatial:definition <#definition> ;
    .

<#definition> a spatial:EntityDefinition ;
    spatial:entityField      "uri" ;
    spatial:geoField     "geo" ;
    # custom geo predicates for 1) Latitude/Longitude Format
    spatial:hasSpatialPredicatePairs (
         [ spatial:latitude :latitude_1 ; spatial:longitude :longitude_1 ]
         [ spatial:latitude :latitude_2 ; spatial:longitude :longitude_2 ]
         ) ;
    # custom geo predicates for 2) Well Known Text (WKT) Literal
    spatial:hasWKTPredicates (:wkt_1 :wkt_2 geosparql:asWKT) ;
    # custom SpatialContextFactory for 2) Well Known Text (WKT) Literal
    spatial:spatialContextFactory
         "com.spatial4j.core.context.jts.JtsSpatialContextFactory"
    .



Re: Fuseki: ordering in Construct

Posted by Chris Dollin <ch...@epimorphics.com>.
On Thursday, February 27, 2014 01:29:50 PM mark wrote

> Is there a way to control the ordering of output via CONSTRUCT so that
> it is handled in the same way as for SELECT?

CONSTRUCT builds a graph. The triples in a graph are not ordered.
When you render the graph to (eg) Turtle, the renderer can emit
the triples in any order it likes; it usually gets whatever's convenient
for the triple store the graph was built in.

Chris

-- 
"And what you've gained is hard to quantify"    - Mermaid Kiss, /Crayola Skies/

Epimorphics Ltd, http://www.epimorphics.com
Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
Epimorphics Ltd. is a limited company registered in England (number 7016688)


Re: Re: Fuseki: ordering in Construct

Posted by Chris Dollin <ch...@epimorphics.com>.
On Thursday, February 27, 2014 01:47:34 PM mark wrote:

> are there tools to help me with this within fuseki and jena, or am I
> going to have to write my own sorting implementation on the output
> stream?

If you're on a Unixy OS, sort has already been written and may do
what you need.

If you can write Java code, Collections.sort has already been written for
you, so long as you can write an RDFNode Comparator (and have
enough memory).
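
As a rough illustration of the two suggestions above, here is a minimal sketch that sorts N-Triples output in plain Java, much as piping it through sort would. It assumes the CONSTRUCT result has been written as N-Triples, one triple per line; the class and method names are hypothetical and not part of Jena or Fuseki. It gives a stable, repeatable order, not necessarily the order an ORDER BY would have produced.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of Jena or Fuseki.
class NTriplesLineSorter {

    // Returns a sorted copy of the given N-Triples lines, skipping blank lines.
    // Comparison is plain String order on the whole line, so lines that share
    // a subject end up next to each other.
    static List<String> sortLines(List<String> lines) {
        List<String> sorted = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                sorted.add(line);
            }
        }
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // Example data only; the URIs are made up.
        List<String> lines = new ArrayList<>();
        lines.add("<http://example.org/b> <http://purl.org/dc/terms/title> \"Beta\" .");
        lines.add("<http://example.org/a> <http://purl.org/dc/terms/title> \"Alpha\" .");
        for (String line : sortLines(lines)) {
            System.out.println(line);
        }
    }
}
```

Working on Jena Statement objects instead would follow the same pattern, with a Comparator in place of the default String ordering.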

Chris

-- 
"What I don't understand is this ..."   Trevor Chaplin, /The Beiderbeck Affair/



Re: Fuseki: ordering in Construct

Posted by Martynas Jusevičius <ma...@graphity.org>.
Well my usual use case is pagination, with the results sorted per page
by some variable. So my query is usually like this:

DESCRIBE ?thing
{
  SELECT ?thing
  {
    ?thing dct:title ?title
  }
  ORDER BY ?title
  LIMIT 20
  OFFSET 20
}

In this case we retrieve the second page (assuming 20 per page) of
resources ordered by title. But since the ordering is lost in the Model, I
sort again when transforming RDF/XML to XHTML, something like this:

  <xsl:template match="rdf:RDF">
    <div>
      <xsl:apply-templates> <!-- this will match rdf:Descriptions -->
        <xsl:sort select="dct:title"/>
      </xsl:apply-templates>
    </div>
  </xsl:template>

Notice <xsl:sort> with dct:title.
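
For the pagination query itself, the LIMIT/OFFSET arithmetic can be sketched in Java as below. This is only an illustration under assumptions: the class and method names are hypothetical, pages are numbered from 1, and the dct: prefix is assumed to be the Dublin Core terms namespace, which the query above does not declare.

```java
// Hypothetical helper for illustration; not part of Jena or Graphity.
class PagedDescribeQuery {

    // OFFSET for a 1-based page number: page 1 -> 0, page 2 -> pageSize, ...
    static int offsetFor(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return (page - 1) * pageSize;
    }

    // Builds the DESCRIBE query with an ordered, paged sub-select.
    static String build(int page, int pageSize) {
        return "PREFIX dct: <http://purl.org/dc/terms/>\n" // assumed prefix
             + "DESCRIBE ?thing\n"
             + "{\n"
             + "  SELECT ?thing\n"
             + "  {\n"
             + "    ?thing dct:title ?title\n"
             + "  }\n"
             + "  ORDER BY ?title\n"
             + "  LIMIT " + pageSize + "\n"
             + "  OFFSET " + offsetFor(page, pageSize) + "\n"
             + "}\n";
    }

    public static void main(String[] args) {
        // Second page, 20 resources per page, as in the query above.
        System.out.println(build(2, 20));
    }
}
```

The ORDER BY inside the sub-select decides which resources land on a page; the order within the returned Model still has to be restored afterwards, which is what the xsl:sort does.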

If this seems useful to you, check our Graphity Client on
http://graphity.org which has quite a few predefined XSLT templates
like this.


Martynas
graphityhq.com


Re: Fuseki: ordering in Construct

Posted by mark <ma...@metarelate.net>.
Hello Martynas

many thanks for the swift response.

When you say:
> You can do post-processing to sort them the way you want

are there tools to help me with this within fuseki and jena, or am I
going to have to write my own sorting implementation on the output
stream?

thank you
mark



Re: Fuseki: ordering in Construct

Posted by Martynas Jusevičius <ma...@graphity.org>.
Mark,

if you are using DESCRIBE or CONSTRUCT queries then the result you get
is a Model. And a model is a set of statements which is not ordered.
You can do post-processing to sort them the way you want, e.g. when
you are presenting data to the user.

Martynas


Fuseki: ordering in Construct

Posted by mark <ma...@metarelate.net>.
Hello

SPARQL and fuseki support results ordering using the 'order by'
statements.

Fuseki implements the ordering of results for SELECT queries
in exactly the way I expect.

However, if I put the same WHERE clause into a CONSTRUCT query
(output="text") the results I get back are not ordered in a way I expect
at all.

The results appear to be consistently ordered for identical tdb content
but any change to the tdb appears to lead to changes in ordering of the
.ttl output.

I do not see how this may be controlled or managed.  I tripped over
this, it came as something of a surprise to me.

Is there a way to control the ordering of output via CONSTRUCT so that
it is handled in the same way as for SELECT?

This would be a very useful feature for me.

many thanks

mark

Re: Creating Spatial Lucene index from existing TDB data store

Posted by Andy Seaborne <an...@apache.org>.
On 25/02/14 19:41, Heidi McClure wrote:
> I have an existing TDB data store that I would like to create a spatial index for.  Are there utilities or APIs to do this?

jena.spatialindexer creates an index - but also when you load data into 
a spatial dataset, it gets indexed automatically when configured ...


> I have successfully followed the examples for reading in from .ttl files and creating the TDB and spatial index data at the same time.
>
> My TDB has GeoSPARQL nodes in it like:
>
> <http://issinc.com/events#event_901112240019> <http://www.opengis.net/ont/geosparql#asWKT> "POINT(1.7488388 40.05863)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>

You'll need to configure in the JTS library

http://jena.apache.org/documentation/query/spatial-query.html#supported-geo-data-for-indexing-and-querying

(and I'm going on the documentation here...)

You will need to configure the EntityDefinition to correspond to the data.

	Andy
>
> thanks,
> -heidi
>
>
>