Posted to dev@jena.apache.org by Andy Seaborne <an...@epimorphics.com> on 2013/04/15 15:51:34 UTC

GSoC: Cache tables for SPARQL queries

== Property Tables

Property tables are a technique for speeding up queries by providing 
additional ways of accessing the data other than the triple table.

They can be used for:

+ data that is reasonably regular
+ caching partial query evaluations ahead of time
+ efficient inference for subclass/subproperty relationships.

A property table is a table where each column denotes a variable
in part of a SPARQL pattern.  Typically there is a column for the subject 
and one or more columns for properties of that subject, but there are 
other possibilities.

A row in a property table matches a SPARQL basic graph pattern.

Example:

Suppose a dataset includes information about people, and that each 
person always has a first name, a last name and a formal form of address.

A property table might be:

subject URI                 first    last       Formal
                              name     name       name

(http://example/person#afs,  "Fred", "Smith",  "Frederick Smith")

and matches the SPARQL pattern

{ ?person foaf:familyName ?fName ;
           foaf:givenName  ?gName ;
           ex:formalName   ?formal
}

but it can also be used to efficiently answer both partial occurrences 
of that pattern and ones where some terms are fixed:

   [] foaf:familyName ?fName ;
      foaf:givenName  ?gName ;
      ex:formalName   "Frederick Smith" .

This is a simple example with only 3 properties.  In the real world, one 
resource may have tens of properties, so reducing the number of 
database accesses may be significant and improve caching.
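
To make the example concrete, here is a minimal in-memory sketch in Python 
(not Jena or TDB code).  It assumes each property is single-valued and that 
only subjects with every column present get a row; the function names, the 
column order and the second subject (person#xyz) are hypothetical, invented 
for illustration.

```python
from collections import defaultdict

# Hypothetical column layout: one column per property of interest.
COLUMNS = ("foaf:givenName", "foaf:familyName", "ex:formalName")

def build_property_table(triples, columns=COLUMNS):
    """Group triples by subject; keep only subjects that have every column."""
    by_subject = defaultdict(dict)
    for s, p, o in triples:
        if p in columns:
            # Assumption: single-valued properties (a later value overwrites).
            by_subject[s][p] = o
    return {
        s: tuple(props[c] for c in columns)
        for s, props in by_subject.items()
        if all(c in props for c in columns)
    }

def match(table, fixed=None, columns=COLUMNS):
    """Yield (subject, row) for rows that agree with the fixed terms.

    `fixed` maps property -> required value; properties left out of
    `fixed` behave like variables.
    """
    fixed = fixed or {}
    index = {c: i for i, c in enumerate(columns)}
    for s, row in table.items():
        if all(row[index[p]] == v for p, v in fixed.items()):
            yield s, row

triples = [
    ("http://example/person#afs", "foaf:givenName", "Fred"),
    ("http://example/person#afs", "foaf:familyName", "Smith"),
    ("http://example/person#afs", "ex:formalName", "Frederick Smith"),
    # Incomplete subject: stays in the triple table only.
    ("http://example/person#xyz", "foaf:givenName", "Jane"),
]
table = build_property_table(triples)
hits = list(match(table, {"ex:formalName": "Frederick Smith"}))
```

One row lookup replaces three triple-table accesses here.  Subjects that 
lack a column are not in the table, so a real implementation would still 
need the triple table for them.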

The basic pattern matched doesn't have to be "same subject" - it might 
be a complex query, such as:

     SELECT (count(*) AS ?c) { ?s ?p ?o } GROUP BY ?s

with a table of (?s, ?c)
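
A sketch of what precomputing that (?s, ?c) table could look like, again in 
plain Python over an in-memory list of triples rather than TDB.  The name 
build_count_table and the ex: sample data are hypothetical.

```python
from collections import Counter

def build_count_table(triples):
    """Precompute the (?s, ?c) table: number of distinct triples per subject."""
    # An RDF graph is a set of triples, so duplicates are dropped first.
    return Counter(s for s, _p, _o in set(triples))

triples = [
    ("ex:a", "ex:p", "1"),
    ("ex:a", "ex:q", "2"),
    ("ex:b", "ex:p", "3"),
    ("ex:a", "ex:p", "1"),  # duplicate, counted once
]
count_table = build_count_table(triples)
```

Answering the query then becomes a scan of the small table instead of a 
scan of every triple.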

"Property table" is just the conventional name for this approach because the 
first systems to use it handled just basic graph patterns for RDQL.

A query compiler could spot parts of a query pattern to access a 
precomputed additional table instead of accessing the conventional 
triple table many times.
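
One possible shape for that spotting step, as a rough Python sketch: group 
the triple patterns of a basic graph pattern by subject, and route a group 
to the property table when enough of its predicates are table columns.  The 
function split_bgp, the min_hits threshold and the tuple representation of 
patterns are all assumptions made for illustration; this is not how ARQ or 
TDB represent queries.  It also assumes at most one pattern per predicate 
for a given subject.

```python
from collections import defaultdict

# Hypothetical: the predicates covered by an existing property table.
TABLE_COLUMNS = frozenset({"foaf:givenName", "foaf:familyName", "ex:formalName"})

def split_bgp(patterns, columns=TABLE_COLUMNS, min_hits=2):
    """Partition a basic graph pattern into table accesses and leftovers.

    Returns (table_accesses, remaining), where table_accesses maps
    subject -> {predicate: object} and remaining is the list of triple
    patterns still to be answered from the triple table.
    """
    by_subject = defaultdict(list)
    for pat in patterns:
        by_subject[pat[0]].append(pat)

    table_accesses, remaining = {}, []
    for subj, pats in by_subject.items():
        covered = [t for t in pats if t[1] in columns]
        if len(covered) >= min_hits:
            # Enough overlap: one table access replaces several triple lookups.
            table_accesses[subj] = {p: o for _s, p, o in covered}
            remaining.extend(t for t in pats if t[1] not in columns)
        else:
            remaining.extend(pats)
    return table_accesses, remaining

bgp = [
    ("?person", "foaf:familyName", "?fName"),
    ("?person", "foaf:givenName", "?gName"),
    ("?person", "foaf:mbox", "?mbox"),
    ("?other", "foaf:givenName", "?n"),
]
accesses, rest = split_bgp(bgp)
```

Here the two name patterns on ?person go to the table, while the foaf:mbox 
pattern and the single pattern on ?other stay with the triple table.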

This project would apply this idea to Jena TDB.

It involves:
1/ spotting query patterns
2/ building the property table
3/ maintaining the table as data changes

A focus on primarily read-only data for publication means that (3) can 
be a process that runs in the background at regular intervals.