
Posted to dev@marmotta.apache.org by Maxime Poitevineau-Millin <ma...@dipia.fr> on 2014/04/21 17:29:43 UTC

Issue With a SPARQL Query

Hi all,



We are currently integrating Marmotta into our system, but we have run into a problem.



When trying this query:

SELECT * FROM <sesame:nil> {
    ?kw rdf:type Keyword .
    ?kw2 rdf:type Keyword .
    ?kw2 ?rel ?kw .
}



on a database containing 60 000 keywords, Marmotta issues 60 000 *
60 000 SQL queries similar to this one:

SELECT id, subject, predicate, object, context, deleted, inferred,
       creator, createdAt, deletedAt
FROM triples
WHERE deleted = false
  AND subject = '458172920399523841'
  AND object = '458172732532453376'
  AND context = '458172711103754240'

which obviously takes a long time.



This query takes 2 seconds on Virtuoso or OWLIM. Is there a configuration
issue, or something else we should change?



Regards,

___

Maxime Poitevineau-Millin
M: 06 33 17 64 28 - T: 03 80 40 33 46 - F: 04 84 25 03 63

Re: Issue With a SPARQL Query

Posted by Sebastian Schaffert <se...@gmail.com>.

Hi Maxime,

As mentioned in Jira: by default, the KiWi triple store uses the Sesame
in-memory implementation of SPARQL. In your case, it will list all
triples three times and evaluate the join in memory (i.e. 60,000^3), which
is obviously quite inefficient. If you want SPARQL queries to be translated
directly to SQL, you have to use the kiwi-sparql module in addition to
kiwi-triplestore and wrap the KiWiStore in a KiWiSparqlSail as follows:

KiWiSparqlSail sail = new KiWiSparqlSail(store);

This will add a native SPARQL implementation to KiWi, so that many queries
will be translated directly to SQL. As described in [1], this translation
is not a complete SPARQL-SQL translation. Only those parts will be
translated that are efficient to evaluate in a relational database and the
current KiWi data model. What you can expect is that any triple pattern
(without OPTIONAL) and most FILTER conditions are directly translated into
SQL. Please note, however, that so far native SPARQL support followed
"correctness over performance", so many advanced constructs are not
optimized at the moment:

- projection parts are never optimized; after the WHERE part has been
evaluated, the internal result is a collection of node database IDs, which
are then resolved in a separate query step (simple primary key based
queries)
- as a consequence, aggregation constructs are not optimized, so a
count(...) could take much longer than expected
- OPTIONAL is not optimized, because its semantics differ slightly from a
normal SQL LEFT JOIN (we are still working on that); however, all patterns
outside the OPTIONAL will still be optimized, so if you have good filter
criteria outside it, this should not be a problem
- SPARQL 1.1 path queries are not optimized, because their expressiveness
exceeds the expressiveness of SQL
- DISTINCT, ORDER BY, GROUP BY are not optimized, because they work on the
projection results; DISTINCT in particular should be avoided if not
necessary (but this also holds for other systems)

In case of your query, the whole WHERE part will be translated into a
single database JOIN, so it should be as efficient as it can get. The
variable projection part will then, however, result in additional queries
to the database, so the more results you expect the longer it will take. A
very good way to dramatically improve performance is to add a LIMIT to the
SPARQL query.
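
As a rough illustration of why a single join with a LIMIT behaves so
differently from per-pair lookups, here is a toy Java model. It is only a
sketch under stated assumptions: the class JoinCostSketch, the Triple type
and both methods are hypothetical names made up for this example, the set
of keyword IDs is given directly instead of being derived from rdf:type
triples, and none of it is Marmotta or Sesame code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy model of the two evaluation strategies; this is not Marmotta code. */
class JoinCostSketch {

    /** One row of a simplified triples table, holding node IDs only. */
    static final class Triple {
        final long subject, predicate, object;

        Triple(long subject, long predicate, long object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Number of simulated per-pair queries issued by nestedLookups. */
    long lookups = 0;

    /** Naive strategy: one simulated query per (kw2, kw) pair. */
    List<Triple> nestedLookups(Set<Long> keywords, List<Triple> triples) {
        List<Triple> result = new ArrayList<>();
        for (long kw2 : keywords) {
            for (long kw : keywords) {
                lookups++; // stands in for one "subject = ? AND object = ?" query
                for (Triple t : triples) {
                    if (t.subject == kw2 && t.object == kw) {
                        result.add(t);
                    }
                }
            }
        }
        return result;
    }

    /** Join strategy: a single pass over the table that stops after `limit` rows. */
    static List<Triple> singleJoin(Set<Long> keywords, List<Triple> triples, int limit) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples) {
            if (result.size() >= limit) {
                break; // fewer rows also means fewer node IDs to resolve afterwards
            }
            if (keywords.contains(t.subject) && keywords.contains(t.object)) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Long> keywords = Set.of(1L, 2L, 3L);
        List<Triple> triples = List.of(
                new Triple(1, 10, 2), new Triple(2, 10, 3),
                new Triple(3, 11, 1), new Triple(1, 10, 99));

        JoinCostSketch sketch = new JoinCostSketch();
        int naiveRows = sketch.nestedLookups(keywords, triples).size();
        int joinRows = singleJoin(keywords, triples, Integer.MAX_VALUE).size();
        int limitedRows = singleJoin(keywords, triples, 2).size();

        System.out.println("naive rows: " + naiveRows + ", lookups: " + sketch.lookups);
        System.out.println("join rows: " + joinRows + ", with limit 2: " + limitedRows);
    }
}
```

In this toy model the naive strategy issues one lookup per keyword pair, so
the count grows with the square of the number of keywords, while the join
makes one pass and a limit caps how many result rows need their node IDs
resolved afterwards.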

That said, we are constantly working on improving SPARQL support. In
particular, one of the next things to come is probably support for
OPTIONAL, and then for ORDER BY and GROUP BY.

Greetings,

Sebastian

[1] http://marmotta.apache.org/kiwi/sparql.html

