Posted to dev@jena.apache.org by Greg Albiston <an...@apache.org> on 2018/06/19 09:59:49 UTC
CMS diff: TDB Datasets
Clone URL (Committers only):
https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/documentation%2Ftdb%2Fdatasets.mdtext
Greg Albiston
Index: trunk/content/documentation/tdb/datasets.mdtext
===================================================================
--- trunk/content/documentation/tdb/datasets.mdtext (revision 1833775)
+++ trunk/content/documentation/tdb/datasets.mdtext (working copy)
@@ -51,6 +51,51 @@
...
}
+### Named Graphs & Filters
+
+Named graphs provide a convenient way to organise and store your data.
+However, be aware that in certain situations named graphs can make a query harder for the query optimiser to optimise.
+
+For example, a query with the following structure took 29 minutes to complete:
+
+ SELECT ?b ...
+ WHERE {
+
+ GRAPH dataset:BigA {
+ ?a rdf:type my:AThing.
+ ?a noa:hasGeometry ?aData.
+ ...
+ }
+
+ GRAPH dataset:SmallB {
+ ?b rdf:type my:BThing.
+ ?b my:hasData ?bData.
+ ...
+ }
+
+ FILTER(my:filterFunction1(?bData, ?aData))
+ FILTER(my:filterFunction2(?bData, "1.0 3.0, 4.0 2.0"^^my:dataLiteral) )
+
+ }
+
+The query time dropped to 7 seconds after applying the global TDB.symUnionDefaultGraph option (see above) to the dataset and removing the GRAPH clauses from the query as follows:
+
+ SELECT ?b ...
+ WHERE {
+
+ ?a rdf:type my:AThing.
+ ?a noa:hasGeometry ?aData.
+ ...
+
+ ?b rdf:type my:BThing.
+ ?b my:hasData ?bData.
+ ...
+
+ FILTER(my:filterFunction1(?bData, ?aData))
+ FILTER(my:filterFunction2(?bData, "1.0 3.0, 4.0 2.0"^^my:dataLiteral) )
+
+ }
+
## Special Graph Names
URI | Meaning
RE: CMS diff: TDB Datasets
Posted by Greg Albiston <gr...@hotmail.com>.
Hi Andy,
Thanks for the response. Your suggestion worked and the query completed in a similar time to the union graph approach.
I'd tried moving the filter into the graph clause but not swapping the graph order.
I added that update to the documentation in case it helps anyone else having similar problems.
Do you still want me to create a JIRA for it?
More generally, is there a page/section for tips on query writing to help optimisation?
I searched but could only find descriptions of TDB's optimisation functionality and of extending query execution. I spent quite a while hunting for tips and trying different ways to influence the resolution order before I thought to try the union graph.
Thanks,
Greg
-----Original Message-----
From: Andy Seaborne <an...@apache.org>
Sent: 19 June 2018 13:56
To: dev@jena.apache.org; Greg Albiston <gr...@hotmail.com>
Subject: Re: CMS diff: TDB Datasets
Greg,
Could you create a JIRA ticket for this please? It is something that looks addressable. The solution proposed (using union graph) is a bit specialised.
Andy
The query may be better if written (but the "..." may be making a
difference.)

    GRAPH dataset:SmallB {
        ?b rdf:type my:BThing.
        ?b my:hasData ?bData.
        FILTER(my:filterFunction2(?bData, "1.0 3.0, 4.0 2.0"^^my:dataLiteral))
    }

    GRAPH dataset:BigA {
        ?a rdf:type my:AThing.
        ?a noa:hasGeometry ?aData.
    }

    FILTER(my:filterFunction1(?bData, ?aData))
Re: CMS diff: TDB Datasets
Posted by Andy Seaborne <an...@apache.org>.
Greg,
Could you create a JIRA ticket for this please? It is something that
looks addressable. The solution proposed (using union graph) is a bit
specialised.
Andy
The query may be better if written (but the "..." may be making a
difference.)

    GRAPH dataset:SmallB {
        ?b rdf:type my:BThing.
        ?b my:hasData ?bData.
        FILTER(my:filterFunction2(?bData, "1.0 3.0, 4.0 2.0"^^my:dataLiteral))
    }

    GRAPH dataset:BigA {
        ?a rdf:type my:AThing.
        ?a noa:hasGeometry ?aData.
    }

    FILTER(my:filterFunction1(?bData, ?aData))
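
To illustrate why the order of the two GRAPH blocks and the position of the single-variable filter can matter, here is a rough toy model in plain Java. It is only a sketch under stated assumptions: two in-memory lists stand in for the BigA and SmallB graphs, and constantFilter and pairFilter are hypothetical stand-ins for my:filterFunction2 and my:filterFunction1. None of it is Jena API, and it does not claim to describe how the TDB optimiser actually plans either query. It only counts filter calls for the two shapes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Toy model only: plain lists stand in for the two named graphs.
// Every name here is hypothetical; nothing is Jena API.
final class FilterOrderSketch {

    // Number of calls made to each stand-in filter function.
    static long constantFilterCalls = 0;
    static long pairFilterCalls = 0;

    // Stand-in for my:filterFunction2(?bData, <constant>): depends on ?bData only.
    static boolean constantFilter(int bData) {
        constantFilterCalls++;
        return bData % 5 == 0;
    }

    // Stand-in for my:filterFunction1(?bData, ?aData): needs both sides of the join.
    static boolean pairFilter(int bData, int aData) {
        pairFilterCalls++;
        return aData % 10 == bData;
    }

    // Original shape: join BigA with SmallB, then test the filters on every joined row.
    static long filtersAfterJoin(List<Integer> bigA, List<Integer> smallB) {
        long matches = 0;
        for (int a : bigA) {
            for (int b : smallB) {
                if (constantFilter(b) && pairFilter(b, a)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    // Reordered shape: restrict SmallB with the constant filter first, then join with BigA.
    static long filterSmallSideFirst(List<Integer> bigA, List<Integer> smallB) {
        List<Integer> keptB = new ArrayList<>();
        for (int b : smallB) {
            if (constantFilter(b)) {
                keptB.add(b);
            }
        }
        long matches = 0;
        for (int b : keptB) {
            for (int a : bigA) {
                if (pairFilter(b, a)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    static void resetCounters() {
        constantFilterCalls = 0;
        pairFilterCalls = 0;
    }

    // Builds the list 0, 1, ..., n-1 as made-up data.
    static List<Integer> range(int n) {
        return IntStream.range(0, n).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> bigA = range(1000);
        List<Integer> smallB = range(10);

        resetCounters();
        long first = filtersAfterJoin(bigA, smallB);
        System.out.println("filters after join:  matches=" + first
                + " constantFilter calls=" + constantFilterCalls
                + " pairFilter calls=" + pairFilterCalls);

        resetCounters();
        long second = filterSmallSideFirst(bigA, smallB);
        System.out.println("small side filtered: matches=" + second
                + " constantFilter calls=" + constantFilterCalls
                + " pairFilter calls=" + pairFilterCalls);
    }
}
```

With this made-up data both methods find the same matches. The first calls constantFilter once per joined row, while the second calls it once per SmallB element and joins only the surviving rows against BigA. Whether that is what happened in the real query depends on the parts elided by "...".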