Posted to users@jena.apache.org by "Dr. André Lanka" <ma...@dr-lanka.de> on 2013/09/09 20:38:05 UTC

Shared composition of TDB datasets

Hello Jena community,

I'm looking for a way to run a single SPARQL query over two different
TDB-backed datasets. The background is that I have a few TDB
stores (say A and B) and one special TDB store S containing a few
triples that I want to "share" with the other stores.
For instance, I want to do something like this:

SELECT ?o WHERE { GRAPH <S> { ?s ?p ?o. } GRAPH <A> { ?s1 ?p1 ?o.} }

In addition, S needs to be writeable, and changes to it need to be promptly
visible to other queries, even if I change A to B in the above example.

I know about named graphs, but unfortunately they are not an option in our
scenario. Could I do it by using assemblers? Or is there any other
possibility?

We use the current versions of Jena/ARQ/TDB in pure Java.

Thanks in advance
André

-- 
Dr. André Lanka  *  0178 / 134 44 47  *  http://dr-lanka.de

Re: Shared composition of TDB datasets

Posted by Andy Seaborne <an...@apache.org>.

On 10/09/13 17:57, "Dr. André Lanka" wrote:
> Hi Andy,
>
> thank you very much. I saw the example earlier but missed that it
> perfectly matches my needs. *facepalm*
>
> In general the approach works really well. Yet TDBGraphAssembler seems
> to ignore the transaction framework. All is OK when I use the assembler
> at the beginning of the code. But when the original datasets have already
> been used transactionally, I get a TDBTransactionException (see below for
> more details).
>
> The call correctly reuses the underlying, already created dataset
> graph (which is really great for us), but there seems to be no way
> to force a transaction start.
>
> Is this intended for some reason?

When you select individual graphs, there is no transaction control.
Transactions are per-dataset (the journal is per database).
Cross-database transactions would need either distributed transactions
(e.g. XA transactions) or multi-dataset logging.

Having all graphs in one DB is more usual.

You should be able to operate directly and transactionally on the dataset
the graph is in, with care, if you ensure the system is MRSW (Multiple
Reader OR Single Writer).

	Andy
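
The MRSW rule mentioned above can be enforced at the application level.
What follows is a minimal sketch, not Jena API: MrswStore is a hypothetical
stand-in for a shared store such as S, and the JDK's ReentrantReadWriteLock
is used as one possible way to allow many concurrent readers or a single
writer. In a real system the guarded sections would wrap the actual dataset
reads and writes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a store shared across datasets; not a Jena class. */
public class MrswStore {
    // One lock per shared store: many readers OR one writer at a time.
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> triples = new ArrayList<>();

    /** Writer path: exclusive access while the store is modified. */
    public void add(String triple) {
        lock.writeLock().lock();
        try {
            triples.add(triple);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Reader path: may overlap with other readers, never with a writer. */
    public int size() {
        lock.readLock().lock();
        try {
            return triples.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MrswStore store = new MrswStore();
        // One writer thread and one reader thread share the same store.
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) store.add("triple-" + i);
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 1000; i++) store.size();
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
        System.out.println(store.size()); // prints 1000
    }
}
```

The lock here is per store, which mirrors the point that transactions are
per-dataset: guarding S does not by itself coordinate access to A or B.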



Re: Shared composition of TDB datasets

Posted by "Dr. André Lanka" <ma...@dr-lanka.de>.

Hi Andy,

thank you very much. I saw the example earlier but missed that it
perfectly matches my needs. *facepalm*

In general the approach works really well. Yet TDBGraphAssembler seems
to ignore the transaction framework. All is OK when I use the assembler
at the beginning of the code. But when the original datasets have already
been used transactionally, I get a TDBTransactionException (see below for
more details).

The call correctly reuses the underlying, already created dataset
graph (which is really great for us), but there seems to be no way
to force a transaction start.

Is this intended for some reason?

André

Code that fails:

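// Open the first store and run one (empty) read transaction on it.
Dataset ds1 = TDBFactory.createDataset(new Location("/var/tmp/namedTest1"));
ds1.begin(ReadWrite.READ);
ds1.end();

// Building the assembled dataset afterwards throws TDBTransactionException.
Dataset build = (Dataset) AssemblerUtils.build("/var/tmp/tdbNew.ttl",
        "http://jena.hpl.hp.com/2005/11/Assembler#RDFDataset");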

StackTrace:

...
Caused by: com.hp.hpl.jena.tdb.transaction.TDBTransactionException: Not in a transaction
	at com.hp.hpl.jena.tdb.transaction.DatasetGraphTransaction.get(DatasetGraphTransaction.java:104)
	at com.hp.hpl.jena.tdb.transaction.DatasetGraphTransaction.get(DatasetGraphTransaction.java:1)
	at com.hp.hpl.jena.sparql.core.DatasetGraphTrackActive.getDefaultGraph(DatasetGraphTrackActive.java:99)
	at com.hp.hpl.jena.sparql.core.DatasetImpl.getDefaultModel(DatasetImpl.java:103)
	at com.hp.hpl.jena.tdb.assembler.TDBGraphAssembler.open(TDBGraphAssembler.java:95)
	at com.hp.hpl.jena.tdb.assembler.TDBGraphAssembler.open(TDBGraphAssembler.java:1)
	at com.hp.hpl.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.openBySpecificType(AssemblerGroup.java:130)
	... 39 more

Turtle Definition:

@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .

[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

# A dataset with one TDB-backed graph as the default graph and
# a second TDB-backed graph as a named graph.
<#dataset> rdf:type      ja:RDFDataset ;
        ja:defaultGraph <#graph> ;
        ja:namedGraph
        [ ja:graphName      <http://example.org/name1> ;
          ja:graph          <#graph2> ] ;
.

<#graph>        rdf:type tdb:GraphTDB ;
                tdb:location "/var/tmp/namedTest1" ;
.
<#graph2>       rdf:type tdb:GraphTDB ;
                tdb:location "/var/tmp/namedTest2" ;
.



On 10.09.2013 12:30, Andy Seaborne wrote:
> 
> You can create a single dataset with graphs from different storages, then
> your queries will look exactly as in your example.
> 
> This can be done in an assembler or in code.
> 
> Example:
> 
> http://jena.apache.org/documentation/tdb/assembler.html#mixed-datasets
> 
>     Andy
> 

-- 
Dr. André Lanka  *  0178 / 134 44 47  *  http://dr-lanka.de

Re: Shared composition of TDB datasets

Posted by Andy Seaborne <an...@apache.org>.

You can create a single dataset with graphs from different storages, then
your queries will look exactly as in your example.

This can be done in an assembler or in code.

Example:

http://jena.apache.org/documentation/tdb/assembler.html#mixed-datasets

	Andy

Re: Shared composition of TDB datasets

Posted by "Dr. André Lanka" <ma...@dr-lanka.de>.

Hello Arthur,

my example is a very small instance. We host thousands of TDB stores
like A, B and S on each machine. We access them directly and usually
operate on models created by dataSet.getDefaultModel(). Making server
calls (even on localhost) is not an option for performance reasons. So
your solution (calling a web server each time) unfortunately won't
work :(

Is there any other way we could bundle some TDB stores together to
run SPARQL queries against them?

Any idea is appreciated :)
André

On 10.09.2013 08:55, Arthur Vaïsse-Lesteven wrote:
> I assume that each of your TDB stores has its own local endpoint. If so, you can use SERVICE clauses by writing something like the query below. The triples stored in S will be shared by all your TDB stores, and updates will be immediately visible to all of them.
> 
> If not, I cannot help you.
> 
> 
> VAISSE-LESTEVEN Arthur.
> 
> 
>> SELECT ?o
>> WHERE {
>>      ?s1 ?p1 ?o
>>      SERVICE <http://localhost:3030/dataset_TDB_S/query> { ?s ?p ?o }
>> }

-- 
Dr. André Lanka  *  0178 / 134 44 47  *  http://dr-lanka.de

Re: Shared composition of TDB datasets

Posted by Arthur Vaïsse-Lesteven <ar...@yahoo.fr>.

I assume that each of your TDB stores has its own local endpoint. If so, you can use SERVICE clauses by writing something like the query below. The triples stored in S will be shared by all your TDB stores, and updates will be immediately visible to all of them.

If not, I cannot help you.

VAISSE-LESTEVEN Arthur.

> SELECT ?o
> WHERE {
>     ?s1 ?p1 ?o
>     SERVICE <http://localhost:3030/dataset_TDB_S/query> { ?s ?p ?o }
> }

________________________________
From: Dr. André Lanka <ma...@dr-lanka.de>
To: users@jena.apache.org
Sent: Monday, 9 September 2013, 22:06
Subject: Re: Shared composition of TDB datasets

Hello Arthur,

thanks for your hint. What would the endpoint address look like? I
assume it's not the file location on disk, or is it? If it's a remote
address, it won't work for us. Setting up a separate server to access S
is impossible for us. We need a "local" share, perhaps something like a
dataset. We don't need to share S with others, but we do need to share
the triples stored in S internally with other TDB stores. So a
solution that works within our JVM would be highly appreciated.

André


Re: Shared composition of TDB datasets

Posted by "Dr. André Lanka" <ma...@dr-lanka.de>.

Hello Arthur,

thanks for your hint. What would the endpoint address look like? I
assume it's not the file location on disk, or is it? If it's a remote
address, it won't work for us. Setting up a separate server to access S
is impossible for us. We need a "local" share, perhaps something like a
dataset. We don't need to share S with others, but we do need to share
the triples stored in S internally with other TDB stores. So a
solution that works within our JVM would be highly appreciated.

André

On 09.09.2013 21:46, Arthur Vaïsse-Lesteven wrote:
> You may use a federated query to query and modify the triples in the shared TDB. Your query should look like this:
> 
> (Assuming the request is executed against TDB A)
> SELECT ?o
> WHERE {
>     ?s1 ?p1 ?o
>     SERVICE <endpoint address of TDB S> { ?s ?p ?o }
> }
> 
> 
> I hope this will help you.
> 
> Arthur

-- 
Dr. André Lanka  *  0178 / 134 44 47  *  http://dr-lanka.de

Re: Shared composition of TDB datasets

Posted by Arthur Vaïsse-Lesteven <ar...@yahoo.fr>.

You may use a federated query to query and modify the triples in the shared TDB. Your query should look like this:

(Assuming the request is executed against TDB A)
SELECT ?o
WHERE {
    ?s1 ?p1 ?o
    SERVICE <endpoint address of TDB S> { ?s ?p ?o }
}


I hope this will help you.

Arthur

