Posted to users@jena.apache.org by Yarc Yarc <mo...@gmail.com> on 2015/05/05 21:58:22 UTC

possibly a problem with fuseki caching

Hello,

I'm trying to use Fuseki for querying live data managed by a web
application. This generally works, but I seem to have a caching problem
with my Fuseki instance. This is the setup:

a. I'm running fuseki.war on GlassFish 4, configured with a dataset
fetched from a TDB file-system folder for persistence. That seems to work
well: data is read from the TDB folder and I can query it over port 3030.

b. I have another .war application running in the same GlassFish instance.
This .war uses the same TDB folder for persistence; the application both
queries and updates it. The data is persistent in that if I undeploy and
redeploy the application, the data is still there.

Now, if the application in b. inserts new data, this is not reflected
in the Fuseki application (a). If, however, I undeploy and redeploy
fuseki.war, the new data from b. is there.

My initial hypothesis is that Fuseki uses a cache at some level, so that if
the underlying TDB file store changes, the change is not picked up. Is this
a reasonable hypothesis?

If so, is there some way I might circumvent this behaviour - i.e. can I
enforce a TDB sync before each query in Fuseki somehow?
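(By "enforce a TDB sync" I mean something along the lines of the sketch
below. This is only an illustration of the idea; it assumes Jena 2.x package
names (com.hp.hpl.jena.*) and the TDB.sync entry point.)

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.tdb.TDB;

public class SyncBeforeQuery {
    // Sketch: flush any pending TDB state to disk for the given dataset
    // before running a query against it.
    static void syncDataset(Dataset dataset) {
        TDB.sync(dataset);
    }
}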

If not, then I'm probably confused and I'm asking for a friendly hint on
where I go wrong.

I have read the caching and synchronization section in the documentation (
https://jena.apache.org/documentation/tdb/java_api.html), generally googled
this issue (e.g. "fuseki cache" or "share TDB store"), and I'm aware of the
single-JVM constraints on TDB (
https://jena.apache.org/documentation/tdb/tdb_transactions.html) as well as
a suggested pattern of letting all other clients of a TDB store go through
Fuseki, if I understand it correctly (
http://answers.semanticweb.com/questions/28629/sharing-jena-tdb-data-store).


Any and all help is appreciated.

:-)

M.

------ a few excerpts on how b. relates to TDB, commits and the dataset
-----

I create my dataset like this:

dataset = TDBFactory.createDataset(datasetdir);

Before each SPARQL insert I do

dataset.begin(ReadWrite.WRITE);

After each SPARQL insert I do

try {
    dataset.commit();
} catch (Exception e) {
    // ... e.g. log the error and call dataset.abort()
} finally {
    dataset.end();
}
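
Putting those excerpts together, the write path looks roughly like this. It
is only a minimal sketch: the folder name "DB" and the INSERT DATA string are
placeholders, and the package names assume Jena 2.x (com.hp.hpl.jena.*).

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.update.UpdateAction;

public class TdbInsertSketch {
    public static void main(String[] args) {
        // Placeholder for the TDB folder shared by both .war files.
        Dataset dataset = TDBFactory.createDataset("DB");

        dataset.begin(ReadWrite.WRITE);
        try {
            // Placeholder insert; the application generates its own SPARQL.
            UpdateAction.parseExecute(
                "INSERT DATA { <http://example.org/s> <http://example.org/p> 'o' }",
                dataset);
            dataset.commit();
        } catch (Exception e) {
            dataset.abort();
            throw new RuntimeException(e);
        } finally {
            dataset.end();
        }
    }
}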

Re: possibly a problem with fuseki caching

Posted by Yarc Yarc <mo...@gmail.com>.
Thanks for the answers!

> Can your updating .war send
> requests to Fuseki?

Yes it could.

> The way to update a database is through the web facing side with SPARQL
> Update or the SPARQL Graph Store protocol.

You're referring here to the things documented in
http://jena.apache.org/documentation/serving_data/ ?

I'm a bit hesitant to employ an HTTP interface. From experience I'm
sceptical of HTTP as an efficient and transactional interface between two
middleware components on the same server instance. Won't I have problems
with concurrency, transactional boundaries and isolation in such a setup?
(e.g. are the one-writer/multiple-reader semantics of TDB preserved over
this HTTP interface?)
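
For concreteness, the kind of client code in the updating .war that I
understand you are suggesting is something like the sketch below. The
endpoint http://localhost:3030/ds/update is only my assumption about the
service name, and the package names assume Jena 2.x.

import com.hp.hpl.jena.update.UpdateExecutionFactory;
import com.hp.hpl.jena.update.UpdateFactory;
import com.hp.hpl.jena.update.UpdateProcessor;
import com.hp.hpl.jena.update.UpdateRequest;

public class RemoteUpdateSketch {
    public static void main(String[] args) {
        // Placeholder insert; the application generates its own SPARQL.
        UpdateRequest request = UpdateFactory.create(
            "INSERT DATA { <http://example.org/s> <http://example.org/p> 'o' }");

        // Send the update to Fuseki over HTTP instead of opening the
        // TDB folder directly. The service URL is an assumption.
        UpdateProcessor proc = UpdateExecutionFactory.createRemote(
            request, "http://localhost:3030/ds/update");
        proc.execute();
    }
}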

> What is your system setup?  32 or 64 bit java?

My target is Windows Server 2012 R2, 64-bit. Does that help?

:-)

M.

On Wed, May 6, 2015 at 7:45 PM, Andy Seaborne <an...@apache.org> wrote:

> PS What is your system setup?  32 or 64 bit java?  (caching works
> differently between those two).  It won't work in either case but for
> different reasons (the index caches are probably shared via memory mapped
> files on 64 bit - not on 32 bit - there is also a large in-JVM, not shared,
> cache for the node table)
>
>         Andy
>
> On 06/05/15 11:09, Andy Seaborne wrote:
>
>> Hi Morten,
>>
>> The way TDB works, only one application can have the database files at
>> the same time.  I suspect that the checking TDB performs (you get an
>> error if multiple access is detected) is being defeated by running two
>> webapps in the same, but partitioned, glassfish instance.  The test, IIRC,
>> is "same process id", which it is for two instances in glassfish.
>>
>> TDB uses a number of statics to manage shared access to the database.
>> TDBFactory returns the same database for a location every time.  But if
>> a class is loaded twice by different classloaders, there are different
>> statics.
>>
>> The way to update a database is through the web facing side with SPARQL
>> Update or the SPARQL Graph Store protocol.  Can your updating .war send
>> requests to Fuseki?
>>
>> A more complicated setup would be to have one .war to do all the TDB
>> access and have a special Fuseki service that knows to send requests to
>> that 3rd DB-manager .war.  That would take new code.
>>
>>      Andy
>>
>> On 05/05/15 20:58, Yarc Yarc wrote:
>>
>>> Hello,
>>>
>>> I'm trying to use fuseki for querying live data managed by a web
>>> application. This generally works, but I seem to have a caching problem
>>> with my fuseki. This is the setup;
>>>
>>> a. I'm running the fuseki.war over glassfish 4 configured with a dataset
>>> fetched from a TDB file system folder for persistence. That works well it
>>> seems. Data is read from the TDB folder and I can query over port 3030.
>>>
>>> b. I have another .war application running in the same glassfish
>>> instance.
>>> This .war uses the same TDB folder for persistence. The application
>>> queries
>>> and updates the TDB folder. The data is persistent in that if I undeploy
>>> the application and redeploy the data is still there.
>>>
>>> Now, if the application in b. inserts new data, then this is not
>>> reflected
>>> in the fuseki application (a). If I, however, undeploy and redeploy
>>> fuseki.war then the new data from b. is there.
>>>
>>> My initial hypothesis is that fuseki uses a cache at some level, so
>>> that if
>>> the underlying TDB file store changes, it is not picked up. Is this a
>>> reasonable hypothesis?
>>>
>>> If so, is there some way I might circumvent this behaviour - i.e. can I
>>> enforce a TDB sync before each query in Fuseki somehow?
>>>
>>> If not, then I'm probably confused and I'm asking for a friendly hint on
>>> where I go wrong.
>>>
>>> I have read the caching and synchronization section in the
>>> documentation (
>>> https://jena.apache.org/documentation/tdb/java_api.html), generally
>>> googled
>>> this issue (e.g. "fuseki cache" or "share TDB store") and I'm aware of
>>> the
>>> single-jvm constraints on TDB (
>>> https://jena.apache.org/documentation/tdb/tdb_transactions.html) as
>>> well as
>>> a suggested pattern of letting all other clients of a TDB store go
>>> through
>>> fuseki if I understand it correctly (
>>>
>>> http://answers.semanticweb.com/questions/28629/sharing-jena-tdb-data-store
>>> ).
>>>
>>>
>>>
>>> Any and all help is appreciated.
>>>
>>> :-)
>>>
>>> M.
>>>
>>> ------ a few excerpts on how b. relates to TDB, commits and the dataset
>>> -----
>>>
>>> I create my dataset like this:
>>>
>>> dataset = TDBFactory.createDataset(datasetdir);
>>>
>>> Before each sparql insert I do
>>>
>>> dataset.begin(ReadWrite.WRITE);
>>>
>>> After each sparql insert I do
>>>
>>> try {
>>> dataset.commit();
>>> } catch {
>>> ...} finally {
>>> dataset.end();
>>> }
>>>
>>>
>>
>

Re: possibly a problem with fuseki caching

Posted by Andy Seaborne <an...@apache.org>.
PS What is your system setup?  32 or 64 bit java?  (caching works
differently between those two).  It won't work in either case but for
different reasons (the index caches are probably shared via memory
mapped files on 64 bit - not on 32 bit - there is also a large in-JVM,
not shared, cache for the node table)

	Andy

On 06/05/15 11:09, Andy Seaborne wrote:
> Hi Morten,
>
> The way TDB works, only one application can have the database files at
> the same time.  I suspect that the checking TDB performs (you get an
> error if multiple access is detected) is being defeated by running two
> webapps in the same, but partitioned, glassfish instance.  The test, IIRC,
> is "same process id", which it is for two instances in glassfish.
>
> TDB uses a number of statics to manage shared access to the database.
> TDBFactory returns the same database for a location every time.  But if
> a class is loaded twice by different classloaders, there are different
> statics.
>
> The way to update a database is through the web facing side with SPARQL
> Update or the SPARQL Graph Store protocol.  Can your updating .war send
> requests to Fuseki?
>
> A more complicated setup would be to have one .war to do all the TDB
> access and have a special Fuseki service that knows to send requests to
> that 3rd DB-manager .war.  That would take new code.
>
>      Andy
>
> On 05/05/15 20:58, Yarc Yarc wrote:
>> Hello,
>>
>> I'm trying to use fuseki for querying live data managed by a web
>> application. This generally works, but I seem to have a caching problem
>> with my fuseki. This is the setup;
>>
>> a. I'm running the fuseki.war over glassfish 4 configured with a dataset
>> fetched from a TDB file system folder for persistence. That works well it
>> seems. Data is read from the TDB folder and I can query over port 3030.
>>
>> b. I have another .war application running in the same glassfish
>> instance.
>> This .war uses the same TDB folder for persistence. The application
>> queries
>> and updates the TDB folder. The data is persistent in that if I undeploy
>> the application and redeploy the data is still there.
>>
>> Now, if the application in b. inserts new data, then this is not
>> reflected
>> in the fuseki application (a). If I, however, undeploy and redeploy
>> fuseki.war then the new data from b. is there.
>>
>> My initial hypothesis is that fuseki uses a cache at some level, so
>> that if
>> the underlying TDB file store changes, it is not picked up. Is this a
>> reasonable hypothesis?
>>
>> If so, is there some way I might circumvent this behaviour - i.e. can I
>> enforce a TDB sync before each query in Fuseki somehow?
>>
>> If not, then I'm probably confused and I'm asking for a friendly hint on
>> where I go wrong.
>>
>> I have read the caching and synchronization section in the
>> documentation (
>> https://jena.apache.org/documentation/tdb/java_api.html), generally
>> googled
>> this issue (e.g. "fuseki cache" or "share TDB store") and I'm aware of
>> the
>> single-jvm constraints on TDB (
>> https://jena.apache.org/documentation/tdb/tdb_transactions.html) as
>> well as
>> a suggested pattern of letting all other clients of a TDB store go
>> through
>> fuseki if I understand it correctly (
>> http://answers.semanticweb.com/questions/28629/sharing-jena-tdb-data-store).
>>
>>
>>
>> Any and all help is appreciated.
>>
>> :-)
>>
>> M.
>>
>> ------ a few excerpts on how b. relates to TDB, commits and the dataset
>> -----
>>
>> I create my dataset like this:
>>
>> dataset = TDBFactory.createDataset(datasetdir);
>>
>> Before each sparql insert I do
>>
>> dataset.begin(ReadWrite.WRITE);
>>
>> After each sparql insert I do
>>
>> try {
>> dataset.commit();
>> } catch {
>> ...} finally {
>> dataset.end();
>> }
>>
>


Re: possibly a problem with fuseki caching

Posted by Andy Seaborne <an...@apache.org>.
Hi Morten,

The way TDB works, only one application can have the database files at
the same time.  I suspect that the checking TDB performs (you get an
error if multiple access is detected) is being defeated by running two
webapps in the same, but partitioned, glassfish instance.  The test, IIRC,
is "same process id", which it is for two instances in glassfish.

TDB uses a number of statics to manage shared access to the database. 
TDBFactory returns the same database for a location every time.  But if 
a class is loaded twice by different classloaders, there are different 
statics.
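
A rough illustration of that point (the location "DB" is just a placeholder,
and the package names assume Jena 2.x):

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.tdb.TDBFactory;

public class SharedStaticsSketch {
    public static void main(String[] args) {
        // Within one classloader, both handles are backed by the same cached,
        // per-JVM connection to the location "DB".
        Dataset d1 = TDBFactory.createDataset("DB");
        Dataset d2 = TDBFactory.createDataset("DB");
        // A second webapp loads the TDB classes through a different
        // classloader, so it gets its own copies of those statics and its own
        // caches for the same files - which is why writes from one webapp are
        // invisible to the other.
    }
}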

The way to update a database is through the web facing side with SPARQL 
Update or the SPARQL Graph Store protocol.  Can your updating .war send 
requests to Fuseki?
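
As a sketch of the Graph Store Protocol route (the service URL
http://localhost:3030/ds/data and the file name data.ttl are assumptions for
the example, and the package names assume Jena 2.x):

import com.hp.hpl.jena.query.DatasetAccessor;
import com.hp.hpl.jena.query.DatasetAccessorFactory;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.util.FileManager;

public class GraphStoreSketch {
    public static void main(String[] args) {
        // Data to add; "data.ttl" is a placeholder file.
        Model model = FileManager.get().loadModel("data.ttl");

        // Talk to Fuseki's graph store endpoint rather than the TDB files
        // directly.
        DatasetAccessor accessor =
            DatasetAccessorFactory.createHTTP("http://localhost:3030/ds/data");
        accessor.add(model);  // merge into the default graph
    }
}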

A more complicated setup would be to have one .war to do all the TDB 
access and have a special Fuseki service that knows to send requests to 
that 3rd DB-manager .war.  That would take new code.

	Andy

On 05/05/15 20:58, Yarc Yarc wrote:
> Hello,
>
> I'm trying to use fuseki for querying live data managed by a web
> application. This generally works, but I seem to have a caching problem
> with my fuseki. This is the setup;
>
> a. I'm running the fuseki.war over glassfish 4 configured with a dataset
> fetched from a TDB file system folder for persistence. That works well it
> seems. Data is read from the TDB folder and I can query over port 3030.
>
> b. I have another .war application running in the same glassfish instance.
> This .war uses the same TDB folder for persistence. The application queries
> and updates the TDB folder. The data is persistent in that if I undeploy
> the application and redeploy the data is still there.
>
> Now, if the application in b. inserts new data, then this is not reflected
> in the fuseki application (a). If I, however, undeploy and redeploy
> fuseki.war then the new data from b. is there.
>
> My initial hypothesis is that fuseki uses a cache at some level, so that if
> the underlying TDB file store changes, it is not picked up. Is this a
> reasonable hypothesis?
>
> If so, is there some way I might circumvent this behaviour - i.e. can I
> enforce a TDB sync before each query in Fuseki somehow?
>
> If not, then I'm probably confused and I'm asking for a friendly hint on
> where I go wrong.
>
> I have read the caching and synchronization section in the documentation (
> https://jena.apache.org/documentation/tdb/java_api.html), generally googled
> this issue (e.g. "fuseki cache" or "share TDB store") and I'm aware of the
> single-jvm constraints on TDB (
> https://jena.apache.org/documentation/tdb/tdb_transactions.html) as well as
> a suggested pattern of letting all other clients of a TDB store go through
> fuseki if I understand it correctly (
> http://answers.semanticweb.com/questions/28629/sharing-jena-tdb-data-store).
>
>
> Any and all help is appreciated.
>
> :-)
>
> M.
>
> ------ a few excerpts on how b. relates to TDB, commits and the dataset
> -----
>
> I create my dataset like this:
>
> dataset = TDBFactory.createDataset(datasetdir);
>
> Before each sparql insert I do
>
> dataset.begin(ReadWrite.WRITE);
>
> After each sparql insert I do
>
> try {
> dataset.commit();
> } catch {
> ...} finally {
> dataset.end();
> }
>