Posted to dev@jena.apache.org by Andy Seaborne <an...@epimorphics.com> on 2011/03/15 11:39:09 UTC

Re: SPARQL query equality and "equivalence"...

> I had a first go at implementing a cached QueryEngineHTTP:
> https://github.com/castagna/sparql-cache/raw/master/src/main/java/com/talis/labs/arq/CachedQueryEngineHTTP.java
>
> Invalidation is by service end-point and this approach makes sense only
> if you have mostly reads with few and infrequent updates.
>
> I am not happy with all those synchronized and the static cache.

   synchronized (cache) {
      if (cache.containsKey(key)) {
         return (ResultSetRewindable) cache.get(key);
      }
   }
   rs = ResultSetFactory.makeRewindable(super.execSelect());
   synchronized (cache) {
      cache.put(key, rs);
   }

The synchronization needs to be over the get and the put together.  What 
if another thread comes in after the first block?  Both find .get is null.

You then have two threads executing the makeRewindable/execSelect.  This 
happens to be safe because the operation super.execSelect() is safe 
in parallel, but in general the operation being cached is not thread safe.

Generally this is safer: a single sync over the get and the 
put-if-cache-miss.  In this case it is just as fast, and makes only one 
call to the far end
  (thread 2 waits on the lock rather than making the query on the remote end):

   synchronized (cache) {
      if (cache.containsKey(key)) {
         return (ResultSetRewindable) cache.get(key);
      }
      rs = ResultSetFactory.makeRewindable(super.execSelect());
      cache.put(key, rs);
   }
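
Putting that together, a rough, untested sketch of what the whole 
execSelect() override could look like.  The cache field, the key 
construction and the constructor arguments here are illustrative, not 
taken from the class linked above:

   import java.util.HashMap;
   import java.util.Map;

   import com.hp.hpl.jena.query.ResultSet;
   import com.hp.hpl.jena.query.ResultSetFactory;
   import com.hp.hpl.jena.query.ResultSetRewindable;
   import com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP;

   public class CachedQueryEngineHTTP extends QueryEngineHTTP {
       // One cache shared by all instances, keyed by endpoint + query text.
       private static final Map<String, ResultSetRewindable> cache =
           new HashMap<String, ResultSetRewindable>();

       private final String key;

       public CachedQueryEngineHTTP(String serviceURI, String queryString) {
           super(serviceURI, queryString);
           this.key = serviceURI + "|" + queryString;
       }

       @Override
       public ResultSet execSelect() {
           // Single synchronized block over the get and the put-if-cache-miss.
           synchronized (cache) {
               ResultSetRewindable rs = cache.get(key);
               if (rs != null) {
                   rs.reset();    // rewind before reuse (note: the cursor is shared)
                   return rs;
               }
               // Cache miss: the remote call happens while holding the lock, so
               // other threads wait for this result instead of re-querying.
               rs = ResultSetFactory.makeRewindable(super.execSelect());
               cache.put(key, rs);
               return rs;
           }
       }
   }

Used just like the stock engine: construct it with the service URI and 
query string, then call execSelect().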

	Andy

Re: SPARQL query equality and "equivalence"...

Posted by Andy Seaborne <an...@epimorphics.com>.

On 15/03/11 19:39, Paolo Castagna wrote:
> Hi Andy.
>
> Andy Seaborne wrote:
>>
>>> I had a first go at implementing a cached QueryEngineHTTP:
>>> https://github.com/castagna/sparql-cache/raw/master/src/main/java/com/talis/labs/arq/CachedQueryEngineHTTP.java
>>>
>>>
>>> Invalidation is by service end-point and this approach makes sense only
>>> if you have mostly reads with few and infrequent updates.
>>>
>>> I am not happy with all those synchronized and the static cache.
>>
>> synchronized (cache) {
>>    if (cache.containsKey(key)) {
>>       return (ResultSetRewindable) cache.get(key);
>>    }
>> }
>> rs = ResultSetFactory.makeRewindable(super.execSelect());
>> synchronized (cache) {
>>    cache.put(key, rs);
>> }
>>
>> The synchronization needs to be over the get and the put together.
>> What if another thread comes in after the first block? Both find .get
>> is null.
>
> This was my first solution, then I tried to be clever...
> I wanted to reduce the synchronized part as much as possible, and in
> particular I did not want a potentially long operation inside the
> synchronized block. The risk is to have two threads making the query
> remotely and adding the result to the cache.
>
> My fear was that the approach below would reduce concurrency on the client,
> but this is probably not true.

The lock is held for a short time if there is a cache hit and a long time 
on a cache miss.  But for the cache miss, any blocked thread is waiting 
for this result and would otherwise make the call itself (and slow the 
far end down!).

>
>> You then have two threads executing the makeRewindable/execSelect. This
>> happens to be safe because the operation super.execSelect() is safe
>> in parallel, but in general the operation being cached is not thread safe.
>>
>> Generally this is safer: a single sync over the get and the
>> put-if-cache-miss. In this case it is just as fast, and makes only one
>> call to the far end
>> (thread 2 waits on the lock rather than making the query on the remote end):
>>
>> synchronized (cache) {
>>    if (cache.containsKey(key)) {
>>       return (ResultSetRewindable) cache.get(key);
>>    }
>>    rs = ResultSetFactory.makeRewindable(super.execSelect());
>>    cache.put(key, rs);
>> }
>
> I'll follow your suggestion.
>
> I also added one which uses Memcached and one which uses Redis:
> https://github.com/castagna/sparql-cache/raw/master/src/main/java/com/talis/labs/arq/MemcachedQueryEngineHTTP.java
> https://github.com/castagna/sparql-cache/raw/master/src/main/java/com/talis/labs/arq/RedisQueryEngineHTTP.java
>
> Cache invalidation is the problem. :-)

:-)

And remote SPARQL endpoints don't manage etags or cache lifetimes very 
well - and can't if random updates appear.

Maybe keep entries for a fixed length of time.
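
For example, store a timestamp with each entry and treat it as a miss 
once it is older than some maximum age.  A sketch building on the class 
above; the names and the one-minute lifetime are made up:

   // Illustrative wrapper: a cached result plus the time it was stored
   // (assumes it is nested inside the engine class above).
   static class Entry {
       final ResultSetRewindable rs;
       final long storedAt = System.currentTimeMillis();
       Entry(ResultSetRewindable rs) { this.rs = rs; }
   }

   private static final long MAX_AGE_MS = 60 * 1000;   // arbitrary: one minute

   private static boolean expired(Entry e) {
       return System.currentTimeMillis() - e.storedAt > MAX_AGE_MS;
   }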

An entry can be checked by asking the server to do some cheap operation, if it 
supports ETags in some useful manner.  If the server provides the same 
etag for all requests between updates, life is OK.  A bit of fixed time 
(to catch very high frequency requests) and an etag check after that would 
cover most cases.

An HTTP request with "If-None-Match" (and a 304 response) is also possible, 
but it is an extra round trip if the data has changed.
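
If the endpoint does answer conditional requests, the check could look 
something like this (a sketch; it assumes the endpoint answers HEAD 
requests and that the etag seen when the entry was cached has been kept):

   import java.net.HttpURLConnection;
   import java.net.URL;

   // True if the server says the data is unchanged (304 Not Modified),
   // so the cached entry can be reused without re-running the query.
   static boolean stillValid(String endpointURL, String cachedETag) throws Exception {
       HttpURLConnection conn = (HttpURLConnection) new URL(endpointURL).openConnection();
       conn.setRequestMethod("HEAD");                         // cheap: no response body
       conn.setRequestProperty("If-None-Match", cachedETag);  // conditional request
       int code = conn.getResponseCode();
       conn.disconnect();
       return code == HttpURLConnection.HTTP_NOT_MODIFIED;    // 304
   }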

(memo to self: add etags to Fuseki)

> I've tried to follow your advice on implementing a new QueryEngine
> with a factory/implementation pair, but I got confused... it was
> late and this is somewhat of an "out-of-band" activity.
>
> However, I would like to have a good/proper example to show how you
> can have a cached query engine either local or remote.
>
> Thanks,
> Paolo
>
>>
>> Andy

Re: SPARQL query equality and "equivalence"...

Posted by Paolo Castagna <ca...@googlemail.com>.
Hi Andy.

Andy Seaborne wrote:
> 
>> I had a first go at implementing a cached QueryEngineHTTP:
>> https://github.com/castagna/sparql-cache/raw/master/src/main/java/com/talis/labs/arq/CachedQueryEngineHTTP.java 
>>
>>
>> Invalidation is by service end-point and this approach makes sense only
>> if you have mostly reads with few and infrequent updates.
>>
>> I am not happy with all those synchronized and the static cache.
> 
>   synchronized (cache) {
>      if (cache.containsKey(key)) {
>         return (ResultSetRewindable) cache.get(key);
>      }
>   }
>   rs = ResultSetFactory.makeRewindable(super.execSelect());
>   synchronized (cache) {
>      cache.put(key, rs);
>   }
> 
> The synchronization needs to be over the get and the put together.  What 
> if another thread comes in after the first block?  Both find .get is null.

This was my first solution, then I tried to be clever...
I wanted to reduce the synchronized part as much as possible, and in particular
I did not want a potentially long operation inside the synchronized
block. The risk is to have two threads making the query remotely and adding
the result to the cache.

My fear was that the approach below would reduce concurrency on the client,
but this is probably not true.

> You then have two threads executing the makeRewindable/execSelect.  This 
> happens to be safe because the operation super.execSelect() is safe 
> in parallel, but in general the operation being cached is not thread safe.
> 
> Generally this is safer: a single sync over the get and the 
> put-if-cache-miss.  In this case it is just as fast, and makes only one 
> call to the far end
>  (thread 2 waits on the lock rather than making the query on the remote end):
> 
>   synchronized (cache) {
>      if (cache.containsKey(key)) {
>         return (ResultSetRewindable) cache.get(key);
>      }
>      rs = ResultSetFactory.makeRewindable(super.execSelect());
>      cache.put(key, rs);
>   }

I'll follow your suggestion.

I also added one which uses Memcached and one which uses Redis:
https://github.com/castagna/sparql-cache/raw/master/src/main/java/com/talis/labs/arq/MemcachedQueryEngineHTTP.java
https://github.com/castagna/sparql-cache/raw/master/src/main/java/com/talis/labs/arq/RedisQueryEngineHTTP.java

Cache invalidation is the problem. :-)

I've tried to follow your advice on implementing a new QueryEngine
with a factory/implementation pair, but I got confused... it was
late and this is somewhat of an "out-of-band" activity.

However, I would like to have a good/proper example to show how you
can have a cached query engine either local or remote.

Thanks,
Paolo

> 
>     Andy