You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jena.apache.org by Simon Helsen <sh...@ca.ibm.com> on 2011/10/31 17:34:33 UTC

IRIResolver cache not thread safe?

Hi all,

very sometimes (it is a rare and impossibly difficult to reproduce) we run 
into a problem which I can only explain by a concurrency violation. The 
stack trace we typically encounter is below. I know that the concrete NPE 
which comes as a result is specific to the IBM JDK because of the way they 
implement LinkedHashMap, but the bug is not theirs. That the Sun SDK does 
not immediately expose the issue is mere luck. Looking at the IRIResolver 
code, it seems to me that the cache inside the IRIResolveNormal (
resolvedIRIs) ought to be using a concurrent linked hashMap since there is 
only one global cache which can be used by many threads. 

Any thoughts?

Thanks

Simon

java.lang.NullPointerException
        at java.util.LinkedHashMap.get(LinkedHashMap.java:324)
        at org.openjena.atlas.lib.cache.CacheLRU.get(CacheLRU.java:41)
        at 
org.openjena.atlas.lib.cache.CacheWrapper.get(CacheWrapper.java:29)
        at 
org.openjena.atlas.lib.cache.CacheWithGetter.get(CacheWithGetter.java:26)
        at 
org.openjena.riot.IRIResolver$IRIResolverNormal.resolveSilent(IRIResolver.java:377)
        at 
org.openjena.riot.IRIResolver$IRIResolverNormal.resolveToString(IRIResolver.java:363)
        at 
org.openjena.riot.IRIResolver.resolveGlobalToString(IRIResolver.java:78)
        at 
org.openjena.riot.JenaReaderRIOT.readImpl(JenaReaderRIOT.java:121)
        at org.openjena.riot.JenaReaderRIOT.read(JenaReaderRIOT.java:40)
        at 
com.ibm.team.jfs.indexing.service.internal.rdf.RDFUtil.readNTriples(RDFUtil.java:166)

Re: IRIResolver cache not thread safe?

Posted by Andy Seaborne <an...@apache.org>.
On 14/11/11 20:20, Simon Helsen wrote:
> Andy,
>
> thanks, I was digging through the code paths both in the latest as well
> as the version we currently still have deployed (arq 2.8.5) and it seems
> that the only difference is that in 2.8.5, the readImpl of JenaReadRIOT
> calls the resolve here:
>
> *private**void*readImpl(Model model, Tokenizer tokenizer, String base)
> {
> // The reader has been checked, if possible, by now or
> // constructed correctly by code here.
> *if*( base != *null*)
> base = IRIResolver./resolveGlobalToString/(base) ;
> *try*{
> model.notifyEvent( GraphEvents./startRead/);
> readWorker(model, tokenizer, base) ;
> }
> ...
> whereas in the latest, there is no resolution of the base at all at that
> stage:
>
> *private**void*readImpl(Model model, Tokenizer tokenizer, String base)
> {
> *try*{
> model.notifyEvent( GraphEvents./startRead/);
> readWorker(model, tokenizer, base) ;
> }
> ...
>
> Now, deeper down, both create an IRIResolverNormal object, so I am
> guessing it is safe if I patch our current version to remove the call to
> resolveGlobalToString. Any comments? (and before you ask, this is
> because we are currently not yet upgrading the jena stack, but I need to
> fix this since it leads to concurrency errors during our indexing process)
>
> thanks
>
> Simon

Better would be to patch IRIresolver and add "synchronized" to every 
static function that uses "globalResolver".  Deleting the code from 
readImpl means that you've change the behaviour of reading when the base 
is provided.  Maybe your code always sets an absolute base ... but safer 
to fix IRIResolver IMO.  I don't think globalResolver isn't let out of 
IRIResolver in v2.8.5

	Andy



Re: IRIResolver cache not thread safe?

Posted by Simon Helsen <sh...@ca.ibm.com>.
Andy,

thanks, I was digging through the code paths both in the latest as well as 
the version we currently still have deployed (arq 2.8.5) and it seems that 
the only difference is that in 2.8.5, the readImpl of JenaReadRIOT calls 
the resolve here:

    private void readImpl(Model model, Tokenizer tokenizer, String base)
    {
        // The reader has been checked, if possible, by now or
        // constructed correctly by code here. 
        if ( base != null )
            base = IRIResolver.resolveGlobalToString(base) ;
        try {
            model.notifyEvent( GraphEvents.startRead );
            readWorker(model, tokenizer,  base) ;
        }
        ...
whereas in the latest, there is no resolution of the base at all at that 
stage:

   private void readImpl(Model model, Tokenizer tokenizer, String base)
    {
        try {
            model.notifyEvent( GraphEvents.startRead );
            readWorker(model, tokenizer,  base) ;
        }
        ...

Now, deeper down, both create an IRIResolverNormal object, so I am 
guessing it is safe if I patch our current version to remove the call to 
resolveGlobalToString. Any comments? (and before you ask, this is because 
we are currently not yet upgrading the jena stack, but I need to fix this 
since it leads to concurrency errors during our indexing process)

thanks

Simon



From:
Andy Seaborne <an...@apache.org>
To:
Simon Helsen/Toronto/IBM@IBMCA
Cc:
jena-dev@incubator.apache.org, Andy Seaborne 
<an...@gmail.com>
Date:
11/03/2011 04:15 AM
Subject:
Re: IRIResolver cache not thread safe?



On 02/11/11 20:06, Simon Helsen wrote:

> However, independent of that, when I look at IRIResolverNormal, it has a
> cache
>
> *private*Cache<String, IRI> resolvedIRIs=
> CacheFactory./createCache/(getter, /CacheSize/) ;
>
> which is not thread-safe. So *public*IRI resolveSilent(String
> relURI)could not be used from multiple threads.
>
> @Andy: are you saying that access to resolveSilent is definitely
> single-threaded, even if there are multiple concurrent reads?

.resolveSilent is an object method - it is not called from separate 
threads without an outer lock.

Each parser run creates a Prologue object which includes a prefix map 
and a resolver, and the also the bNode label mapping.

It's is only for the life time of the RIOT parser (not the jena reader) 
and a new one if created for each parse run so they are are not 
multithread - they have a handle on the input stream to be parsed for 
example.  There is no need to lock.

There is also a static instance of IRIResolverNormal inside IRIResolver 
(there is no route to getting hold of the object directly) and every 
static method that accesses it is synchronized.  It is created in class 
initialization.

e.g.

    static private IRI resolveIRI(String relStr, String baseStr)
     {
         synchronized(globalResolverLock)
         {
             IRI i = iriFactory.create(relStr);
             if (i.isAbsolute())
                 // removes excess . segments
                 return globalResolver.getBaseIRI().create(i);

             IRI base = iriFactory.create(baseStr);

             if ("file".equalsIgnoreCase(base.getScheme()))
                 return globalResolver.getBaseIRI().create(i);
             return base.create(i);
         }
     }

IRIResolverNormal is a static inner class with resolvedIRIs as a field. 
  A new cache is created for each new instance of IRIResolverNormal, and 
all calls into IRIResolverNormal are either via IRIResolver.create or 
globalResolver in IRIResolver.

So either IRIResolverNormal is used by a parser (new instance) or it's
used by a synchronized static. There is no need to use a concurent hash
map for resolvedIRIs.

If this is not the case, please could you provide point to the code
where it is not so.

                 Andy

>
> thanks
>
> Simon




Re: IRIResolver cache not thread safe?

Posted by Andy Seaborne <an...@apache.org>.
On 02/11/11 20:06, Simon Helsen wrote:

> However, independent of that, when I look at IRIResolverNormal, it has a
> cache
>
> *private*Cache<String, IRI> resolvedIRIs=
> CacheFactory./createCache/(getter, /CacheSize/) ;
>
> which is not thread-safe. So *public*IRI resolveSilent(String
> relURI)could not be used from multiple threads.
>
> @Andy: are you saying that access to resolveSilent is definitely
> single-threaded, even if there are multiple concurrent reads?

.resolveSilent is an object method - it is not called from separate 
threads without an outer lock.

Each parser run creates a Prologue object which includes a prefix map 
and a resolver, and the also the bNode label mapping.

It's is only for the life time of the RIOT parser (not the jena reader) 
and a new one if created for each parse run so they are are not 
multithread - they have a handle on the input stream to be parsed for 
example.  There is no need to lock.

There is also a static instance of IRIResolverNormal inside IRIResolver 
(there is no route to getting hold of the object directly) and every 
static method that accesses it is synchronized.  It is created in class 
initialization.

e.g.

    static private IRI resolveIRI(String relStr, String baseStr)
     {
         synchronized(globalResolverLock)
         {
             IRI i = iriFactory.create(relStr);
             if (i.isAbsolute())
                 // removes excess . segments
                 return globalResolver.getBaseIRI().create(i);

             IRI base = iriFactory.create(baseStr);

             if ("file".equalsIgnoreCase(base.getScheme()))
                 return globalResolver.getBaseIRI().create(i);
             return base.create(i);
         }
     }

IRIResolverNormal is a static inner class with resolvedIRIs as a field. 
  A new cache is created for each new instance of IRIResolverNormal, and 
all calls into IRIResolverNormal are either via IRIResolver.create or 
globalResolver in IRIResolver.

So either IRIResolverNormal is used by a parser (new instance) or it's
used by a synchronized static. There is no need to use a concurent hash
map for resolvedIRIs.

If this is not the case, please could you provide point to the code
where it is not so.

	Andy

>
> thanks
>
> Simon

Re: IRIResolver cache not thread safe?

Posted by Simon Helsen <sh...@ca.ibm.com>.
sorry for the late response. I looked again into this

so, yes, first of all, I encountered this in ARQ 2.8.7 and indeed in later 
ARQs, the path seems different

Trunk ARQ:

   private void readImpl(Model model, Tokenizer tokenizer, String base)
    {
        try {
            model.notifyEvent( GraphEvents.startRead );
            readWorker(model, tokenizer,  base) ;
        }

ARQ 2.8.5:

  private void readImpl(Model model, Tokenizer tokenizer, String base)
    {
        // The reader has been checked, if possible, by now or
        // constructed correctly by code here. 
        if ( base != null )
            base = IRIResolver.resolveGlobalToString(base) ;
        try {
            model.notifyEvent( GraphEvents.startRead );
            readWorker(model, tokenizer,  base) ;
        }

So, perhaps we circumvent the problem using a later ARQ (we are looking to 
adopt ARQ 2.8.8 in the near future)

However, independent of that, when I look at IRIResolverNormal, it has a 
cache

 private Cache<String, IRI> resolvedIRIs = CacheFactory.createCache(getter
, CacheSize) ;

which is not thread-safe. So  public IRI resolveSilent(String relURI) 
could not be used from multiple threads. 

@Andy: are you saying that access to resolveSilent is definitely 
single-threaded, even if there are multiple concurrent reads?

thanks

Simon



From:
Andy Seaborne <an...@apache.org>
To:
jena-dev@incubator.apache.org
Date:
10/31/2011 06:08 PM
Subject:
Re: IRIResolver cache not thread safe?



On 31/10/11 16:34, Simon Helsen wrote:
> Hi all,
>
> very sometimes (it is a rare and impossibly difficult to reproduce) we 
run
> into a problem which I can only explain by a concurrency violation. The
> stack trace we typically encounter is below. I know that the concrete 
NPE
> which comes as a result is specific to the IBM JDK because of the way 
they
> implement LinkedHashMap, but the bug is not theirs. That the Sun SDK 
does
> not immediately expose the issue is mere luck. Looking at the 
IRIResolver
> code, it seems to me that the cache inside the IRIResolveNormal (
> resolvedIRIs) ought to be using a concurrent linked hashMap since there 
is
> only one global cache which can be used by many threads.
>
> Any thoughts?

There is no such operation IRIResolver.resolveGlobalToString anymore. 
Which version are you using?

Use of private static "globalResolver" may need protecting (see the 
current system) which has one IRIResolverNormal - there may be others 
and the class itself IRIResolverNormal does not need it (it has no 
statics) and there are ways to create new ones.

                 Andy

>
> Thanks
>
> Simon
>
> java.lang.NullPointerException
>          at java.util.LinkedHashMap.get(LinkedHashMap.java:324)
>          at org.openjena.atlas.lib.cache.CacheLRU.get(CacheLRU.java:41)
>          at
> org.openjena.atlas.lib.cache.CacheWrapper.get(CacheWrapper.java:29)
>          at
> 
org.openjena.atlas.lib.cache.CacheWithGetter.get(CacheWithGetter.java:26)
>          at
> 
org.openjena.riot.IRIResolver$IRIResolverNormal.resolveSilent(IRIResolver.java:377)
>          at
> 
org.openjena.riot.IRIResolver$IRIResolverNormal.resolveToString(IRIResolver.java:363)
>          at
> org.openjena.riot.IRIResolver.resolveGlobalToString(IRIResolver.java:78)
>          at
> org.openjena.riot.JenaReaderRIOT.readImpl(JenaReaderRIOT.java:121)
>          at 
org.openjena.riot.JenaReaderRIOT.read(JenaReaderRIOT.java:40)
>          at
> 
com.ibm.team.jfs.indexing.service.internal.rdf.RDFUtil.readNTriples(RDFUtil.java:166)




Re: IRIResolver cache not thread safe?

Posted by Andy Seaborne <an...@apache.org>.
On 31/10/11 16:34, Simon Helsen wrote:
> Hi all,
>
> very sometimes (it is a rare and impossibly difficult to reproduce) we run
> into a problem which I can only explain by a concurrency violation. The
> stack trace we typically encounter is below. I know that the concrete NPE
> which comes as a result is specific to the IBM JDK because of the way they
> implement LinkedHashMap, but the bug is not theirs. That the Sun SDK does
> not immediately expose the issue is mere luck. Looking at the IRIResolver
> code, it seems to me that the cache inside the IRIResolveNormal (
> resolvedIRIs) ought to be using a concurrent linked hashMap since there is
> only one global cache which can be used by many threads.
>
> Any thoughts?

There is no such operation IRIResolver.resolveGlobalToString anymore. 
Which version are you using?

Use of private static "globalResolver" may need protecting (see the 
current system) which has one IRIResolverNormal - there may be others 
and the class itself IRIResolverNormal does not need it (it has no 
statics) and there are ways to create new ones.

	Andy

>
> Thanks
>
> Simon
>
> java.lang.NullPointerException
>          at java.util.LinkedHashMap.get(LinkedHashMap.java:324)
>          at org.openjena.atlas.lib.cache.CacheLRU.get(CacheLRU.java:41)
>          at
> org.openjena.atlas.lib.cache.CacheWrapper.get(CacheWrapper.java:29)
>          at
> org.openjena.atlas.lib.cache.CacheWithGetter.get(CacheWithGetter.java:26)
>          at
> org.openjena.riot.IRIResolver$IRIResolverNormal.resolveSilent(IRIResolver.java:377)
>          at
> org.openjena.riot.IRIResolver$IRIResolverNormal.resolveToString(IRIResolver.java:363)
>          at
> org.openjena.riot.IRIResolver.resolveGlobalToString(IRIResolver.java:78)
>          at
> org.openjena.riot.JenaReaderRIOT.readImpl(JenaReaderRIOT.java:121)
>          at org.openjena.riot.JenaReaderRIOT.read(JenaReaderRIOT.java:40)
>          at
> com.ibm.team.jfs.indexing.service.internal.rdf.RDFUtil.readNTriples(RDFUtil.java:166)