You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@opennlp.apache.org by "Fergal Monaghan (JIRA)" <ji...@apache.org> on 2015/08/20 13:44:45 UTC

[jira] [Comment Edited] (OPENNLP-808) Parser is not thread safe

    [ https://issues.apache.org/jira/browse/OPENNLP-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704717#comment-14704717 ] 

Fergal Monaghan edited comment on OPENNLP-808 at 8/20/15 11:44 AM:
-------------------------------------------------------------------

This initial patch is simply to beef up the chunking ParserTest to highlight and recreate the issue:
https://issues.apache.org/jira/secure/attachment/12751472/test_thread_safety_bug.diff

1. The new test method `shouldParse` parses a tokenised sentence without issue
2. The new AbstractConcurrencyTest class (which ParserTest now subclasses) allows any subclass to run it's test in a multi-threaded environment (set to 2 concurrent test threads by default). ParserTest now implements the abstract `concurrentProcess` test method, calling `shouldParse` concurrently which fails and recreates the issue reported
3. Note that for the parser used in the tests to be initialised, the large `en-parser-chunking.bin` model file must be available on the classpath


was (Author: roadrunner):
This initial patch is simply to beef up the chunking ParserTest to highlight and recreate the issue:

1. The new test method `shouldParse` parses a tokenised sentence without issue
2. The new AbstractConcurrencyTest class (which ParserTest now subclasses) allows any subclass to run it's test in a multi-threaded environment (set to 2 concurrent test threads by default). ParserTest now implements the abstract `concurrentProcess` test method, calling `shouldParse` concurrently which fails and recreates the issue reported
3. Note that for the parser used in the tests to be initialised, the large `en-parser-chunking.bin` model file must be available on the classpath

> Parser is not thread safe
> -------------------------
>
>                 Key: OPENNLP-808
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-808
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Parser
>    Affects Versions: tools-1.5.3, 1.6.0
>         Environment: Ubuntu 14.04.3 LTS
> java version "1.7.0_55"
> Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
>            Reporter: Fergal Monaghan
>         Attachments: test_thread_safety_bug.diff
>
>
> I'm actually not sure if this is really a "Major" "Bug" as I have listed it, perhaps it is by design. However even in this case this issue should possibly be listed as an "Improvement".
> Steps to recreate:
> 1. Run 2 or more threads simultaneously which make calls to the same parser object with the same piece of text.
> 2. One of a couple of things happens:
> (a) Either: line 281 of opennlp.tools.parser.AbstractBottomUpParser throws a java.util.ConcurrentModificationException from java.util.ArrayList iterator due to the `odh` field being global/shared in the object and not local to the method.
> (b) Or: the opennlp.tools.postag.DefaultPOSContextGenerator.getContext method throws a NullPointerException from line 77 of the opennlp.tools.util.Cache.clear method, since the underlying opennlp.tools.util.DoubleLinkedListElement is altered out from underneath it.
> Unless there are serious memory reasons for doing so, I would propose that such fields could be made local to the method since thread safety may take precedence over the memory saved in this case. As is, any code that calls the parser has to be enclosed in a giant synchronized block, and all applications using the parser have serious performance issues/cannot make use of modern hardware. I could be way of the mark here though if there is method to the madness :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)