Posted to dev@ctakes.apache.org by Peter Abramowitsch <pa...@gmail.com> on 2019/03/09 16:17:31 UTC

Re: ctake web service and multi-threading

After seeing some of the thread contention issues last year, I started from scratch and built a pipeline pool that sizes itself according to the available memory.  Each pool member contains a complete pipeline, including the Term Annotator, and a resettable JCas object.  To avoid confusing the issue, I don't use any of the thread constructs in piper files.  All of this is accessed through a web service with a multi-threaded dispatcher (SparkJava).  The pool members do initialize their pipelines serially, though, and the service only starts handling requests once all members are ready.
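
As a rough illustration of how a pool might size itself from available memory, here is a minimal sketch. It is not the code described above: the class PoolSizer, its method names, and the idea of a fixed per-pipeline memory estimate are all assumptions made for the example. Only the java.lang.Runtime calls are standard JDK API.

```java
// Hypothetical helper: decides how many pipeline instances to create.
final class PoolSizer {
    private PoolSizer() {}

    /**
     * Number of pipelines that fit into availableBytes, assuming each one
     * needs roughly bytesPerPipeline (an estimate you would have to measure).
     * The result is capped at maxMembers and is never less than 1.
     */
    static int poolSize(long availableBytes, long bytesPerPipeline, int maxMembers) {
        if (bytesPerPipeline <= 0 || maxMembers < 1) {
            throw new IllegalArgumentException("bytesPerPipeline and maxMembers must be positive");
        }
        long fit = availableBytes / bytesPerPipeline;
        return (int) Math.max(1L, Math.min(fit, (long) maxMembers));
    }

    /** Heap the JVM could still claim: max heap minus what is currently in use. */
    static long availableHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }
}
```

A caller could then build poolSize(PoolSizer.availableHeapBytes(), estimate, cap) members one after another and open the service only once the last one has finished initializing.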

It seems to be completely stable, running for hours with as many clients as the memory can handle.  The performance stats show that, single-threaded, dictionary lookups introduce latencies; with multi-threading these 'holes' are filled, so the CPU can be saturated far more fully.  Past a certain point, of course, once all cores are saturated, the performance curve flattens out.  But there are no exceptions.

Introducing a wait in the pool's getMember() lets the system deal effectively with overload, i.e. what to do when all members are busy because there are more requests than the pool size can handle.
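
A minimal sketch of what such a waiting getMember() could look like, assuming the pool is backed by a java.util.concurrent.BlockingQueue. The class BlockingPool, the timeout parameter, and the release() method are hypothetical names for illustration, not the actual implementation; in a cTAKES setting the member type T would be whatever wraps a pipeline and its JCas.

```java
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical pool: each member is handed to at most one thread at a time.
final class BlockingPool<T> {
    private final BlockingQueue<T> idle;

    BlockingPool(Collection<T> members) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("pool needs at least one member");
        }
        // Fair queue, pre-filled with all members; capacity equals pool size.
        idle = new ArrayBlockingQueue<>(members.size(), true, members);
    }

    /**
     * Waits up to timeoutMillis for an idle member.
     * Returns null if every member stays busy for that long,
     * or if the waiting thread is interrupted.
     */
    T getMember(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return null;
        }
    }

    /** Returns a member to the pool; call this in a finally block. */
    void release(T member) {
        idle.add(member);
    }
}
```

A request handler would call getMember(), run the document through that member's pipeline, and release the member in a finally block; a null result can be turned into a "server busy" response.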

I don't think there would be a problem with having pool members with differently configured pipelines, but I never tried that.  I suppose one of the annotators in the community might not work if it has any statics, but I'm guessing those would primarily be configuration items set in the annotator's initialization phase.

Long story short: with a bit of extra packaging, you can get it to work without tampering with the core code.

Peter

> On Mar 8, 2019, at 08:23, Jeffrey Miller <je...@gmail.com> wrote:
> 
> Is there any known reason that you can't create a pipeline pool, but keep
> everything in the same process? Is it safe to load multiple pipelines in
> the same process as long as only one thread can access each one at a time?
> (We plan to use this in a Spark pipeline.) One caveat I have noticed: it
> seems that if I use the thread-safe components to build a pipeline pool,
> only one dictionary for the DefaultJCasTermAnnotator can be loaded per
> process. For example, I was trying to take advantage of the ability to
> switch pipelines via a query parameter, which is hinted at in the code for
> the REST service. The two pipelines used different ontology dictionaries,
> but it seemed that the thread-safe components must have reduced
> the DefaultJCasTermAnnotator to a singleton object in memory, because it
> only used the first dictionary instantiated. Either way, given how Sean
> described the way the thread-safe components work above, you probably
> wouldn't want to use them in a pipeline pool, assuming that the problems
> with threading were limited to multiple threads accessing the same pipeline
> at the same time, and not to having multiple pipelines loaded into memory,
> each accessed by only a single thread.
> 
> On Fri, Mar 8, 2019 at 11:06 AM Kathy Ferro <he...@gmail.com>
> wrote:
> 
>> I thought about creating a queue that acts as a traffic cop.  Only the
>> traffic cop calls the WS.  I also want to test multiple WS running on
>> different ports.  The traffic cop calls whichever WS is available and keeps
>> track of WS statuses.  With all this processing going on, it might kill the
>> power for blocks.
>> 
>> On Fri, Mar 8, 2019 at 10:34 AM Finan, Sean <
>> Sean.Finan@childrens.harvard.edu> wrote:
>> 
>>> Hi all,
>>> 
>>> I guess that a quick test could be run with a multi-threaded pipeline.
>>> Tim, for some reason I recall you checking in one with a dockerfile.
>>> Maybe not, and it might not be the default in the service.  Anyway, you
>>> could set the procs to something like 50 and throw 50 users at it.  It
>>> definitely does not scale anything close to linearly.  cTAKES AEs aren't
>>> built for thread safety, so they are all wrapped with locks and there is
>>> a lot of thread contention.  However, running such a test might indicate
>>> the source of the problem.
>>> 
>>> The other option is to create a queue that collects post calls and doles
>>> them out serially to a single pipeline.  User #50 would probably not
>>> appreciate it though ...
>>> ________________________________________
>>> From: gandhi rajan <ga...@gmail.com>
>>> Sent: Friday, March 8, 2019 10:02 AM
>>> To: dev@ctakes.apache.org
>>> Subject: Re: ctake web service [EXTERNAL]
>>> 
>>> Hi Kathy,
>>> 
>>> I guess the initialization happens in the post-construct method. So if we
>>> could synchronize that, I feel we can get away from the problem.
>>> Unfortunately I'm not able to test this, as my setup is gone with my old
>>> job. Try it out.
>>> 
>>> Regards,
>>> Gandhi.
>>> 
>>>> On Friday, March 8, 2019, Kathy Ferro <he...@gmail.com> wrote:
>>>> 
>>>> Tim,
>>>> 
>>>> Thanks for the reply.  I'm continuing the research.  With all the layers
>>>> that wrap around this, you would think we can handle this suggestion.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Thu, Mar 7, 2019 at 8:01 PM Miller, Timothy <
>>>> Timothy.Miller@childrens.harvard.edu> wrote:
>>>> 
>>>>> That's a good question that I've also heard from others, and
>>>>> unfortunately I don't know the answer. My use cases are typically a
>>>>> single job at a time making sequential calls, so I wasn't stressing it
>>>>> with multiple asynchronous calls. I would've thought that the Tomcat
>>>>> container would have some ability to manage that though!
>>>>> Tim
>>>>> 
>>>>> ________________________________________
>>>>> From: Kathy Ferro <he...@gmail.com>
>>>>> Sent: Thursday, March 7, 2019 6:10 PM
>>>>> To: dev@ctakes.apache.org
>>>>> Subject: Re: ctake web service [EXTERNAL]
>>>>> 
>>>>> Tim,
>>>>> 
>>>>> Does the docker solution handle multiple instances?  I tested the REST
>>>>> Web Service with 2 requests at the same time, and it errors out.  I
>>>>> removed the part that writes the result xml file to the disk; it still
>>>>> errors out.
>>>>> 
>>>>> Best,
>>>>> Kathy
>>>>> 
>>>>> On Mon, Mar 4, 2019 at 10:52 AM Miller, Timothy <
>>>>> Timothy.Miller@childrens.harvard.edu> wrote:
>>>>> 
>>>>>> I don't know what the solution was, but I leave my ctakes REST
>>>>>> server running basically full time and haven't seen time outs yet.
>>>>>> Tim
>>>>>> 
>>>>>> ________________________________________
>>>>>> From: gandhi rajan <ga...@gmail.com>
>>>>>> Sent: Monday, March 4, 2019 10:43 AM
>>>>>> To: dev@ctakes.apache.org
>>>>>> Subject: Re: ctake web service [EXTERNAL]
>>>>>> 
>>>>>> Hi Kathy, Sean did respond that there is no timeout happening on the
>>>>>> cTAKES end. You will probably have to look at the database settings
>>>>>> for this closed connection issue.
>>>>>> 
>>>>>> Does someone have any clue on this?
>>>>>> 
>>>>>> On Monday, March 4, 2019, Kathy Ferro <he...@gmail.com>
>>>> wrote:
>>>>>> 
>>>>>>> Gandhi,
>>>>>>> 
>>>>>>> Did you get any response to this issue?  Does it try to keep the
>>>>>>> connection open while the WS is up? Or does it open and close after
>>>>>>> it's done?
>>>>>>> 
>>>>>>> We are still getting this error.
>>>>>>> "ERROR JdbcRareWordDictionary - No operations allowed after
>>>>>>> statement closed."
>>>>>>> 
>>>>>>> Thanks
>>>>>>> Kathy
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Aug 17, 2018 at 9:43 AM Gandhi Rajan Natarajan <
>>>>>>> Gandhi.Natarajan@arisglobal.com> wrote:
>>>>>>> 
>>>>>>>> Hi Kathy,
>>>>>>>> 
>>>>>>>> Some time back we encountered this issue, and the problem seems
>>>>>>>> to be DB connections getting timed out.
>>>>>>>> 
>>>>>>>> Currently we are using the following implementations:
>>>>>>>> 
>>>>>>>> "org.apache.ctakes.dictionary.lookup2.dictionary.JdbcRareWordDictionary"
>>>>>>>> and "org.apache.ctakes.dictionary.lookup2.concept.JdbcConceptFactory"
>>>>>>>> 
>>>>>>>> Is anybody aware of any timeout settings that need to be set in
>>>>>>>> these implementations to avoid the DB connection timeout issue?
>>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: Kathy Ferro <he...@gmail.com>
>>>>>>>> Sent: Thursday, August 16, 2018 11:07 PM
>>>>>>>> To: dev@ctakes.apache.org
>>>>>>>> Subject: ctake web service
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> Just want to see if anybody has experienced this issue.
>>>>>>>> 
>>>>>>>> If the web service has been up for a day or two, it will drop the
>>>>>>>> dictionary lookup.  The only results it returns are
>>>>>>>> ConllDependencyNode tags in the xmi file; no mentions, no concepts,
>>>>>>>> etc.
>>>>>>>> 
>>>>>>>> I haven't had a chance to investigate it yet.
>>>>>>>> 
>>>>>>>> Kathy
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Regards,
>>>>>> Gandhi
>>>>>> 
>>>>>> "The best way to find urself is to lose urself in the service of
>>> others
>>>>>> !!!"
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Regards,
>>> Gandhi
>>> 
>>> "The best way to find urself is to lose urself in the service of others
>>> !!!"
>>> 
>>