Posted to user@ctakes.apache.org by "Yadav, Harish" <hy...@live.unc.edu> on 2017/12/14 01:04:22 UTC

Slowness in processing files

Hi All,

When medical records are run with AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor as the AE, processing is very slow. Small files (~2 KB) are processed quickly, but larger files of about 2 MB are very slow and take ~5 hours to process. Any pointers would be a great help.

Regards,
Harish.

Re: Slowness in processing files [EXTERNAL]

Posted by James Masanz <ma...@gmail.com>.
I created a 2MB file by concatenating together many copies of (the text
version of) Peds_FebrileSez_1 and it still isn't finished after many hours.
So that's going to require some debugging.  Until someone gets to debugging
that:

As Jonas S pointed out, 11 seconds for 2K does mean ~ 3hrs for 2MB, if
linear, and I don't expect all components to be as nice as linear, though I
don't have numbers offhand.
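
The linear estimate can be checked with a few lines of arithmetic. This is
only a sketch: the 11-second and 2 KB figures are the ones quoted in this
thread, the function name is made up for illustration, and real runtimes may
well grow faster than linearly.

```python
def linear_estimate_seconds(small_bytes, small_seconds, large_bytes):
    """Extrapolate runtime assuming it grows linearly with input size."""
    return small_seconds * (large_bytes / small_bytes)

# Figures quoted in the thread: ~11 s for a ~2 KB file, target size ~2 MB.
estimate = linear_estimate_seconds(2_000, 11.0, 2_000_000)
print(f"{estimate:.0f} s = {estimate / 3600:.2f} h")
```

Any component that is worse than linear pushes the real number above this
estimate, so it is best read as a lower bound.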

A few ideas:
  - Are there parts of the files that can be ignored? 2MB seems large.
    Using a sectionizer as the first part of the pipeline and having later
    components skip processing some sections should help, if you don't need
    the entire document annotated.

  - Are there some components you could do without?

  - You could try replacing some of the annotators with others that run
    faster. For example, there are rule-based versions of the polarity,
    subject, certainty, and history-of components (see NE Contexts):

https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+NE+Contexts

Also, there is initialization time for each component, so if you process 10
documents, it doesn't take 10 times as long as a single document.  To get a
sense of how things will scale for you, you need to run multiple documents.
For example, when I run 1 document it takes 11 seconds, but 10 copies take
75 seconds, not 110.
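
One rough way to separate the one-time startup cost from the per-document
cost is to fit a straight line through two timings. The sketch below uses the
two measurements quoted here; the model (total = startup + per_doc * n) and
the function name are assumptions for illustration, and effects such as JVM
warm-up or caching are lumped into the "startup" term.

```python
def fit_startup_and_per_doc(n1, t1, n2, t2):
    """Solve t = startup + per_doc * n from two (documents, seconds) points."""
    per_doc = (t2 - t1) / (n2 - n1)
    startup = t1 - per_doc * n1
    return startup, per_doc

# Timings quoted above: 1 document in 11 s, 10 copies in 75 s.
startup, per_doc = fit_startup_and_per_doc(1, 11.0, 10, 75.0)
print(f"startup ~{startup:.1f} s, per document ~{per_doc:.1f} s")

# Projected time for 100 such documents under the same (assumed) model.
print(f"100 documents ~{startup + 100 * per_doc:.0f} s")
```

With more than two measurements, a least-squares fit over several batch
sizes would give a steadier estimate than two points.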

-- James






On Fri, Dec 15, 2017 at 3:16 PM, James Masanz <ma...@gmail.com>
wrote:

>
> I tried an input file of 5.5K and I was surprised to find it took 11
> seconds on my laptop.
>
> I'll run a 2MB input file and post results tomorrow. I'll also compare
> running from binary vs. running from within an IDE in case the timings are
> affected by the size of the jars built for the binary install.
>
> With the 5.5K input file, the annotators taking the most time were
>   ConstituencyParser - 39%
>   HistoryCleartk - 11%
>   PolarityCleartk - 11%
>   LVG annotator - 8%
>   GenericCleartk - 7.5%
>
> Note the above numbers are from a single run of a single file.
>
> If you're not using the output of any of the annotators that are among the
> longer-running ones in your environment (or  any downstream annotators that
> depend upon their output), you could consider removing some of them from
> your pipeline.
>
> For those not familiar with the CPE GUI, after it processes a set of
> documents, it outputs a performance report showing the percentage and
> absolute time taken by each annotator in a pipeline.
>
>
>
>
> On Thu, Dec 14, 2017 at 2:15 PM, Yadav, Harish <hy...@live.unc.edu>
> wrote:
>
>> Hi James,
>>
>>
>>
>> Below is the CAS consumer detail:
>>
>>
>>
>> FileWriterCasConsumer
>>
>>
>>
>> Descriptor in collection reader:
>>
>>
>>
>> FilesInDirectoryCollectionReader.xml
>>
>>
>>
>> The contents of AggregatePlaintextFastUMLSProcessor are not changed, and
>> I have always started the CPE GUI with the Clear all option. I am not sure
>> about the hard drive error logs, but will check them as one possibility.
>>
>>
>>
>> Could you please let me know approximately how long it took you to run
>> files of ~2 MB (or share any other benchmarks for other file sizes you
>> used earlier)?
>>
>>
>>
>> Regards,
>>
>> Harish.
>>
>>
>>
>> *From:* James Masanz [mailto:masanz.james@gmail.com]
>> *Sent:* Thursday, December 14, 2017 1:21 PM
>> *To:* user@ctakes.apache.org
>> *Subject:* Re: Slowness in processing files [EXTERNAL]
>>
>>
>>
>> sorry, I meant verify that the contents of  the xml file for the fast
>> dictionary lookup haven't changed (AggregatePlaintextFastUMLSProcessor)
>>
>>
>>
>> On Thu, Dec 14, 2017 at 1:20 PM, James Masanz <ma...@gmail.com>
>> wrote:
>>
>>
>>
>> Harish,
>>
>>
>>
>> with the AggregatePlaintextFastUMLSProcessor, it should not be taking
>> that long. It sounds like either something outside of cTAKES is having an
>> issue (a hard drive starting to fail) or that you are accidentally
>> running AggregatePlaintextUMLSProcessor.
>>
>>
>> I've had issues with the CPE GUI not always behaving well for me.
>>
>>
>>
>> I suggest when you run the CPE GUI, you use File->Clear all and
>> re-enter/re-select what you want.
>>
>> If that doesn't help, verify that the contents
>> of AggregatePlaintextUMLSProcessor haven't been changed.
>>
>>
>> If none of that helps, as a last resort, I'd look into hard drive error
>> logs.
>>
>>
>>
>> Also, are you using a CAS Consumer? If so, which one?
>>
>>
>>
>>
>>
>> On Thu, Dec 14, 2017 at 12:04 PM, <Jo...@informatik.hu-berlin.de>
>> wrote:
>>
>> If a 2 KB file takes about 11 seconds, then a 2 MB file is expected to take
>> ~11*1000 seconds, which is about 3 hours (assuming that the runtime is
>> linear in the file size).
>>
>> I do not know if the pipeline can be sped up. I would suggest splitting
>> the file into smaller chunks and running the pipeline in parallel on
>> each chunk.
>>
>> Jonas S
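
The chunking suggestion above could look roughly like the sketch below. It
only splits the text; running cTAKES on each chunk is not shown. The function
name and the 20,000-character default are assumptions made for illustration,
and splitting at blank lines is just one possible boundary. Annotators that
rely on document-level context (sections, history) may behave differently on
chunks, and annotation offsets would need to be mapped back to the original
file.

```python
def split_into_chunks(text, max_chars=20_000):
    """Split text at blank lines into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than cut
    mid-sentence, so a chunk can exceed the limit in that case.
    """
    chunks, current, size = [], [], 0
    for paragraph in text.split("\n\n"):
        # Account for the separator that joins this paragraph to the chunk.
        extra = len(paragraph) + (2 if current else 0)
        if current and size + extra > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
            extra = len(paragraph)
        current.append(paragraph)
        size += extra
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Synthetic stand-in for a long note; real input would be read from a file.
sample = "\n\n".join(f"Paragraph {i} of a long note." for i in range(1000))
parts = split_into_chunks(sample, max_chars=5_000)
print(len(parts), max(len(p) for p in parts))
```

Joining the chunks with a blank line reproduces the original text, so no
content is lost by the split.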
>>
>> On 14.12.17 at 17:48, Yadav, Harish wrote:
>>
>> Hi Timothy,
>>
>> Sorry for the typo, I meant that I ran with the AE
>> AggregatePlaintextFastUMLSProcessor with -Xms6g -Xmx6g, but it still takes
>> a long time (more than 2 hours) for a single 2 MB file. It runs fine for
>> a 2 KB file.
>>
>> Regards,
>> Harish.
>>
>> -----Original Message-----
>> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
>> Sent: Thursday, December 14, 2017 11:22 AM
>> To: user@ctakes.apache.org
>> Subject: Re: Slowness in processing files [EXTERNAL]
>>
>> You missed the most important part of my message:
>>
>> Do not try to use AggregatePlainTextProcessor, it is just slow.
>>
>>
>> Use AggregatePlaintextFastUMLSProcessor
>>
>> Tim
>>
>>
>> On Thu, 2017-12-14 at 16:15 +0000, Yadav, Harish wrote:
>>
>> Hi Timothy,
>>
>> I fixed the password issues and ran with the AE
>> AggregatePlainTextProcessor with -Xms6g -Xmx6g, but it still takes a
>> long time (more than 2 hours) for a single 2 MB file. I have checked
>> the memory consumption of the process and it never goes above 4.5 GB,
>> so I am not sure it is a memory issue. The AE
>> AggregatePlainTextProcessor processes a 2 KB file in ~11 seconds, but
>> most of our files are megabytes in size, so a processing time of more
>> than 2 hours per file is not feasible.
>>
>> Could you please suggest something that may improve the performance?
>> Below are the logs for processing the 2 MB file with
>> AggregatePlainTextProcessor:
>>
>>
>>
>> Logs:
>>
>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms6g -Xmx6g org.apache.uima.tools.cpm.CpmFrame
>> Dec 14, 2017 9:40:25 AM java.util.prefs.WindowsPreferences <init>
>> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs
>> at root 0x80000002. Windows
>> RegCreateKeyEx(...) returned error code 5.
>> log4j: reset attribute= "false".
>> log4j: Threshold ="null".
>> log4j: Retreiving an instance of org.apache.log4j.Logger.
>> log4j: Setting [ProgressAppender] additivity to [false].
>> log4j: Level value for ProgressAppender is  [INFO].
>> log4j: ProgressAppender level set to INFO
>> log4j: Class name: [org.apache.log4j.ConsoleAppender]
>> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
>> log4j: Setting property [conversionPattern] to [%m].
>> log4j: Adding appender named [noEolAppender] to category
>> [ProgressAppender].
>> log4j: Retreiving an instance of org.apache.log4j.Logger.
>> log4j: Setting [ProgressDone] additivity to [false].
>> log4j: Level value for ProgressDone is  [INFO].
>> log4j: ProgressDone level set to INFO
>> log4j: Class name: [org.apache.log4j.ConsoleAppender]
>> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
>> log4j: Setting property [conversionPattern] to [%m%n].
>> log4j: Adding appender named [eolAppender] to category [ProgressDone].
>> log4j: Level value for root is  [INFO].
>> log4j: root level set to INFO
>> log4j: Class name: [org.apache.log4j.ConsoleAppender]
>> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
>> log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy
>> HH:mm:ss} %5p %c{1} - %m%n].
>> log4j: Adding appender named [consoleAppender] to category [root].
>> 14 Dec 2017 09:42:09  INFO Chunker - Chunker model file:
>> org/apache/ctakes/chunker/models/chunker-model.zip
>> 14 Dec 2017 09:42:10  INFO TokenizerAnnotatorPTB - Initializing
>> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
>> 14 Dec 2017 09:42:10  INFO ContextDependentTokenizerAnnotator - Finite
>> state machines loaded.
>> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
>> dictionary lookup window type:
>> org.apache.ctakes.typesystem.type.textspan.Sentence
>> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Exclusion
>> tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB
>> VBD VBG VBN VBP VBZ WDT WP WPS WRB
>> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using minimum
>> term text span: 3
>> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
>> Dictionary Descriptor:
>> org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>> 14 Dec 2017 09:42:10  INFO DictionaryDescriptorParser - Parsing
>> dictionary specifications:
>> 14 Dec 2017 09:42:10  INFO UmlsUserApprover - Checking UMLS Account at
>> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234:
>> .14 Dec 2017 09:42:11  INFO UmlsUserApprover -   UMLS Account at
>> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234
>> has been validated
>>
>> 14 Dec 2017 09:42:11  INFO JdbcConnectionFactory - Connecting to
>> jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/s
>> no_rx_16ab/sno_rx_16ab:
>> 14 Dec 2017 09:42:11  INFO ENGINE - open start - state not modified
>> ..................
>> 14 Dec 2017 09:42:17  INFO JdbcConnectionFactory -  Database connected
>> 14 Dec 2017 09:42:17  INFO JdbcRareWordDictionary - Connected to cui
>> and term table CUI_TERMS
>> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
>> table TUI with class TUI
>> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
>> table RXNORM with class LONG
>> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
>> table PREFTERM with class PREFTERM
>> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
>> table SNOMEDCT_US with class LONG
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope
>> sizes: 10 , 10
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
>> LEFT,RIGHT
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
>> org.apache.ctakes.necontexts.status.StatusContextAnalyzer
>> 14 Dec 2017 09:42:17  INFO StatusContextAnalyzer - initBoundaryData()
>> called for ContextInitializer
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
>> org.apache.ctakes.necontexts.status.StatusContextHitConsumer
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
>> type: org.apache.ctakes.typesystem.type.textspan.Sentence
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
>> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
>> org.apache.ctakes.typesystem.type.syntax.BaseToken
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope
>> sizes: 7 , 7
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
>> LEFT,RIGHT
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
>> org.apache.ctakes.necontexts.negation.NegationContextAnalyzer
>> 14 Dec 2017 09:42:17  INFO NegationContextAnalyzer -
>> initBoundaryData() called for ContextInitializer
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
>> org.apache.ctakes.necontexts.negation.NegationContextHitConsumer
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
>> type: org.apache.ctakes.typesystem.type.textspan.Sentence
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
>> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
>> org.apache.ctakes.typesystem.type.syntax.BaseToken
>> 14 Dec 2017 09:42:17  INFO SentenceDetector - Sentence detector model
>> file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
>> 14 Dec 2017 09:42:17  INFO POSTagger - POS tagger model file:
>> org/apache/ctakes/postagger/models/mayo-pos.zip
>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - Loading NLM Norm
>> and Lvg with config file = C:\New_Drive\apache-ctakes-4.0.0-
>> bin\apache-ctakes-
>> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl -   config file
>> absolute path = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cwd =
>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>> 4.0.0\resources\org\apache\ctakes\lvg\
>> 14 Dec 2017 09:42:18  INFO ENGINE - open start - state not modified
>> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open start
>> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open end
>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
>> 14 Dec 2017 09:42:18  INFO DrugMentionAnnotator - Finite state
>> machines loaded.
>> 14 Dec 2017 09:42:23  INFO ClearNLPDependencyParserAE - using Morphy
>> analysis? true Loading configuration.
>> Loading feature templates.
>> Loading lexica.
>> Loading model:
>> .....................................................................
>> ...................
>> Loading configuration.
>> Loading feature templates.
>> Loading model:
>> .
>> Loading configuration.
>> Loading feature templates.
>> Loading lexica.
>> Loading model:
>> ...
>> <various Loading model>
>> .
>> Loading configuration.
>> Loading feature templates.
>> Loading lexica.
>> Loading model:
>> ................................
>> Loading model:
>> .............................
>> 14 Dec 2017 09:42:32  INFO ConstituencyParser - Initializing parser...
>> 14 Dec 2017 09:42:33  INFO SentenceDetector - Starting processing.
>> 14 Dec 2017 09:42:34  INFO TokenizerAnnotatorPTB - process(JCas) in
>> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
>> 14 Dec 2017 09:42:36  INFO LvgAnnotator - process(JCas)
>> 14 Dec 2017 09:42:55  INFO ContextDependentTokenizerAnnotator -
>> process(JCas)
>> 14 Dec 2017 09:42:58  INFO POSTagger - process(JCas)
>> 14 Dec 2017 09:43:10  INFO Chunker -  process(JCas)
>> 14 Dec 2017 09:43:46  INFO ChunkAdjuster -  process(JCas)
>> 14 Dec 2017 09:43:47  INFO ChunkAdjuster -  process(JCas)
>> 14 Dec 2017 09:43:48  INFO AbstractJCasTermAnnotator - Starting
>> processing
>> 14 Dec 2017 09:43:54  INFO AbstractJCasTermAnnotator - Finished
>> processing
>> 14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)
>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:34  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:38  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:39  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:42  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:43  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:45  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:53  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:54  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:59  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:00  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:05  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:06  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:08  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:11  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:16  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:24  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:27  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:30  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:32  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:35  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:38  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:45  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:53  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:54  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:02  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:22  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:24  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:28  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:29  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:34  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:38  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:46  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:49  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:58  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing:
>> idd_secondTrial.txt
>> 14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing:
>> idd_secondTrial.txt
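
The timestamps in a log like the one above show where the time goes. As a
small sketch, the gap between two log lines can be computed as follows. The
helper name is made up for illustration, and the parsing assumes the
"dd Mon yyyy HH:MM:SS" prefix seen in this log and an English locale for the
month abbreviation.

```python
from datetime import datetime

def log_time(line):
    """Parse the leading 'dd Mon yyyy HH:MM:SS' timestamp of a log line."""
    return datetime.strptime(line[:20], "%d %b %Y %H:%M:%S")

# The two MaxentParserWrapper lines from the log above, unwrapped.
started = ("14 Dec 2017 09:48:45  INFO MaxentParserWrapper - "
           "Started processing: idd_secondTrial.txt")
done = ("14 Dec 2017 10:20:19  INFO MaxentParserWrapper - "
        "Done parsing: idd_secondTrial.txt")

elapsed = log_time(done) - log_time(started)
print(elapsed)  # 0:31:34
```

By this reading, the MaxentParserWrapper step alone accounts for about half
an hour of the run shown in this log.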
>>
>>
>>
>>
>>
>>
>>
>> Regards,
>> Harish.
>>
>>
>> -----Original Message-----
>> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
>> Sent: Thursday, December 14, 2017 9:16 AM
>> To: user@ctakes.apache.org
>> Subject: Re: Slowness in processing files [EXTERNAL]
>>
>> Do not try to use AggregatePlainTextProcessor, it is just slow.
>> Use the fast version and debug the password issues.
>> Make sure you have your UMLS credentials set in:
>> $CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>>
>> in two different places.
>>
>> Tim
>>
>>
>>
>> On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:
>>
>>
>> Hi James,
>>   Thanks for responding.
>>   A single 2 MB file takes ~5 hours to process with
>> AggregatePlainTextProcessor. This is how the process is started,
>> including the JVM memory arguments:
>>   C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
>> -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXXXX" -cp
>> "C:\New_Drive\apache-ctakes-4.0.0-bi
>> apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-
>> bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-
>> 4.0.0-
>> bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.
>> nfiguration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-
>> ctakes-
>> 4.0.0\config\log4j.xml -Xms512M -Xmx3g
>> org.apache.uima.tools.cpm.CpmFrame
>>   Also, just now I tried to process the file with the AE
>> AggregatePlaintextFastUMLSProcessor, but ran into a different problem:
>> an authentication error, even though the same username and password
>> work with AggregatePlainTextProcessor.
>>   I can run it with AggregatePlaintextFastUMLSProcessor and increase
>> -Xms and -Xmx to 5g. Could you please let me know how it is possible
>> that AggregatePlainTextProcessor runs fine with the above username
>> and password, while AggregatePlaintextFastUMLSProcessor gives the
>> exception below with the same username and password?
>>   Exception:
>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXX" -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame
>> Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
>> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
>> log4j: attributes....
>> 13 Dec 2017 21:04:58  INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
>> 13 Dec 2017 21:05:00  INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
>> 13 Dec 2017 21:05:00  INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using dictionary lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using Dictionary Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>> 13 Dec 2017 21:05:00  INFO DictionaryDescriptorParser - Parsing dictionary specifications:
>> 13 Dec 2017 21:05:00  INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234-ß:
>> ....13 Dec 2017 21:05:02 ERROR UmlsUserApprover -   UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for user XXXXXXX-ß with XXXXXXX
>> org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>>         at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
>>         at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
>>         at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
>>         at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
>>         at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
>>         at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
>> Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
>>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
>>         at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
>>         at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
>>         at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
>>         ... 5 more
>> Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
>>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
>>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
>>         at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>>         at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>>         at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
>>         at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
>>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
>>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
>>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
>>         at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>>         at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
>>         at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
>>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
>>         ... 9 more
>> Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>>         at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
>>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
>>         ... 24 more
>> Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
>>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
>>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
>>         at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
>>         ... 25 more
>> Caused by: java.lang.reflect.InvocationTargetException
>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>>         at java.lang.reflect.Constructor.newInstance(Unknown Source)
>>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
>>         ... 28 more
>> Caused by: java.sql.SQLException: Invalid User for UMLS dictionary sno_rx_16abTerms
>>         at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
>>         ... 33 more
>>       From: James Masanz [mailto:masanz.james@gmail.com]
>> Sent: Wednesday, December 13, 2017 8:56 PM
>> To: user@ctakes.apache.org
>> Subject: Re: Slowness in processing files
>>   Using AggregatePlaintextFastUMLSProcessor  is much faster than
>> AggregatePlainTextProcessor, so I suggest that to start with you
>> just use AggregatePlaintextFastUMLSProcessor.
>>   Do you mean it is taking ~5 hours for a single file to be processed
>> at times, or is that for a set of files?
>>   If your JVM heap space is not set large enough, you can get very
>> slow results.
>> Try increasing to 5G (or more) using the JVM parameter -Xmx5G. For
>> faster start up, you can also set -Xms to the same value as -Xmx or
>> something close to it.
>>     -- James
>>   On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <hyadav@live.unc.edu
>>
>>
>>
>> wrote:
>> Hi All,
>>   When the medical records are run with the AE as
>> AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor
>> the processing is very slow. It is pretty fast when the smaller
>> files
>> (~2 kb) are fed as input but when I am processing with bigger files
>> say, 2Mb, it is very slow and the files are taking ~5 hours to
>> process. Any pointer will be of great help.
>>   Regards,
>> Harish.
>>
>>
>>
>>
>>
>>
>>
>>
>
>

Re: Slowness in processing files [EXTERNAL]

Posted by James Masanz <ma...@gmail.com>.
I tried an input file of 5.5K and I was surprised to find it took 11
seconds on my laptop.

I'll run a 2MB input file and post results tomorrow. I'll also compare
running from binary vs. running from within an IDE in case the timings are
affected by the size of the jars built for the binary install.

With the 5.5K input file, the annotators taking the most time were
  ConstituencyParser - 39%
  HistoryCleartk - 11%
  PolarityCleartk - 11%
  LVG annotator - 8%
  GenericCleartk - 7.5%

Note the above numbers are from a single run of a single file.

If you aren't using the output of some of the longer-running annotators in
your environment (or of any downstream annotators that depend on their
output), you could consider removing them from your pipeline.

For those not familiar with the CPE GUI: after it processes a set of
documents, it outputs a performance report showing the percentage and
absolute time taken by each annotator in a pipeline.
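
As a rough illustration of that trade-off, here is a minimal Java sketch that estimates what fraction of the run time would remain if some annotators were dropped. The class and method names are made up for this example, the shares are the single-run percentages listed above, and it assumes the shares stay the same on other inputs and that nothing downstream needs the removed annotators, which may not hold.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of cTAKES or UIMA.
final class AnnotatorShareEstimate {

    /** Shares (in percent) from the single 5.5K run reported in this message. */
    static Map<String, Double> reportedShares() {
        Map<String, Double> shares = new LinkedHashMap<>();
        shares.put("ConstituencyParser", 39.0);
        shares.put("HistoryCleartk", 11.0);
        shares.put("PolarityCleartk", 11.0);
        shares.put("LVG annotator", 8.0);
        shares.put("GenericCleartk", 7.5);
        return shares;
    }

    /** Fraction (0..1) of the total time left after removing the named annotators. */
    static double remainingFraction(Map<String, Double> sharePercent, Set<String> removed) {
        double dropped = 0.0;
        for (String name : removed) {
            Double share = sharePercent.get(name);
            if (share == null) {
                throw new IllegalArgumentException("Unknown annotator: " + name);
            }
            dropped += share;
        }
        return 1.0 - dropped / 100.0;
    }

    public static void main(String[] args) {
        double left = remainingFraction(reportedShares(),
                Collections.singleton("ConstituencyParser"));
        System.out.printf("Estimated remaining time: %.0f%% of the original%n", left * 100);
    }
}
```

The numbers only say where the time went in one run; the CPE GUI performance report for your own documents is the better input.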




On Thu, Dec 14, 2017 at 2:15 PM, Yadav, Harish <hy...@live.unc.edu> wrote:

> Hi James,
>
> Below is the CAS consumer detail:
>
> FileWriterCasConsumer
>
> Descriptor in collection reader:
>
> FilesInDirectoryCollectionReader.xml
>
> The contents of AggregatePlaintextFastUMLSProcessor are not changed and I
> have always used CPE GUI by clear all option. I am not sure of hard drive
> error logs, but will check that as one of the possibilities.
>
> Could you please let me know approximately how much time it took for you
> to run files of sizes ~2Mb (or if you can share any other benchmarks for
> other file sizes you used earlier)
>
> Regards,
>
> Harish.
>
> From: James Masanz [mailto:masanz.james@gmail.com]
> Sent: Thursday, December 14, 2017 1:21 PM
> To: user@ctakes.apache.org
> Subject: Re: Slowness in processing files [EXTERNAL]
>
> sorry, I meant verify that the contents of the xml file for the fast
> dictionary lookup haven't changed (AggregatePlaintextFastUMLSProcessor)
>
> On Thu, Dec 14, 2017 at 1:20 PM, James Masanz <ma...@gmail.com>
> wrote:
>
> Harish,
>
> with the AggregatePlaintextFastUMLSProcessor, it should not be taking
> that long. It sounds like either something outside of cTAKES is having an
> issue (a hard drive starting to fail) or that you are accidentally running
> AggregatePlaintextUMLSProcessor.
>
> I've had issues with the CPE GUI not always behaving well for me.
>
> I suggest when you run the CPE GUI, you use File->Clear all and
> re-enter/re-select what you want.
>
> If that doesn't help, verify that the contents of
> AggregatePlaintextUMLSProcessor haven't been changed.
>
> If none of that helps, as a last resort, I'd look into hard drive error
> logs.
>
> Also, are you using a Cas Consumer? if so, which one.
>
> On Thu, Dec 14, 2017 at 12:04 PM, <Jo...@informatik.hu-berlin.de> wrote:
>
> If a 2 KB file takes about 11 seconds, then a 2 MB file is expected to take
> ~11*1000 seconds, which is about 3 hours (assuming the runtime is linear
> in the file size).
>
> I do not know if the pipeline can be sped up. I would suggest splitting the
> file into smaller chunks and running the pipeline on the chunks in
> parallel.
>
> Jonas S
>
> Am 14.12.17 um 17:48 schrieb Yadav, Harish:
>
> Hi Timothy,
>
> Sorry for the typo, I meant ran with AE AggregatePlaintextFastUMLSProcessor
> with -Xms6g -Xmx6g, but still it takes a lot of time ( ~more than 2 hours)
> for a single file of 2 Mb size. It runs fine for 2 Kb file.
>
> Regards,
> Harish.
>
> -----Original Message-----
> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
> Sent: Thursday, December 14, 2017 11:22 AM
> To: user@ctakes.apache.org
> Subject: Re: Slowness in processing files [EXTERNAL]
>
> You missed the most important part of my message:
>
> Do not try to use AggregatePlainTextProcessor, it is just slow.
>
>
> Use AggregatePlaintextFastUMLSProcessor
>
> Tim
>
>
> On Thu, 2017-12-14 at 16:15 +0000, Yadav, Harish wrote:
>
> Hi Timothy,
>
> I fixed the password issues and ran with AE
> AggregatePlainTextProcessor with -Xms6g -Xmx6g, but still it takes a
> lot of time ( ~more than 2 hours) for a single file of 2 Mb size. I
> have checked the memory consumption of the process and it never goes
> above 4.5 G, so I am not sure if it is the memory issue. However, AE
> AggregatePlainTextProcessor process the 2KB file in ~11 seconds, but
> most of our files are in Mbs so processing time for each file for more
> than 2 hours is not feasible.
>
> Could you please suggest something which may improve the performance.
> Below are the logs for the process of 2 Mb file with
> AggregatePlainTextProcessor:
>
>
>
> Logs:
>
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -cp
> "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-
> 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms6g -Xmx6g
> org.apache.uima.tools.cpm.CpmFrame
> Dec 14, 2017 9:40:25 AM java.util.prefs.WindowsPreferences <init>
> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs
> at root 0x80000002. Windows
> RegCreateKeyEx(...) returned error code 5.
> log4j: reset attribute= "false".
> log4j: Threshold ="null".
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressAppender] additivity to [false].
> log4j: Level value for ProgressAppender is  [INFO].
> log4j: ProgressAppender level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m].
> log4j: Adding appender named [noEolAppender] to category
> [ProgressAppender].
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressDone] additivity to [false].
> log4j: Level value for ProgressDone is  [INFO].
> log4j: ProgressDone level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m%n].
> log4j: Adding appender named [eolAppender] to category [ProgressDone].
> log4j: Level value for root is  [INFO].
> log4j: root level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy
> HH:mm:ss} %5p %c{1} - %m%n].
> log4j: Adding appender named [consoleAppender] to category [root].
> 14 Dec 2017 09:42:09  INFO Chunker - Chunker model file:
> org/apache/ctakes/chunker/models/chunker-model.zip
> 14 Dec 2017 09:42:10  INFO TokenizerAnnotatorPTB - Initializing
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 14 Dec 2017 09:42:10  INFO ContextDependentTokenizerAnnotator - Finite
> state machines loaded.
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
> dictionary lookup window type:
> org.apache.ctakes.typesystem.type.textspan.Sentence
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Exclusion
> tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB
> VBD VBG VBN VBP VBZ WDT WP WPS WRB
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using minimum
> term text span: 3
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
> Dictionary Descriptor:
> org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> 14 Dec 2017 09:42:10  INFO DictionaryDescriptorParser - Parsing
> dictionary specifications:
> 14 Dec 2017 09:42:10  INFO UmlsUserApprover - Checking UMLS Account at
> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user
> harish1234:
> .14 Dec 2017 09:42:11  INFO UmlsUserApprover -   UMLS Account at
> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user
> harish1234 has been validated
>
> 14 Dec 2017 09:42:11  INFO JdbcConnectionFactory - Connecting to
> jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/s
> no_rx_16ab/sno_rx_16ab:
> 14 Dec 2017 09:42:11  INFO ENGINE - open start - state not modified
> ..................
> 14 Dec 2017 09:42:17  INFO JdbcConnectionFactory -  Database connected
> 14 Dec 2017 09:42:17  INFO JdbcRareWordDictionary - Connected to cui
> and term table CUI_TERMS
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
> table TUI with class TUI
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
> table RXNORM with class LONG
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
> table PREFTERM with class PREFTERM
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
> table SNOMEDCT_US with class LONG
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope
> sizes: 10 , 10
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
> LEFT,RIGHT
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
> org.apache.ctakes.necontexts.status.StatusContextAnalyzer
> 14 Dec 2017 09:42:17  INFO StatusContextAnalyzer - initBoundaryData()
> called for ContextInitializer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
> org.apache.ctakes.necontexts.status.StatusContextHitConsumer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
> type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
> org.apache.ctakes.typesystem.type.syntax.BaseToken
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope
> sizes: 7 , 7
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
> LEFT,RIGHT
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
> org.apache.ctakes.necontexts.negation.NegationContextAnalyzer
> 14 Dec 2017 09:42:17  INFO NegationContextAnalyzer -
> initBoundaryData() called for ContextInitializer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
> org.apache.ctakes.necontexts.negation.NegationContextHitConsumer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
> type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
> org.apache.ctakes.typesystem.type.syntax.BaseToken
> 14 Dec 2017 09:42:17  INFO SentenceDetector - Sentence detector model
> file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
> 14 Dec 2017 09:42:17  INFO POSTagger - POS tagger model file:
> org/apache/ctakes/postagger/models/mayo-pos.zip
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - Loading NLM Norm
> and Lvg with config file = C:\New_Drive\apache-ctakes-4.0.0-
> bin\apache-ctakes-
> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl -   config file
> absolute path = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cwd =
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\resources\org\apache\ctakes\lvg\
> 14 Dec 2017 09:42:18  INFO ENGINE - open start - state not modified
> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open start
> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open end
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
> 14 Dec 2017 09:42:18  INFO DrugMentionAnnotator - Finite state
> machines loaded.
> 14 Dec 2017 09:42:23  INFO ClearNLPDependencyParserAE - using Morphy
> analysis? true Loading configuration.
> Loading feature templates.
> Loading lexica.
> Loading model:
> .....................................................................
> ...................
> Loading configuration.
> Loading feature templates.
> Loading model:
> .
> Loading configuration.
> Loading feature templates.
> Loading lexica.
> Loading model:
> ...
> <various Loading model>
> .
> Loading configuration.
> Loading feature templates.
> Loading lexica.
> Loading model:
> ................................
> Loading model:
> .............................
> 14 Dec 2017 09:42:32  INFO ConstituencyParser - Initializing parser...
> 14 Dec 2017 09:42:33  INFO SentenceDetector - Starting processing.
> 14 Dec 2017 09:42:34  INFO TokenizerAnnotatorPTB - process(JCas) in
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 14 Dec 2017 09:42:36  INFO LvgAnnotator - process(JCas)
> 14 Dec 2017 09:42:55  INFO ContextDependentTokenizerAnnotator -
> process(JCas)
> 14 Dec 2017 09:42:58  INFO POSTagger - process(JCas)
> 14 Dec 2017 09:43:10  INFO Chunker -  process(JCas)
> 14 Dec 2017 09:43:46  INFO ChunkAdjuster -  process(JCas)
> 14 Dec 2017 09:43:47  INFO ChunkAdjuster -  process(JCas)
> 14 Dec 2017 09:43:48  INFO AbstractJCasTermAnnotator - Starting
> processing
> 14 Dec 2017 09:43:54  INFO AbstractJCasTermAnnotator - Finished
> processing
> 14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:34  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:38  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:39  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:42  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:43  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:45  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:53  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:59  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:00  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:05  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:06  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:08  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:11  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:16  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:24  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:27  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:30  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:35  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:38  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:45  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:53  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:02  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:22  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:24  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:28  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:29  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:34  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:38  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:46  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:49  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:58  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing:
> idd_secondTrial.txt
> 14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing:
> idd_secondTrial.txt
>
>
>
>
>
>
>
> Regards,
> Harish.
>
>
> -----Original Message-----
> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
> Sent: Thursday, December 14, 2017 9:16 AM
> To: user@ctakes.apache.org
> Subject: Re: Slowness in processing files [EXTERNAL]
>
> Do not try to use AggregatePlainTextProcessor, it is just slow.
> Use the fast version and debug the password issues.
> Make sure you have your UMLS credentials set in:
> $CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>
> in two different places.
>
> Tim
>
>
>
> On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:
>
>
> Hi James,
>   Thanks for responding.
>   Single file is taking ~5 hours to process with
> AggregatePlainTextProcessor of size 2 Mb. This is how the process
> looks like for JVM arguments regarding memory:
>   C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXXXX" -cp
> "C:\New_Drive\apache-ctakes-4.0.0-bi
> apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-
> bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-
> 4.0.0-
> bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.
> nfiguration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-
> ctakes-
> 4.0.0\config\log4j.xml -Xms512M -Xmx3g
> org.apache.uima.tools.cpm.CpmFrame
>   Also, just now I tried to process the file with AE
>   AggregatePlaintextFastUMLSProcessor but ran into different problem
> of not getting authentication error with same username password
> being used in AggregatePlainTextProcessor.
>   I can run it with AggregatePlaintextFastUMLSProcessor by increasing
> Xms 5g and Xmx5g,  if you could please let me know how can it be
> possible that with one AE AggregatePlainTextProcessor it is running
> fine with above username and password but giving below exception
> with same username, password with
> AggregatePlaintextFastUMLSProcessor.
>   Exception:
>     C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXX" -cp
> "C:\New_Drive\apache-ctakes-4.0.0-bin\ apache-ctakes-
> 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-
> ctakes-
> 4.0.0\lib\*" -Dlog4j.co nfiguration=file:\C:\New_Drive\apache-
> ctakes-
> 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g
> org.apache.uima.tools.cpm.CpmFrame
> Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
> log4j: attributes....
> 13 Dec 2017 21:04:58  INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
> 13 Dec 2017 21:05:00  INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 13 Dec 2017 21:05:00  INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using dictionary lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using Dictionary Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> 13 Dec 2017 21:05:00  INFO DictionaryDescriptorParser - Parsing dictionary specifications:
> 13 Dec 2017 21:05:00  INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234-ß:
> ....13 Dec 2017 21:05:02 ERROR UmlsUserApprover -   UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for user XXXXXXX-ß with XXXXXXX
> org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>         at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
>         at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
>         at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
>         at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
>         at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
>         at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
> Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
>         at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
>         at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
>         at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
>         ... 5 more
> Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
>         at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>         at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>         at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
>         at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
>         at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>         at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
>         at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
>         ... 9 more
> Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>         at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
>         ... 24 more
> Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
>         at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
>         ... 25 more
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>         at java.lang.reflect.Constructor.newInstance(Unknown Source)
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
>         ... 28 more
> Caused by: java.sql.SQLException: Invalid User for UMLS dictionary sno_rx_16abTerms
>         at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
>         ... 33 more
>
> From: James Masanz [mailto:masanz.james@gmail.com]
> Sent: Wednesday, December 13, 2017 8:56 PM
> To: user@ctakes.apache.org
> Subject: Re: Slowness in processing files
>
> Using AggregatePlaintextFastUMLSProcessor is much faster than
> AggregatePlainTextProcessor, so I suggest that to start with you
> just use AggregatePlaintextFastUMLSProcessor.
>
> Do you mean it is taking ~5 hours for a single file to be processed
> at times, or is that for a set of files?
>
> If your JVM heap space is not set large enough, you can get very
> slow results.
> Try increasing to 5G (or more) using the JVM parameter -Xmx5G
> For faster start up, you can also set the -Xms to the same or
> something close to -Xmx value.
>
> -- James
>
> On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <hyadav@live.unc.edu> wrote:
>
> Hi All,
>
> When the medical records are run with the AE as
> AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor
> the processing is very slow. It is pretty fast when the smaller
> files (~2 kb) are fed as input but when I am processing with bigger
> files say, 2Mb, it is very slow and the files are taking ~5 hours to
> process. Any pointer will be of great help.
>
> Regards,
> Harish.

RE: Slowness in processing files [EXTERNAL]

Posted by "Yadav, Harish" <hy...@live.unc.edu>.
Hi James,

Below is the CAS consumer detail:

FileWriterCasConsumer

Descriptor in collection reader:

FilesInDirectoryCollectionReader.xml

The contents of AggregatePlaintextFastUMLSProcessor have not been changed, and I have always used the CPE GUI with the File->Clear all option. I am not sure about hard drive error logs, but I will check them as one of the possibilities.

Could you please let me know approximately how long it took you to run files of ~2 MB (or share any other benchmarks for other file sizes you used earlier)?

Regards,
Harish.

From: James Masanz [mailto:masanz.james@gmail.com]
Sent: Thursday, December 14, 2017 1:21 PM
To: user@ctakes.apache.org
Subject: Re: Slowness in processing files [EXTERNAL]

Sorry, I meant: verify that the contents of the XML file for the fast dictionary lookup haven't changed (AggregatePlaintextFastUMLSProcessor).

On Thu, Dec 14, 2017 at 1:20 PM, James Masanz <ma...@gmail.com> wrote:

Harish,

with the AggregatePlaintextFastUMLSProcessor, it should not be taking that long. It sounds like either something outside of cTAKES is having an issue (a hard drive starting to fail) or that you are accidentally running AggregatePlaintextUMLSProcessor.

I've had issues with the CPE GUI not always behaving well for me.

I suggest when you run the CPE GUI, you use File->Clear all and re-enter/re-select what you want.
If that doesn't help, verify that the contents of AggregatePlaintextUMLSProcessor haven't been changed.

If none of that helps, as a last resort, I'd look into hard drive error logs.

Also, are you using a CAS Consumer? If so, which one?


On Thu, Dec 14, 2017 at 12:04 PM, <Jo...@informatik.hu-berlin.de> wrote:
If a 2 KB file takes about 11 seconds, then a 2 MB file is expected to take ~11*1000 seconds, which is about 3 hours (assuming the runtime is linear in the file size).
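
The arithmetic behind that estimate can be written out as a small Java sketch. The class and method names are hypothetical, the sizes are the round figures used in this thread (2 KB taken as 2,000 bytes and 2 MB as 2,000,000 bytes), and the linear scaling is only an assumption; one-time initialization cost and non-linear components would change the result.

```java
// Hypothetical helper illustrating the linear extrapolation; not part of cTAKES.
final class LinearRuntimeEstimate {

    /** Scales a measured run time by the ratio of target size to sample size. */
    static double estimateSeconds(double sampleSeconds, long sampleBytes, long targetBytes) {
        if (sampleBytes <= 0) {
            throw new IllegalArgumentException("sampleBytes must be positive");
        }
        return sampleSeconds * ((double) targetBytes / sampleBytes);
    }

    public static void main(String[] args) {
        // 11 seconds for ~2 KB, extrapolated to ~2 MB (a factor of 1000).
        double seconds = estimateSeconds(11.0, 2_000L, 2_000_000L);
        System.out.printf("Estimated: %.0f s (about %.1f hours)%n", seconds, seconds / 3600.0);
    }
}
```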

I do not know if the pipeline can be sped up. I would suggest splitting the file into smaller chunks and running the pipeline on the chunks in parallel.

Jonas S
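
One possible way to do the chunking step is sketched below in Java. This is only an illustration under stated assumptions: the class name, the output file naming, and the choice to split at line boundaries are all made up for the example, and nothing here comes from cTAKES. Splitting at line boundaries may still cut a section in two, and annotation offsets in the output will be relative to each chunk rather than to the original file. Running the chunks in parallel is not shown.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-processing helper; not part of cTAKES.
final class FileChunker {

    /**
     * Splits text into chunks of at most maxChars characters, breaking only
     * after a newline. A single line longer than maxChars is kept whole and
     * becomes its own oversized chunk.
     */
    static List<String> chunkByLines(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int start = 0;
        while (start < text.length()) {
            int newline = text.indexOf('\n', start);
            int end = (newline < 0) ? text.length() : newline + 1; // keep the newline
            String line = text.substring(start, end);
            if (current.length() > 0 && current.length() + line.length() > maxChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            current.append(line);
            start = end;
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: FileChunker <input.txt> <outputDir> <maxChars>");
            return;
        }
        Path input = Paths.get(args[0]);
        Path outDir = Files.createDirectories(Paths.get(args[1]));
        int maxChars = Integer.parseInt(args[2]);

        String text = new String(Files.readAllBytes(input), StandardCharsets.UTF_8);
        List<String> chunks = chunkByLines(text, maxChars);
        for (int i = 0; i < chunks.size(); i++) {
            // Assumed naming scheme: <original name>.part0000.txt, .part0001.txt, ...
            Path out = outDir.resolve(String.format("%s.part%04d.txt", input.getFileName(), i));
            Files.write(out, chunks.get(i).getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Wrote " + chunks.size() + " chunk(s) to " + outDir);
    }
}
```

The chunk directory could then be given to the collection reader as the input directory, so each chunk is processed as a separate document.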

Am 14.12.17 um 17:48 schrieb Yadav, Harish:
Hi Timothy,

Sorry for the typo, I meant that I ran with AE AggregatePlaintextFastUMLSProcessor with -Xms6g -Xmx6g, but it still takes a long time (more than ~2 hours) for a single 2 MB file. It runs fine for a 2 KB file.

Regards,
Harish.

-----Original Message-----
From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
Sent: Thursday, December 14, 2017 11:22 AM
To: user@ctakes.apache.org
Subject: Re: Slowness in processing files [EXTERNAL]

You missed the most important part of my message:
Do not try to use AggregatePlainTextProcessor, it is just slow.

Use AggregatePlaintextFastUMLSProcessor

Tim


On Thu, 2017-12-14 at 16:15 +0000, Yadav, Harish wrote:
Hi Timothy,

I fixed the password issues and ran with AE
AggregatePlainTextProcessor with -Xms6g -Xmx6g, but it still takes a
long time (more than ~2 hours) for a single 2 MB file. I have checked
the memory consumption of the process and it never goes above 4.5 GB,
so I am not sure it is a memory issue. AE AggregatePlainTextProcessor
processes a 2 KB file in ~11 seconds, but most of our files are in the
MB range, so a processing time of more than 2 hours per file is not
feasible.

Could you please suggest something that may improve the performance?
Below are the logs for the processing of a 2 MB file with
AggregatePlainTextProcessor:



Logs:

C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms6g -Xmx6g org.apache.uima.tools.cpm.CpmFrame
Dec 14, 2017 9:40:25 AM java.util.prefs.WindowsPreferences <init>
WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
log4j: reset attribute= "false".
log4j: Threshold ="null".
log4j: Retreiving an instance of org.apache.log4j.Logger.
log4j: Setting [ProgressAppender] additivity to [false].
log4j: Level value for ProgressAppender is  [INFO].
log4j: ProgressAppender level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%m].
log4j: Adding appender named [noEolAppender] to category
[ProgressAppender].
log4j: Retreiving an instance of org.apache.log4j.Logger.
log4j: Setting [ProgressDone] additivity to [false].
log4j: Level value for ProgressDone is  [INFO].
log4j: ProgressDone level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%m%n].
log4j: Adding appender named [eolAppender] to category [ProgressDone].
log4j: Level value for root is  [INFO].
log4j: root level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy
HH:mm:ss} %5p %c{1} - %m%n].
log4j: Adding appender named [consoleAppender] to category [root].
14 Dec 2017 09:42:09  INFO Chunker - Chunker model file:
org/apache/ctakes/chunker/models/chunker-model.zip
14 Dec 2017 09:42:10  INFO TokenizerAnnotatorPTB - Initializing
org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
14 Dec 2017 09:42:10  INFO ContextDependentTokenizerAnnotator - Finite
state machines loaded.
14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
dictionary lookup window type:
org.apache.ctakes.typesystem.type.textspan.Sentence
14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Exclusion
tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB
VBD VBG VBN VBP VBZ WDT WP WPS WRB
14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using minimum
term text span: 3
14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
Dictionary Descriptor:
org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
14 Dec 2017 09:42:10  INFO DictionaryDescriptorParser - Parsing
dictionary specifications:
14 Dec 2017 09:42:10  INFO UmlsUserApprover - Checking UMLS Account at
https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234:
.14 Dec 2017 09:42:11  INFO UmlsUserApprover -   UMLS Account at
https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234
has been validated

14 Dec 2017 09:42:11  INFO JdbcConnectionFactory - Connecting to
jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/s
no_rx_16ab/sno_rx_16ab:
14 Dec 2017 09:42:11  INFO ENGINE - open start - state not modified
..................
14 Dec 2017 09:42:17  INFO JdbcConnectionFactory -  Database connected
14 Dec 2017 09:42:17  INFO JdbcRareWordDictionary - Connected to cui
and term table CUI_TERMS
14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
table TUI with class TUI
14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
table RXNORM with class LONG
14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
table PREFTERM with class PREFTERM
14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
table SNOMEDCT_US with class LONG
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope
sizes: 10 , 10
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
LEFT,RIGHT
14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
org.apache.ctakes.necontexts.status.StatusContextAnalyzer
14 Dec 2017 09:42:17  INFO StatusContextAnalyzer - initBoundaryData()
called for ContextInitializer
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
org.apache.ctakes.necontexts.status.StatusContextHitConsumer
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
type: org.apache.ctakes.typesystem.type.textspan.Sentence
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
org.apache.ctakes.typesystem.type.syntax.BaseToken
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope
sizes: 7 , 7
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
LEFT,RIGHT
14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
org.apache.ctakes.necontexts.negation.NegationContextAnalyzer
14 Dec 2017 09:42:17  INFO NegationContextAnalyzer -
initBoundaryData() called for ContextInitializer
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
org.apache.ctakes.necontexts.negation.NegationContextHitConsumer
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
type: org.apache.ctakes.typesystem.type.textspan.Sentence
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
org.apache.ctakes.typesystem.type.syntax.BaseToken
14 Dec 2017 09:42:17  INFO SentenceDetector - Sentence detector model
file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
14 Dec 2017 09:42:17  INFO POSTagger - POS tagger model file:
org/apache/ctakes/postagger/models/mayo-pos.zip
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - Loading NLM Norm
and Lvg with config file = C:\New_Drive\apache-ctakes-4.0.0-
bin\apache-ctakes-
4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl -   config file
absolute path = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cwd =
C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
4.0.0\resources\org\apache\ctakes\lvg\
14 Dec 2017 09:42:18  INFO ENGINE - open start - state not modified
14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open start
14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open end
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
14 Dec 2017 09:42:18  INFO DrugMentionAnnotator - Finite state
machines loaded.
14 Dec 2017 09:42:23  INFO ClearNLPDependencyParserAE - using Morphy
analysis? true Loading configuration.
Loading feature templates.
Loading lexica.
Loading model:
.....................................................................
...................
Loading configuration.
Loading feature templates.
Loading model:
.
Loading configuration.
Loading feature templates.
Loading lexica.
Loading model:
...
<various Loading model>
.
Loading configuration.
Loading feature templates.
Loading lexica.
Loading model:
................................
Loading model:
.............................
14 Dec 2017 09:42:32  INFO ConstituencyParser - Initializing parser...
14 Dec 2017 09:42:33  INFO SentenceDetector - Starting processing.
14 Dec 2017 09:42:34  INFO TokenizerAnnotatorPTB - process(JCas) in
org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
14 Dec 2017 09:42:36  INFO LvgAnnotator - process(JCas)
14 Dec 2017 09:42:55  INFO ContextDependentTokenizerAnnotator -
process(JCas)
14 Dec 2017 09:42:58  INFO POSTagger - process(JCas)
14 Dec 2017 09:43:10  INFO Chunker -  process(JCas)
14 Dec 2017 09:43:46  INFO ChunkAdjuster -  process(JCas)
14 Dec 2017 09:43:47  INFO ChunkAdjuster -  process(JCas)
14 Dec 2017 09:43:48  INFO AbstractJCasTermAnnotator - Starting
processing
14 Dec 2017 09:43:54  INFO AbstractJCasTermAnnotator - Finished
processing
14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)
14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:34  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:38  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:39  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:42  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:43  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:45  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:53  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:54  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:59  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:00  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:05  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:06  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:08  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:11  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:16  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:24  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:27  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:30  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:35  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:38  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:45  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:53  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:54  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:02  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:22  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:24  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:28  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:29  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:34  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:38  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:46  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:49  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:58  INFO DrugMentionAnnotator -
14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing:
idd_secondTrial.txt
14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing:
idd_secondTrial.txt
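
To see where the time goes in a log like the one above, the gaps between consecutive timestamps can be summed per logger name. The following Python sketch assumes the "dd MMM yyyy HH:mm:ss LEVEL Logger - message" layout shown in this log and English month abbreviations. The function name time_per_component is hypothetical. Attributing each gap to the component that logged the earlier line is only an approximation, since components that log nothing are invisible to it.

```python
import re
from datetime import datetime

# Timestamp, level, and logger name at the start of a log line
# (an optional leading progress dot is tolerated).
LINE = re.compile(r"^\.?(\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2})\s+(\w+)\s+(\S+) -")


def time_per_component(log_lines):
    """Sum the seconds between each log line and the next, keyed by logger."""
    events = []
    for line in log_lines:
        match = LINE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%d %b %Y %H:%M:%S")
            events.append((stamp, match.group(3)))
    totals = {}
    # The final event has no successor, so it contributes no time.
    for (start, name), (end, _) in zip(events, events[1:]):
        totals[name] = totals.get(name, 0.0) + (end - start).total_seconds()
    return totals


sample = [
    "14 Dec 2017 09:47:58  INFO DrugMentionAnnotator -",
    "14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing:",
    "idd_secondTrial.txt",
    "14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing:",
]
totals = time_per_component(sample)
for name, seconds in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{name}: {seconds:.0f} s")
```

On the four sample lines taken from the log, this attributes 1894 seconds (roughly 31 and a half minutes) to MaxentParserWrapper, which matches the gap between its "Started processing" and "Done parsing" entries.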







Regards,
Harish.


-----Original Message-----
From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
Sent: Thursday, December 14, 2017 9:16 AM
To: user@ctakes.apache.org
Subject: Re: Slowness in processing files [EXTERNAL]

Do not try to use AggregatePlainTextProcessor, it is just slow.
Use the fast version and debug the password issues.
Make sure you have your UMLS credentials set in:
$CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml

in two different places.

Tim



On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:

Hi James,
  Thanks for responding.
  A single 2 MB file is taking ~5 hours to process with
AggregatePlainTextProcessor. This is what the process invocation
looks like, including the JVM memory arguments:
  C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
-Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXXXX" -cp
"C:\New_Drive\apache-ctakes-4.0.0-bi
apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-
bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-
4.0.0-
bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.
nfiguration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-
ctakes-
4.0.0\config\log4j.xml -Xms512M -Xmx3g
org.apache.uima.tools.cpm.CpmFrame
  Also, just now I tried to process the file with the AE
AggregatePlaintextFastUMLSProcessor but ran into a different problem:
an authentication error with the same username and password that work
with AggregatePlainTextProcessor.
  I can run it with AggregatePlaintextFastUMLSProcessor with the heap
increased to -Xms5g and -Xmx5g. Could you please let me know how it is
possible that AggregatePlainTextProcessor runs fine with the above
username and password, while AggregatePlaintextFastUMLSProcessor
throws the exception below with the same credentials?
  Exception:
C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXX" -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame
Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
log4j: attributes....
13 Dec 2017 21:04:58  INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
13 Dec 2017 21:05:00  INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
13 Dec 2017 21:05:00  INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using dictionary lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using Dictionary Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
13 Dec 2017 21:05:00  INFO DictionaryDescriptorParser - Parsing dictionary specifications:
13 Dec 2017 21:05:00  INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234-ß:
....13 Dec 2017 21:05:02 ERROR UmlsUserApprover -   UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for user XXXXXXX-ß with XXXXXXX
org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
        at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
        at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
        at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
        at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
        at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
        at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
        at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
        at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
        at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
        at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
        at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
        ... 5 more
Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
        at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
        at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
        at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
        at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
        at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
        at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
        at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
        at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
        at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
        at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
        ... 9 more
Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
        at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
        ... 24 more
Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
        at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
        ... 25 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
        ... 28 more
Caused by: java.sql.SQLException: Invalid User for UMLS dictionary sno_rx_16abTerms
        at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
        ... 33 more

From: James Masanz [mailto:masanz.james@gmail.com]
Sent: Wednesday, December 13, 2017 8:56 PM
To: user@ctakes.apache.org
Subject: Re: Slowness in processing files
  Using AggregatePlaintextFastUMLSProcessor is much faster than
AggregatePlainTextProcessor, so I suggest that, to start with, you
just use AggregatePlaintextFastUMLSProcessor.
  Do you mean it is taking ~5 hours for a single file to be processed
at times, or is that for a set of files?
  If your JVM heap space is not set large enough, you can get very
slow results.
Try increasing it to 5G (or more) using the JVM parameter -Xmx5G.
For faster startup, you can also set -Xms to the same value as -Xmx,
or something close to it.
    -- James
  On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <hy...@live.unc.edu> wrote:
Hi All,
  When the medical records are run with the AE as
AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor
the processing is very slow. It is pretty fast when the smaller
files
(~2 kb) are fed as input but when I am processing with bigger files
say, 2Mb, it is very slow and the files are taking ~5 hours to
process. Any pointer will be of great help.
  Regards,
Harish.





Re: Slowness in processing files [EXTERNAL]

Posted by James Masanz <ma...@gmail.com>.
Sorry, I meant: verify that the contents of the XML file for the fast
dictionary lookup (used by AggregatePlaintextFastUMLSProcessor) haven't
changed.

On Thu, Dec 14, 2017 at 1:20 PM, James Masanz <ma...@gmail.com>
wrote:

>>>> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
>>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
>>>> org.apache.ctakes.typesystem.type.syntax.BaseToken
>>>> 14 Dec 2017 09:42:17  INFO SentenceDetector - Sentence detector model
>>>> file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
>>>> 14 Dec 2017 09:42:17  INFO POSTagger - POS tagger model file:
>>>> org/apache/ctakes/postagger/models/mayo-pos.zip
>>>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - Loading NLM Norm
>>>> and Lvg with config file = C:\New_Drive\apache-ctakes-4.0.0-
>>>> bin\apache-ctakes-
>>>> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
>>>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl -   config file
>>>> absolute path = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>>> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
>>>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cwd =
>>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
>>>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
>>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>>> 4.0.0\resources\org\apache\ctakes\lvg\
>>>> 14 Dec 2017 09:42:18  INFO ENGINE - open start - state not modified
>>>> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open start
>>>> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open end
>>>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
>>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
>>>> 14 Dec 2017 09:42:18  INFO DrugMentionAnnotator - Finite state
>>>> machines loaded.
>>>> 14 Dec 2017 09:42:23  INFO ClearNLPDependencyParserAE - using Morphy
>>>> analysis? true Loading configuration.
>>>> Loading feature templates.
>>>> Loading lexica.
>>>> Loading model:
>>>> .....................................................................
>>>> ...................
>>>> Loading configuration.
>>>> Loading feature templates.
>>>> Loading model:
>>>> .
>>>> Loading configuration.
>>>> Loading feature templates.
>>>> Loading lexica.
>>>> Loading model:
>>>> ...
>>>> <various Loading model>
>>>> .
>>>> Loading configuration.
>>>> Loading feature templates.
>>>> Loading lexica.
>>>> Loading model:
>>>> ................................
>>>> Loading model:
>>>> .............................
>>>> 14 Dec 2017 09:42:32  INFO ConstituencyParser - Initializing parser...
>>>> 14 Dec 2017 09:42:33  INFO SentenceDetector - Starting processing.
>>>> 14 Dec 2017 09:42:34  INFO TokenizerAnnotatorPTB - process(JCas) in
>>>> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
>>>> 14 Dec 2017 09:42:36  INFO LvgAnnotator - process(JCas)
>>>> 14 Dec 2017 09:42:55  INFO ContextDependentTokenizerAnnotator -
>>>> process(JCas)
>>>> 14 Dec 2017 09:42:58  INFO POSTagger - process(JCas)
>>>> 14 Dec 2017 09:43:10  INFO Chunker -  process(JCas)
>>>> 14 Dec 2017 09:43:46  INFO ChunkAdjuster -  process(JCas)
>>>> 14 Dec 2017 09:43:47  INFO ChunkAdjuster -  process(JCas)
>>>> 14 Dec 2017 09:43:48  INFO AbstractJCasTermAnnotator - Starting
>>>> processing
>>>> 14 Dec 2017 09:43:54  INFO AbstractJCasTermAnnotator - Finished
>>>> processing
>>>> 14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)
>>>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:34  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:38  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:39  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:42  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:43  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:45  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:53  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:54  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:59  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:00  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:05  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:06  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:08  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:11  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:16  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:24  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:27  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:30  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:32  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:35  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:38  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:45  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:53  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:54  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:02  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:22  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:24  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:28  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:29  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:34  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:38  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:46  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:49  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:58  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing:
>>>> idd_secondTrial.txt
>>>> 14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing:
>>>> idd_secondTrial.txt
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Regards,
>>>> Harish.
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
>>>> Sent: Thursday, December 14, 2017 9:16 AM
>>>> To: user@ctakes.apache.org
>>>> Subject: Re: Slowness in processing files [EXTERNAL]
>>>>
>>>> Do not try to use AggregatePlainTextProcessor, it is just slow.
>>>> Use the fast version and debug the password issues.
>>>> Make sure you have your UMLS credentials set in:
>>>> $CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>>>>
>>>> in two different places.
>>>>
>>>> Tim
>>>>
>>>>
>>>>
>>>> On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:
>>>>
>>>>>
>>>>> Hi James,
>>>>>   Thanks for responding.
>>>>>   A single 2 MB file is taking ~5 hours to process with
>>>>> AggregatePlainTextProcessor. This is how the process is launched,
>>>>> including the JVM memory arguments:
>>>>>   C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
>>>>> -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXXXX" -cp
>>>>> "C:\New_Drive\apache-ctakes-4.0.0-bi
>>>>> apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-
>>>>> bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-
>>>>> 4.0.0-
>>>>> bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.
>>>>> nfiguration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-
>>>>> ctakes-
>>>>> 4.0.0\config\log4j.xml -Xms512M -Xmx3g
>>>>> org.apache.uima.tools.cpm.CpmFrame
>>>>>   Also, just now I tried to process the file with AE
>>>>>   AggregatePlaintextFastUMLSProcessor but ran into a different
>>>>> problem: an authentication error, with the same username and
>>>>> password that work with AggregatePlainTextProcessor.
>>>>>   I can rerun AggregatePlaintextFastUMLSProcessor with -Xms5g and
>>>>> -Xmx5g. Could you please let me know how it is possible that
>>>>> AggregatePlainTextProcessor runs fine with the above username and
>>>>> password, while AggregatePlaintextFastUMLSProcessor gives the
>>>>> exception below with the same username and password?
>>>>>   Exception:
>>>>>     C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
>>>>> -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXX" -cp
>>>>> "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>>>> 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>>>> 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>>>> 4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-
>>>>> 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g
>>>>> org.apache.uima.tools.cpm.CpmFrame
>>>>> Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
>>>>> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs
>>>>> at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
>>>>> log4j: attributes....
>>>>> 13 Dec 2017 21:04:58  INFO Chunker - Chunker model file:
>>>>> org/apache/ctakes/chunker/models/chunker-model.zip
>>>>> 13 Dec 2017 21:05:00  INFO TokenizerAnnotatorPTB - Initializing
>>>>> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
>>>>> 13 Dec 2017 21:05:00  INFO ContextDependentTokenizerAnnotator - Finite
>>>>> state machines loaded.
>>>>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using
>>>>> dictionary lookup window type:
>>>>> org.apache.ctakes.typesystem.type.textspan.Sentence
>>>>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Exclusion
>>>>> tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB
>>>>> VBD VBG VBN VBP VBZ WDT WP WPS WRB
>>>>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using minimum
>>>>> term text span: 3
>>>>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using
>>>>> Dictionary Descriptor:
>>>>> org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>>>>> 13 Dec 2017 21:05:00  INFO DictionaryDescriptorParser - Parsing
>>>>> dictionary specifications:
>>>>> 13 Dec 2017 21:05:00  INFO UmlsUserApprover - Checking UMLS Account at
>>>>> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user
>>>>> harish1234-ß:
>>>>> ....13 Dec 2017 21:05:02 ERROR UmlsUserApprover -   UMLS Account at
>>>>> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for
>>>>> user XXXXXXX-ß with XXXXXXX
>>>>> org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>>>>>     at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
>>>>>     at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
>>>>>     at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
>>>>>     at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
>>>>>     at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
>>>>>     at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
>>>>> Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>>>>>     at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
>>>>>     at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
>>>>>     at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
>>>>>     at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
>>>>>     at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
>>>>>     ... 5 more
>>>>> Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
>>>>>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
>>>>>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
>>>>>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>>>>>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>>>>>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>>>>>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
>>>>>     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
>>>>>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
>>>>>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
>>>>>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
>>>>>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>>>>>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>>>>>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>>>>>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
>>>>>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
>>>>>     at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
>>>>>     ... 9 more
>>>>> Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>>>>>     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
>>>>>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
>>>>>     ... 24 more
>>>>> Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>>>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
>>>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
>>>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
>>>>>     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
>>>>>     ... 25 more
>>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>>>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>>>>>     at java.lang.reflect.Constructor.newInstance(Unknown Source)
>>>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
>>>>>     ... 28 more
>>>>> Caused by: java.sql.SQLException: Invalid User for UMLS dictionary sno_rx_16abTerms
>>>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
>>>>>     ... 33 more
>>>>>       From: James Masanz [mailto:masanz.james@gmail.com]
>>>>> Sent: Wednesday, December 13, 2017 8:56 PM
>>>>> To: user@ctakes.apache.org
>>>>> Subject: Re: Slowness in processing files
>>>>>   Using AggregatePlaintextFastUMLSProcessor  is much faster than
>>>>> AggregatePlainTextProcessor, so I suggest that to start with you
>>>>> just use AggregatePlaintextFastUMLSProcessor.
>>>>>   Do you mean it is taking ~5 hours for a single file to be processed
>>>>> at times, or is that for a set of files?
>>>>>   If your JVM heap space is not set large enough, you can get very
>>>>> slow results.
>>>>> Try increasing to 5G (or more) using the JVM parameter -Xmx5G.
>>>>> For faster start up, you can also set -Xms to the same value as
>>>>> -Xmx, or something close to it.
>>>>>     -- James
>>>>>   On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish
>>>>> <hyadav@live.unc.edu> wrote:
>>>>> Hi All,
>>>>>   When the medical records are run with the AE as
>>>>> AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor
>>>>> the processing is very slow. It is pretty fast when the smaller
>>>>> files
>>>>> (~2 kb) are fed as input but when I am processing with bigger files
>>>>> say, 2Mb, it is very slow and the files are taking ~5 hours to
>>>>> process. Any pointer will be of great help.
>>>>>   Regards,
>>>>> Harish.
>>>>>
>>>>>
>>>>
>>>
>
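
A note on reading the log quoted above: the timestamps already show where most of the time goes. The gap between "MaxentParserWrapper - Started processing" (09:48:45) and "Done parsing" (10:20:19) is about 31.5 minutes, against a few minutes for everything logged before it. Below is a minimal sketch for pulling such gaps out of a saved console log. It assumes the log4j layout seen in this thread (dd MMM yyyy HH:mm:ss, then level, then logger name). LogGaps and secondsByComponent are made-up names for illustration, not part of cTAKES. Each gap is charged to the component that logged last, which is only an approximation, because components that log nothing are folded into their predecessor.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of cTAKES: sums the time between log lines.
final class LogGaps {
    // Matches e.g. "14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing:"
    private static final Pattern LINE = Pattern.compile(
            "(\\d{2} \\w{3} \\d{4} \\d{2}:\\d{2}:\\d{2})\\s+\\w+\\s+(\\S+)\\s+-.*");
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd MMM yyyy HH:mm:ss", Locale.ENGLISH);

    /** Sums, per logger name, the seconds elapsed until the next timestamped line. */
    static Map<String, Long> secondsByComponent(List<String> lines) {
        Map<String, Long> totals = new LinkedHashMap<>();
        LocalDateTime previousTime = null;
        String previousComponent = null;
        for (String line : lines) {
            // Drop mail quote markers (">>>> ") and surrounding whitespace.
            String cleaned = line.replaceFirst("^[>\\s]+", "").trim();
            Matcher m = LINE.matcher(cleaned);
            if (!m.matches()) {
                continue; // wrapped continuation lines carry no timestamp
            }
            LocalDateTime time = LocalDateTime.parse(m.group(1), FORMAT);
            if (previousTime != null) {
                long gap = Duration.between(previousTime, time).getSeconds();
                totals.merge(previousComponent, gap, Long::sum);
            }
            previousTime = time;
            previousComponent = m.group(2);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Three lines copied from the log in this thread.
        List<String> sample = Arrays.asList(
                ">>>> 14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)",
                ">>>> 14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing:",
                ">>>> 14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing:");
        secondsByComponent(sample).forEach(
                (component, seconds) -> System.out.println(component + ": " + seconds + " s"));
    }
}
```

On those three lines this reports 291 s for DrugMentionAnnotator and 1894 s for MaxentParserWrapper, which suggests the constituency parser is the first component to look at for large documents.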

Re: Slowness in processing files [EXTERNAL]

Posted by James Masanz <ma...@gmail.com>.
Harish,

With the AggregatePlaintextFastUMLSProcessor, it should not be taking that
long. It sounds like either something outside of cTAKES is having an issue
(a hard drive starting to fail) or that you are accidentally running
AggregatePlaintextUMLSProcessor.

I've had issues with the CPE GUI not always behaving well for me.

I suggest when you run the CPE GUI, you use File->Clear all and
re-enter/re-select what you want.
If that doesn't help, verify that the contents of
AggregatePlaintextUMLSProcessor
haven't been changed.

If none of that helps, as a last resort, I'd look into hard drive error
logs.

Also, are you using a CAS Consumer? If so, which one?
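
On the suggestion quoted below to split large files: if runtime were linear, 2 MB at 11 s per 2 KB is roughly 1000 x 11 s = 11,000 s, about 3 hours. Splitting therefore helps mainly when the pieces run in parallel or when some component scales worse than linearly. Below is a minimal sketch of the splitting step only. It assumes blank lines are safe boundaries in the notes, which may not hold: a split can separate a section heading from its content, and annotation offsets become relative to each chunk. NoteChunker and chunk are hypothetical names, not part of cTAKES.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of cTAKES: splits a large note at blank lines.
final class NoteChunker {
    /**
     * Groups blank-line-separated paragraphs into chunks of at most maxChars
     * characters. A single paragraph longer than maxChars is kept whole and
     * becomes its own oversized chunk.
     */
    static List<String> chunk(String text, int maxChars) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String paragraph : text.split("\\r?\\n\\s*\\n")) {
            // Start a new chunk when adding this paragraph would exceed the limit.
            if (current.length() > 0
                    && current.length() + paragraph.length() + 2 > maxChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append("\n\n"); // restore the paragraph break
            }
            current.append(paragraph);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String note = "HISTORY\nFever for two days.\n\nMEDICATIONS\nAcetaminophen.\n\nPLAN\nFollow up.";
        List<String> chunks = chunk(note, 40);
        System.out.println(chunks.size() + " chunks");
    }
}
```

Each chunk could then be written to its own file in the input directory, so that a single run pays the model-loading cost once and processes many small documents.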


On Thu, Dec 14, 2017 at 12:04 PM, <Jo...@informatik.hu-berlin.de> wrote:

> If a 2 KB file takes about 11 seconds, then a 2 MB file is expected to take
> ~11*1000 seconds, which is about 3 hours (assuming the runtime is linear in
> the file size).
>
> I do not know if the pipeline can be sped up. I would suggest splitting the
> file into smaller chunks and running the pipeline on the chunks in
> parallel.
>
> Jonas S
>
> Am 14.12.17 um 17:48 schrieb Yadav, Harish:
>
>> Hi Timothy,
>>
>> Sorry for the typo, I meant that I ran with AE
>> AggregatePlaintextFastUMLSProcessor with -Xms6g -Xmx6g, but it still takes
>> a long time (more than 2 hours) for a single 2 MB file. It runs fine for a
>> 2 KB file.
>>
>> Regards,
>> Harish.
>>
>> -----Original Message-----
>> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
>> Sent: Thursday, December 14, 2017 11:22 AM
>> To: user@ctakes.apache.org
>> Subject: Re: Slowness in processing files [EXTERNAL]
>>
>> You missed the most important part of my message:
>>
>>> Do not try to use AggregatePlainTextProcessor, it is just slow.
>>>
>>
>> Use AggregatePlaintextFastUMLSProcessor
>>
>> Tim
>>
>>
>> On Thu, 2017-12-14 at 16:15 +0000, Yadav, Harish wrote:
>>
>>> Hi Timothy,
>>>
>>> I fixed the password issues and ran with AE
>>> AggregatePlainTextProcessor with -Xms6g -Xmx6g, but it still takes a
>>> long time (more than 2 hours) for a single 2 MB file. I have checked
>>> the memory consumption of the process and it never goes above 4.5 GB,
>>> so I am not sure that memory is the issue. AE
>>> AggregatePlainTextProcessor processes a 2 KB file in ~11 seconds, but
>>> most of our files are megabytes in size, so more than 2 hours per file
>>> is not feasible.
>>>
>>> Could you please suggest something that may improve the performance?
>>> Below are the logs for a run on the 2 MB file with
>>> AggregatePlainTextProcessor:
>>>
>>>
>>>
>>> Logs:
>>>
>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -cp
>>> "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>> 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>> 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>> 4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-
>>> 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms6g -Xmx6g
>>> org.apache.uima.tools.cpm.CpmFrame
>>> Dec 14, 2017 9:40:25 AM java.util.prefs.WindowsPreferences <init>
>>> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs
>>> at root 0x80000002. Windows
>>> RegCreateKeyEx(...) returned error code 5.
>>> log4j: reset attribute= "false".
>>> log4j: Threshold ="null".
>>> log4j: Retreiving an instance of org.apache.log4j.Logger.
>>> log4j: Setting [ProgressAppender] additivity to [false].
>>> log4j: Level value for ProgressAppender is  [INFO].
>>> log4j: ProgressAppender level set to INFO
>>> log4j: Class name: [org.apache.log4j.ConsoleAppender]
>>> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
>>> log4j: Setting property [conversionPattern] to [%m].
>>> log4j: Adding appender named [noEolAppender] to category
>>> [ProgressAppender].
>>> log4j: Retreiving an instance of org.apache.log4j.Logger.
>>> log4j: Setting [ProgressDone] additivity to [false].
>>> log4j: Level value for ProgressDone is  [INFO].
>>> log4j: ProgressDone level set to INFO
>>> log4j: Class name: [org.apache.log4j.ConsoleAppender]
>>> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
>>> log4j: Setting property [conversionPattern] to [%m%n].
>>> log4j: Adding appender named [eolAppender] to category [ProgressDone].
>>> log4j: Level value for root is  [INFO].
>>> log4j: root level set to INFO
>>> log4j: Class name: [org.apache.log4j.ConsoleAppender]
>>> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
>>> log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy
>>> HH:mm:ss} %5p %c{1} - %m%n].
>>> log4j: Adding appender named [consoleAppender] to category [root].
>>> 14 Dec 2017 09:42:09  INFO Chunker - Chunker model file:
>>> org/apache/ctakes/chunker/models/chunker-model.zip
>>> 14 Dec 2017 09:42:10  INFO TokenizerAnnotatorPTB - Initializing
>>> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
>>> 14 Dec 2017 09:42:10  INFO ContextDependentTokenizerAnnotator - Finite
>>> state machines loaded.
>>> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
>>> dictionary lookup window type:
>>> org.apache.ctakes.typesystem.type.textspan.Sentence
>>> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Exclusion
>>> tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB
>>> VBD VBG VBN VBP VBZ WDT WP WPS WRB
>>> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using minimum
>>> term text span: 3
>>> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
>>> Dictionary Descriptor:
>>> org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>>> 14 Dec 2017 09:42:10  INFO DictionaryDescriptorParser - Parsing
>>> dictionary specifications:
>>> 14 Dec 2017 09:42:10  INFO UmlsUserApprover - Checking UMLS Account at
>>> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user
>>> harish1234:
>>> .14 Dec 2017 09:42:11  INFO UmlsUserApprover -   UMLS Account at
>>> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user
>>> harish1234 has been validated
>>>
>>> 14 Dec 2017 09:42:11  INFO JdbcConnectionFactory - Connecting to
>>> jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/s
>>> no_rx_16ab/sno_rx_16ab:
>>> 14 Dec 2017 09:42:11  INFO ENGINE - open start - state not modified
>>> ..................
>>> 14 Dec 2017 09:42:17  INFO JdbcConnectionFactory -  Database connected
>>> 14 Dec 2017 09:42:17  INFO JdbcRareWordDictionary - Connected to cui
>>> and term table CUI_TERMS
>>> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
>>> table TUI with class TUI
>>> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
>>> table RXNORM with class LONG
>>> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
>>> table PREFTERM with class PREFTERM
>>> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
>>> table SNOMEDCT_US with class LONG
>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope
>>> sizes: 10 , 10
>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
>>> LEFT,RIGHT
>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
>>> org.apache.ctakes.necontexts.status.StatusContextAnalyzer
>>> 14 Dec 2017 09:42:17  INFO StatusContextAnalyzer - initBoundaryData()
>>> called for ContextInitializer
>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
>>> org.apache.ctakes.necontexts.status.StatusContextHitConsumer
>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
>>> type: org.apache.ctakes.typesystem.type.textspan.Sentence
>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
>>> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
>>> org.apache.ctakes.typesystem.type.syntax.BaseToken
>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope
>>> sizes: 7 , 7
>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
>>> LEFT,RIGHT
>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
>>> org.apache.ctakes.necontexts.negation.NegationContextAnalyzer
>>> 14 Dec 2017 09:42:17  INFO NegationContextAnalyzer -
>>> initBoundaryData() called for ContextInitializer
>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
>>> org.apache.ctakes.necontexts.negation.NegationContextHitConsumer
>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
>>> type: org.apache.ctakes.typesystem.type.textspan.Sentence
>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
>>> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
>>> org.apache.ctakes.typesystem.type.syntax.BaseToken
>>> 14 Dec 2017 09:42:17  INFO SentenceDetector - Sentence detector model
>>> file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
>>> 14 Dec 2017 09:42:17  INFO POSTagger - POS tagger model file:
>>> org/apache/ctakes/postagger/models/mayo-pos.zip
>>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - Loading NLM Norm
>>> and Lvg with config file = C:\New_Drive\apache-ctakes-4.0.0-
>>> bin\apache-ctakes-
>>> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
>>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl -   config file
>>> absolute path = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
>>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cwd =
>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
>>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>> 4.0.0\resources\org\apache\ctakes\lvg\
>>> 14 Dec 2017 09:42:18  INFO ENGINE - open start - state not modified
>>> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open start
>>> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open end
>>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
>>> 14 Dec 2017 09:42:18  INFO DrugMentionAnnotator - Finite state
>>> machines loaded.
>>> 14 Dec 2017 09:42:23  INFO ClearNLPDependencyParserAE - using Morphy
>>> analysis? true Loading configuration.
>>> Loading feature templates.
>>> Loading lexica.
>>> Loading model:
>>> .....................................................................
>>> ...................
>>> Loading configuration.
>>> Loading feature templates.
>>> Loading model:
>>> .
>>> Loading configuration.
>>> Loading feature templates.
>>> Loading lexica.
>>> Loading model:
>>> ...
>>> <various Loading model>
>>> .
>>> Loading configuration.
>>> Loading feature templates.
>>> Loading lexica.
>>> Loading model:
>>> ................................
>>> Loading model:
>>> .............................
>>> 14 Dec 2017 09:42:32  INFO ConstituencyParser - Initializing parser...
>>> 14 Dec 2017 09:42:33  INFO SentenceDetector - Starting processing.
>>> 14 Dec 2017 09:42:34  INFO TokenizerAnnotatorPTB - process(JCas) in
>>> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
>>> 14 Dec 2017 09:42:36  INFO LvgAnnotator - process(JCas)
>>> 14 Dec 2017 09:42:55  INFO ContextDependentTokenizerAnnotator -
>>> process(JCas)
>>> 14 Dec 2017 09:42:58  INFO POSTagger - process(JCas)
>>> 14 Dec 2017 09:43:10  INFO Chunker -  process(JCas)
>>> 14 Dec 2017 09:43:46  INFO ChunkAdjuster -  process(JCas)
>>> 14 Dec 2017 09:43:47  INFO ChunkAdjuster -  process(JCas)
>>> 14 Dec 2017 09:43:48  INFO AbstractJCasTermAnnotator - Starting
>>> processing
>>> 14 Dec 2017 09:43:54  INFO AbstractJCasTermAnnotator - Finished
>>> processing
>>> 14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)
>>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:45:34  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:45:38  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:45:39  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:45:42  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:45:43  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:45:45  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:45:53  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:45:54  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:45:59  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:00  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:05  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:06  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:08  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:11  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:16  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:24  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:27  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:30  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:32  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:35  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:38  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:45  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:53  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:46:54  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:47:02  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:47:22  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:47:24  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:47:28  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:47:29  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:47:34  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:47:38  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:47:46  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:47:49  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:47:58  INFO DrugMentionAnnotator -
>>> 14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing:
>>> idd_secondTrial.txt
>>> 14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing:
>>> idd_secondTrial.txt
>>>
>>> Regards,
>>> Harish.
>>>
>>>
>>> -----Original Message-----
>>> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
>>> Sent: Thursday, December 14, 2017 9:16 AM
>>> To: user@ctakes.apache.org
>>> Subject: Re: Slowness in processing files [EXTERNAL]
>>>
>>> Do not try to use AggregatePlainTextProcessor, it is just slow.
>>> Use the fast version and debug the password issues.
>>> Make sure you have your UMLS credentials set in:
>>> $CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>>>
>>> in two different places.
>>>
>>> Tim
>>>
>>>
>>>
>>> On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:
>>>
>>>>
>>>> Hi James,
>>>>   Thanks for responding.
>>>>   A single 2 MB file is taking ~5 hours to process with
>>>> AggregatePlainTextProcessor. This is how the process is launched,
>>>> including the JVM memory arguments:
>>>>   C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
>>>> -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXXXX" -cp
>>>> "C:\New_Drive\apache-ctakes-4.0.0-bi
>>>> apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-
>>>> bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-
>>>> 4.0.0-
>>>> bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.
>>>> nfiguration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-
>>>> ctakes-
>>>> 4.0.0\config\log4j.xml -Xms512M -Xmx3g
>>>> org.apache.uima.tools.cpm.CpmFrame
>>>>   Also, just now I tried to process the file with AE
>>>> AggregatePlaintextFastUMLSProcessor but ran into a different problem:
>>>> an authentication error with the same username and password that
>>>> work with AggregatePlainTextProcessor.
>>>>   I can run it with AggregatePlaintextFastUMLSProcessor by increasing
>>>> Xms to 5g and Xmx to 5g. Could you please let me know how it is
>>>> possible that AggregatePlainTextProcessor runs fine with the above
>>>> username and password, while AggregatePlaintextFastUMLSProcessor
>>>> gives the exception below with the same username and password?
>>>>   Exception:
>>>>     C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
>>>> -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXX" -cp
>>>> "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*"
>>>> -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml
>>>> -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame
>>>> Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
>>>> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
>>>> log4j: attributes....
>>>> 13 Dec 2017 21:04:58  INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
>>>> 13 Dec 2017 21:05:00  INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
>>>> 13 Dec 2017 21:05:00  INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
>>>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using dictionary lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
>>>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
>>>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
>>>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using Dictionary Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>>>> 13 Dec 2017 21:05:00  INFO DictionaryDescriptorParser - Parsing dictionary specifications:
>>>> 13 Dec 2017 21:05:00  INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234-ß:
>>>> ....13 Dec 2017 21:05:02 ERROR UmlsUserApprover -   UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for user XXXXXXX-ß with XXXXXXX
>>>> org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>>>>     at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
>>>>     at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
>>>>     at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
>>>>     at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
>>>>     at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
>>>>     at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
>>>> Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>>>>     at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
>>>>     at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
>>>>     at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
>>>>     at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
>>>>     at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
>>>>     ... 5 more
>>>> Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
>>>>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
>>>>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
>>>>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>>>>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>>>>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>>>>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
>>>>     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
>>>>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
>>>>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
>>>>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
>>>>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>>>>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>>>>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>>>>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
>>>>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
>>>>     at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
>>>>     ... 9 more
>>>> Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>>>>     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
>>>>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
>>>>     ... 24 more
>>>> Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
>>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
>>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
>>>>     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
>>>>     ... 25 more
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>>>>     at java.lang.reflect.Constructor.newInstance(Unknown Source)
>>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
>>>>     ... 28 more
>>>> Caused by: java.sql.SQLException: Invalid User for UMLS dictionary sno_rx_16abTerms
>>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
>>>>     ... 33 more
>>>>       From: James Masanz [mailto:masanz.james@gmail.com]
>>>> Sent: Wednesday, December 13, 2017 8:56 PM
>>>> To: user@ctakes.apache.org
>>>> Subject: Re: Slowness in processing files
>>>>   Using AggregatePlaintextFastUMLSProcessor  is much faster than
>>>> AggregatePlainTextProcessor, so I suggest that to start with you
>>>> just use AggregatePlaintextFastUMLSProcessor.
>>>>   Do you mean it is taking ~5 hours for a single file to be processed
>>>> at times, or is that for a set of files?
>>>>   If your JVM heap space is not set large enough, you can get very
>>>> slow results.
>>>> Try increasing to 5G (or more) using the JVM parameter -Xmx5G. For
>>>> faster startup, you can also set -Xms to the same value as -Xmx, or
>>>> to something close to it.
>>>>     -- James
>>>>   On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <hyadav@live.unc.edu> wrote:
>>>> Hi All,
>>>>   When the medical records are run with the AE as
>>>> AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor
>>>> the processing is very slow. It is pretty fast when the smaller
>>>> files
>>>> (~2 kb) are fed as input but when I am processing with bigger files
>>>> say, 2Mb, it is very slow and the files are taking ~5 hours to
>>>> process. Any pointer will be of great help.
>>>>   Regards,
>>>> Harish.
>>>>
>>>>
>>>
>>

Re: Slowness in processing files [EXTERNAL]

Posted by Jo...@informatik.hu-berlin.de.
If a 2 KB file takes about 11 seconds, then a 2 MB file is expected to 
take ~11*1000 seconds, which is about 3 hours (assuming that 
the runtime is linear in the file size).
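
To make the arithmetic above explicit, here is a minimal sketch of the linear extrapolation. The function name and the byte counts (2,000 and 2,000,000) are assumptions chosen for illustration; the only measured number is the 11 seconds reported in this thread.

```python
def extrapolate_seconds(sample_seconds, sample_bytes, target_bytes):
    """Linear estimate: runtime is assumed to grow in proportion to file size."""
    return sample_seconds * (target_bytes / sample_bytes)


# 11 seconds for a ~2 KB file, scaled up to a ~2 MB file (a factor of 1000).
estimate = extrapolate_seconds(11, 2_000, 2_000_000)
print(f"{estimate:.0f} s, about {estimate / 3600:.1f} h")
```

This is only a lower bound if some components scale worse than linearly, and it ignores the one-time initialization cost of the pipeline.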

I do not know whether the pipeline can be sped up. I would suggest splitting 
the file into smaller chunks and running the pipeline in parallel 
on each chunk.
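
A rough sketch of that chunking idea, under stated assumptions: the text is split only at line ends, and process_chunk is a hypothetical stand-in (it just counts lines) for whatever actually runs the pipeline on one chunk. Nothing here is cTAKES API.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(text, max_chars):
    """Split text into pieces of roughly max_chars, breaking only at line ends.

    A single line longer than max_chars is kept whole as its own chunk.
    """
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


def process_chunk(chunk):
    # Hypothetical placeholder for running the pipeline on one chunk;
    # it only counts lines so that the sketch stays runnable.
    return len(chunk.splitlines())


def process_in_parallel(text, max_chars=2000, workers=4):
    chunks = split_into_chunks(text, max_chars)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the chunks.
        return list(pool.map(process_chunk, chunks))
```

In practice each chunk would probably be written to its own file and handed to a separate pipeline run. Note that annotation offsets would then be relative to the chunk rather than the original file, and a chunk boundary could cut through a section that a component needs to see as a whole.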

Jonas S

On 14.12.17 at 17:48, Yadav, Harish wrote:
> Hi Timothy,
> 
> Sorry for the typo, I meant that I ran with AE AggregatePlaintextFastUMLSProcessor with -Xms6g -Xmx6g, but it still takes more than 2 hours for a single 2 MB file. It runs fine for a 2 KB file.
> 
> Regards,
> Harish.
> 
> -----Original Message-----
> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
> Sent: Thursday, December 14, 2017 11:22 AM
> To: user@ctakes.apache.org
> Subject: Re: Slowness in processing files [EXTERNAL]
> 
> You missed the most important part of my message:
>> Do not try to use AggregatePlainTextProcessor, it is just slow.
> 
> Use AggregatePlaintextFastUMLSProcessor
> 
> Tim
> 
> 
> On Thu, 2017-12-14 at 16:15 +0000, Yadav, Harish wrote:
>> Hi Timothy,
>>
>> I fixed the password issues and ran with AE
>> AggregatePlainTextProcessor with -Xms6g -Xmx6g, but it still takes
>> more than 2 hours for a single 2 MB file. I have checked the memory
>> consumption of the process and it never goes above 4.5 GB, so I am not
>> sure that memory is the issue. AggregatePlainTextProcessor processes
>> a 2 KB file in ~11 seconds, but most of our files are several MB in
>> size, so more than 2 hours of processing time per file is not
>> feasible.
>>
>> Could you please suggest something that may improve the performance?
>> Below are the logs for processing the 2 MB file with
>> AggregatePlainTextProcessor:
>>
>>
>>
>> Logs:
>>
>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -cp
>> "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>> 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>> 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>> 4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-
>> 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms6g -Xmx6g
>> org.apache.uima.tools.cpm.CpmFrame
>> Dec 14, 2017 9:40:25 AM java.util.prefs.WindowsPreferences <init>
>> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs
>> at root 0x80000002. Windows
>> RegCreateKeyEx(...) returned error code 5.
>> log4j: reset attribute= "false".
>> log4j: Threshold ="null".
>> log4j: Retreiving an instance of org.apache.log4j.Logger.
>> log4j: Setting [ProgressAppender] additivity to [false].
>> log4j: Level value for ProgressAppender is  [INFO].
>> log4j: ProgressAppender level set to INFO
>> log4j: Class name: [org.apache.log4j.ConsoleAppender]
>> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
>> log4j: Setting property [conversionPattern] to [%m].
>> log4j: Adding appender named [noEolAppender] to category
>> [ProgressAppender].
>> log4j: Retreiving an instance of org.apache.log4j.Logger.
>> log4j: Setting [ProgressDone] additivity to [false].
>> log4j: Level value for ProgressDone is  [INFO].
>> log4j: ProgressDone level set to INFO
>> log4j: Class name: [org.apache.log4j.ConsoleAppender]
>> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
>> log4j: Setting property [conversionPattern] to [%m%n].
>> log4j: Adding appender named [eolAppender] to category [ProgressDone].
>> log4j: Level value for root is  [INFO].
>> log4j: root level set to INFO
>> log4j: Class name: [org.apache.log4j.ConsoleAppender]
>> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
>> log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy
>> HH:mm:ss} %5p %c{1} - %m%n].
>> log4j: Adding appender named [consoleAppender] to category [root].
>> 14 Dec 2017 09:42:09  INFO Chunker - Chunker model file:
>> org/apache/ctakes/chunker/models/chunker-model.zip
>> 14 Dec 2017 09:42:10  INFO TokenizerAnnotatorPTB - Initializing
>> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
>> 14 Dec 2017 09:42:10  INFO ContextDependentTokenizerAnnotator - Finite
>> state machines loaded.
>> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
>> dictionary lookup window type:
>> org.apache.ctakes.typesystem.type.textspan.Sentence
>> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Exclusion
>> tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB
>> VBD VBG VBN VBP VBZ WDT WP WPS WRB
>> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using minimum
>> term text span: 3
>> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
>> Dictionary Descriptor:
>> org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>> 14 Dec 2017 09:42:10  INFO DictionaryDescriptorParser - Parsing
>> dictionary specifications:
>> 14 Dec 2017 09:42:10  INFO UmlsUserApprover - Checking UMLS Account at
>> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234:
>> .14 Dec 2017 09:42:11  INFO UmlsUserApprover -   UMLS Account at
>> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234
>> has been validated
>>
>> 14 Dec 2017 09:42:11  INFO JdbcConnectionFactory - Connecting to
>> jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/s
>> no_rx_16ab/sno_rx_16ab:
>> 14 Dec 2017 09:42:11  INFO ENGINE - open start - state not modified
>> ..................
>> 14 Dec 2017 09:42:17  INFO JdbcConnectionFactory -  Database connected
>> 14 Dec 2017 09:42:17  INFO JdbcRareWordDictionary - Connected to cui
>> and term table CUI_TERMS
>> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
>> table TUI with class TUI
>> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
>> table RXNORM with class LONG
>> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
>> table PREFTERM with class PREFTERM
>> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
>> table SNOMEDCT_US with class LONG
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope
>> sizes: 10 , 10
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
>> LEFT,RIGHT
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
>> org.apache.ctakes.necontexts.status.StatusContextAnalyzer
>> 14 Dec 2017 09:42:17  INFO StatusContextAnalyzer - initBoundaryData()
>> called for ContextInitializer
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
>> org.apache.ctakes.necontexts.status.StatusContextHitConsumer
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
>> type: org.apache.ctakes.typesystem.type.textspan.Sentence
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
>> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
>> org.apache.ctakes.typesystem.type.syntax.BaseToken
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope
>> sizes: 7 , 7
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
>> LEFT,RIGHT
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
>> org.apache.ctakes.necontexts.negation.NegationContextAnalyzer
>> 14 Dec 2017 09:42:17  INFO NegationContextAnalyzer -
>> initBoundaryData() called for ContextInitializer
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
>> org.apache.ctakes.necontexts.negation.NegationContextHitConsumer
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
>> type: org.apache.ctakes.typesystem.type.textspan.Sentence
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
>> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
>> org.apache.ctakes.typesystem.type.syntax.BaseToken
>> 14 Dec 2017 09:42:17  INFO SentenceDetector - Sentence detector model
>> file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
>> 14 Dec 2017 09:42:17  INFO POSTagger - POS tagger model file:
>> org/apache/ctakes/postagger/models/mayo-pos.zip
>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - Loading NLM Norm
>> and Lvg with config file = C:\New_Drive\apache-ctakes-4.0.0-
>> bin\apache-ctakes-
>> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl -   config file
>> absolute path = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cwd =
>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>> 4.0.0\resources\org\apache\ctakes\lvg\
>> 14 Dec 2017 09:42:18  INFO ENGINE - open start - state not modified
>> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open start
>> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open end
>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
>> 14 Dec 2017 09:42:18  INFO DrugMentionAnnotator - Finite state
>> machines loaded.
>> 14 Dec 2017 09:42:23  INFO ClearNLPDependencyParserAE - using Morphy
>> analysis? true Loading configuration.
>> Loading feature templates.
>> Loading lexica.
>> Loading model:
>> .....................................................................
>> ...................
>> Loading configuration.
>> Loading feature templates.
>> Loading model:
>> .
>> Loading configuration.
>> Loading feature templates.
>> Loading lexica.
>> Loading model:
>> ...
>> <various Loading model>
>> .
>> Loading configuration.
>> Loading feature templates.
>> Loading lexica.
>> Loading model:
>> ................................
>> Loading model:
>> .............................
>> 14 Dec 2017 09:42:32  INFO ConstituencyParser - Initializing parser...
>> 14 Dec 2017 09:42:33  INFO SentenceDetector - Starting processing.
>> 14 Dec 2017 09:42:34  INFO TokenizerAnnotatorPTB - process(JCas) in
>> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
>> 14 Dec 2017 09:42:36  INFO LvgAnnotator - process(JCas)
>> 14 Dec 2017 09:42:55  INFO ContextDependentTokenizerAnnotator -
>> process(JCas)
>> 14 Dec 2017 09:42:58  INFO POSTagger - process(JCas)
>> 14 Dec 2017 09:43:10  INFO Chunker -  process(JCas)
>> 14 Dec 2017 09:43:46  INFO ChunkAdjuster -  process(JCas)
>> 14 Dec 2017 09:43:47  INFO ChunkAdjuster -  process(JCas)
>> 14 Dec 2017 09:43:48  INFO AbstractJCasTermAnnotator - Starting
>> processing
>> 14 Dec 2017 09:43:54  INFO AbstractJCasTermAnnotator - Finished
>> processing
>> 14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)
>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:34  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:38  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:39  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:42  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:43  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:45  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:53  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:54  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:45:59  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:00  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:05  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:06  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:08  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:11  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:16  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:24  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:27  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:30  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:32  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:35  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:38  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:45  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:53  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:46:54  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:02  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:22  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:24  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:28  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:29  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:34  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:38  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:46  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:49  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:47:58  INFO DrugMentionAnnotator -
>> 14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing:
>> idd_secondTrial.txt
>> 14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing:
>> idd_secondTrial.txt
>>
>> Regards,
>> Harish.
>>
>>
>> -----Original Message-----
>> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
>> Sent: Thursday, December 14, 2017 9:16 AM
>> To: user@ctakes.apache.org
>> Subject: Re: Slowness in processing files [EXTERNAL]
>>
>> Do not try to use AggregatePlainTextProcessor, it is just slow.
>> Use the fast version and debug the password issues.
>> Make sure you have your UMLS credentials set in:
>> $CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>>
>> in two different places.
>>
>> Tim
>>
>>
>>
>> On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:
>>>
>>> Hi James,
>>>   
>>> Thanks for responding.
>>>   
>>> A single 2 MB file is taking ~5 hours to process with
>>> AggregatePlainTextProcessor. This is how the process is launched,
>>> including the JVM memory arguments:
>>>   
>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
>>> -Dctakes.umlsuser="XXXXXXX" -Dctakes.umlspw="XXXXXXXX"
>>> -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*"
>>> -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml
>>> -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame
>>>   
>>> Also, just now I tried to process the file with the AE
>>> AggregatePlaintextFastUMLSProcessor but ran into a different
>>> problem: an authentication error, with the same username and
>>> password that are used with AggregatePlainTextProcessor.
>>>   
>>> I can run it with AggregatePlaintextFastUMLSProcessor, increasing
>>> the heap to -Xms5g and -Xmx5g, if you could please let me know how
>>> it is possible that the AE AggregatePlainTextProcessor runs fine
>>> with the above username and password, while
>>> AggregatePlaintextFastUMLSProcessor gives the exception below with
>>> the same username and password.
>>>   
>>> Exception:
>>>   
>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
>>> -Dctakes.umlsuser="XXXXXXX" -Dctakes.umlspw="XXXXXX"
>>> -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*"
>>> -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml
>>> -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame
>>> Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
>>> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
>>> log4j: attributes....
>>> 13 Dec 2017 21:04:58  INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
>>> 13 Dec 2017 21:05:00  INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
>>> 13 Dec 2017 21:05:00  INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
>>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using dictionary lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
>>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
>>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
>>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using Dictionary Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>>> 13 Dec 2017 21:05:00  INFO DictionaryDescriptorParser - Parsing dictionary specifications:
>>> 13 Dec 2017 21:05:00  INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234-ß:
>>> ....13 Dec 2017 21:05:02 ERROR UmlsUserApprover -   UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for user XXXXXXX-ß with XXXXXXX
>>> org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>>>         at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
>>>         at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
>>>         at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
>>>         at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
>>>         at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
>>>         at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
>>> Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>>>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
>>>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
>>>         at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
>>>         at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
>>>         at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
>>>         ... 5 more
>>> Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
>>>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
>>>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
>>>         at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>>>         at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>>>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>>>         at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
>>>         at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
>>>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
>>>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
>>>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
>>>         at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>>>         at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>>>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>>>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
>>>         at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
>>>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
>>>         ... 9 more
>>> Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>>>         at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
>>>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
>>>         ... 24 more
>>> Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>>>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
>>>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
>>>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
>>>         at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
>>>         ... 25 more
>>> Caused by: java.lang.reflect.InvocationTargetException
>>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>>>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>>>         at java.lang.reflect.Constructor.newInstance(Unknown Source)
>>>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
>>>         ... 28 more
>>> Caused by: java.sql.SQLException: Invalid User for UMLS dictionary sno_rx_16abTerms
>>>         at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
>>>         ... 33 more
>>>   
>>>   
>>>   
>>> From: James Masanz [mailto:masanz.james@gmail.com]
>>> Sent: Wednesday, December 13, 2017 8:56 PM
>>> To: user@ctakes.apache.org
>>> Subject: Re: Slowness in processing files
>>>   
>>> Using AggregatePlaintextFastUMLSProcessor  is much faster than
>>> AggregatePlainTextProcessor, so I suggest that to start with you
>>> just use AggregatePlaintextFastUMLSProcessor.
>>>   
>>> Do you mean it is taking ~5 hours for a single file to be processed
>>> at times, or is that for a set of files?
>>>   
>>> If your JVM heap space is not set large enough, you can get very
>>> slow results.
>>> Try increasing it to 5G (or more) using the JVM parameter -Xmx5G.
>>> For faster start up, you can also set -Xms to the same value as
>>> -Xmx, or something close to it.
>>>   
>>>   -- James
>>>   
>>> On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <hyadav@live.unc.edu> wrote:
>>> Hi All,
>>>   
>>> When the medical records are run with the AE as
>>> AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor
>>> the processing is very slow. It is pretty fast when the smaller
>>> files
>>> (~2 kb) are fed as input but when I am processing with bigger files
>>> say, 2Mb, it is very slow and the files are taking ~5 hours to
>>> process. Any pointer will be of great help.
>>>   
>>> Regards,
>>> Harish.
>>>   
> 

RE: Slowness in processing files [EXTERNAL]

Posted by "Yadav, Harish" <hy...@live.unc.edu>.
Hi Timothy,

Sorry for the typo. I meant that I ran with AE AggregatePlaintextFastUMLSProcessor with -Xms6g -Xmx6g, but it still takes a long time (more than 2 hours) for a single 2 MB file. It runs fine for a 2 KB file.

Regards,
Harish.
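
One way to see where the time goes is to subtract timestamps of entries in the quoted log below. The following is only a sketch under stated assumptions: the class name LogGap and its two helper methods are made up for this illustration, and it assumes the timestamp occupies the first 20 characters of a log line, matching the log4j pattern %d{dd MMM yyyy HH:mm:ss} shown in the log. It is applied here to the two MaxentParserWrapper lines from that log.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, not part of cTAKES or UIMA.
final class LogGap {
    // Same layout as the log4j conversion pattern in the quoted log.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("dd MMM yyyy HH:mm:ss", Locale.ENGLISH);

    // Assumption: the line starts with a 20-character timestamp such as "14 Dec 2017 09:48:45".
    public static LocalDateTime stampOf(String logLine) {
        return LocalDateTime.parse(logLine.substring(0, 20), STAMP);
    }

    // Whole seconds elapsed between two log lines.
    public static long secondsBetween(String earlier, String later) {
        return Duration.between(stampOf(earlier), stampOf(later)).getSeconds();
    }

    public static void main(String[] args) {
        String start = "14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing:";
        String done  = "14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing:";
        System.out.println(secondsBetween(start, done) + " s between the two MaxentParserWrapper entries");
    }
}
```

For these two lines the gap is 1894 seconds, about 31.5 minutes. The log as posted ends at that point, so it does not show how the remaining time of the run is spent.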

-----Original Message-----
From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu] 
Sent: Thursday, December 14, 2017 11:22 AM
To: user@ctakes.apache.org
Subject: Re: Slowness in processing files [EXTERNAL]

You missed the most important part of my message:
> Do not try to use AggregatePlainTextProcessor, it is just slow.

Use AggregatePlaintextFastUMLSProcessor

Tim


On Thu, 2017-12-14 at 16:15 +0000, Yadav, Harish wrote:
> Hi Timothy,
> 
> I fixed the password issues and ran with AE 
> AggregatePlainTextProcessor with -Xms6g -Xmx6g, but it still takes a 
> long time (more than 2 hours) for a single 2 MB file. I have checked 
> the memory consumption of the process and it never goes above 4.5 GB, 
> so I am not sure it is a memory issue. However, AE 
> AggregatePlainTextProcessor processes the 2 KB file in ~11 seconds, 
> but most of our files are in the MB range, so more than 2 hours of 
> processing time per file is not feasible.
> 
> Could you please suggest something that may improve the performance?
> Below are the logs for the 2 MB file processed with
> AggregatePlainTextProcessor:
> 
> 
> 
> Logs:
> 
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*"
> -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml
> -Xms6g -Xmx6g org.apache.uima.tools.cpm.CpmFrame
> Dec 14, 2017 9:40:25 AM java.util.prefs.WindowsPreferences <init>
> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs 
> at root 0x80000002. Windows
> RegCreateKeyEx(...) returned error code 5.
> log4j: reset attribute= "false".
> log4j: Threshold ="null".
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressAppender] additivity to [false].
> log4j: Level value for ProgressAppender is  [INFO].
> log4j: ProgressAppender level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m].
> log4j: Adding appender named [noEolAppender] to category 
> [ProgressAppender].
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressDone] additivity to [false].
> log4j: Level value for ProgressDone is  [INFO].
> log4j: ProgressDone level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m%n].
> log4j: Adding appender named [eolAppender] to category [ProgressDone].
> log4j: Level value for root is  [INFO].
> log4j: root level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy 
> HH:mm:ss} %5p %c{1} - %m%n].
> log4j: Adding appender named [consoleAppender] to category [root].
> 14 Dec 2017 09:42:09  INFO Chunker - Chunker model file:
> org/apache/ctakes/chunker/models/chunker-model.zip
> 14 Dec 2017 09:42:10  INFO TokenizerAnnotatorPTB - Initializing 
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 14 Dec 2017 09:42:10  INFO ContextDependentTokenizerAnnotator - Finite 
> state machines loaded.
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using 
> dictionary lookup window type:
> org.apache.ctakes.typesystem.type.textspan.Sentence
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Exclusion 
> tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB 
> VBD VBG VBN VBP VBZ WDT WP WPS WRB
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using minimum 
> term text span: 3
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using 
> Dictionary Descriptor:
> org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> 14 Dec 2017 09:42:10  INFO DictionaryDescriptorParser - Parsing 
> dictionary specifications:
> 14 Dec 2017 09:42:10  INFO UmlsUserApprover - Checking UMLS Account at 
> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234:
> .14 Dec 2017 09:42:11  INFO UmlsUserApprover -   UMLS Account at 
> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234 
> has been validated
> 
> 14 Dec 2017 09:42:11  INFO JdbcConnectionFactory - Connecting to 
> jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab:
> 14 Dec 2017 09:42:11  INFO ENGINE - open start - state not modified 
> ..................
> 14 Dec 2017 09:42:17  INFO JdbcConnectionFactory -  Database connected
> 14 Dec 2017 09:42:17  INFO JdbcRareWordDictionary - Connected to cui 
> and term table CUI_TERMS
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept 
> table TUI with class TUI
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept 
> table RXNORM with class LONG
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept 
> table PREFTERM with class PREFTERM
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept 
> table SNOMEDCT_US with class LONG
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope 
> sizes: 10 , 10
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
> LEFT,RIGHT
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
> org.apache.ctakes.necontexts.status.StatusContextAnalyzer
> 14 Dec 2017 09:42:17  INFO StatusContextAnalyzer - initBoundaryData() 
> called for ContextInitializer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
> org.apache.ctakes.necontexts.status.StatusContextHitConsumer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
> type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
> org.apache.ctakes.typesystem.type.syntax.BaseToken
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope 
> sizes: 7 , 7
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
> LEFT,RIGHT
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
> org.apache.ctakes.necontexts.negation.NegationContextAnalyzer
> 14 Dec 2017 09:42:17  INFO NegationContextAnalyzer -
> initBoundaryData() called for ContextInitializer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
> org.apache.ctakes.necontexts.negation.NegationContextHitConsumer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
> type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
> org.apache.ctakes.typesystem.type.syntax.BaseToken
> 14 Dec 2017 09:42:17  INFO SentenceDetector - Sentence detector model
> file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
> 14 Dec 2017 09:42:17  INFO POSTagger - POS tagger model file:
> org/apache/ctakes/postagger/models/mayo-pos.zip
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - Loading NLM Norm and Lvg with config file = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl -   config file absolute path = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cwd = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\org\apache\ctakes\lvg\
> 14 Dec 2017 09:42:18  INFO ENGINE - open start - state not modified
> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open start
> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open end
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
> 14 Dec 2017 09:42:18  INFO DrugMentionAnnotator - Finite state 
> machines loaded.
> 14 Dec 2017 09:42:23  INFO ClearNLPDependencyParserAE - using Morphy 
> analysis? true Loading configuration.
> Loading feature templates.
> Loading lexica.
> Loading model:
> .....................................................................
> ...................
> Loading configuration.
> Loading feature templates.
> Loading model:
> .
> Loading configuration.
> Loading feature templates.
> Loading lexica.
> Loading model:
> ...
> <various Loading model>
> .
> Loading configuration.
> Loading feature templates.
> Loading lexica.
> Loading model:
> ................................
> Loading model:
> .............................
> 14 Dec 2017 09:42:32  INFO ConstituencyParser - Initializing parser...
> 14 Dec 2017 09:42:33  INFO SentenceDetector - Starting processing.
> 14 Dec 2017 09:42:34  INFO TokenizerAnnotatorPTB - process(JCas) in 
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 14 Dec 2017 09:42:36  INFO LvgAnnotator - process(JCas)
> 14 Dec 2017 09:42:55  INFO ContextDependentTokenizerAnnotator -
> process(JCas)
> 14 Dec 2017 09:42:58  INFO POSTagger - process(JCas)
> 14 Dec 2017 09:43:10  INFO Chunker -  process(JCas)
> 14 Dec 2017 09:43:46  INFO ChunkAdjuster -  process(JCas)
> 14 Dec 2017 09:43:47  INFO ChunkAdjuster -  process(JCas)
> 14 Dec 2017 09:43:48  INFO AbstractJCasTermAnnotator - Starting 
> processing
> 14 Dec 2017 09:43:54  INFO AbstractJCasTermAnnotator - Finished 
> processing
> 14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:34  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:38  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:39  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:42  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:43  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:45  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:53  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:59  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:00  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:05  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:06  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:08  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:11  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:16  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:24  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:27  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:30  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:35  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:38  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:45  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:53  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:02  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:22  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:24  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:28  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:29  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:34  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:38  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:46  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:49  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:58  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing:
> idd_secondTrial.txt
> 14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing:
> idd_secondTrial.txt
> 
> Regards,
> Harish.
> 
> 
> -----Original Message-----
> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
> Sent: Thursday, December 14, 2017 9:16 AM
> To: user@ctakes.apache.org
> Subject: Re: Slowness in processing files [EXTERNAL]
> 
> Do not try to use AggregatePlainTextProcessor, it is just slow.
> Use the fast version and debug the password issues.
> Make sure you have your UMLS credentials set in:
> $CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> 
> in two different places.
> 
> Tim
> 
> 
> 
> On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:
> > 
> > Hi James,
> >  
> > Thanks for responding.
> >  
> > A single 2 MB file is taking ~5 hours to process with 
> > AggregatePlainTextProcessor. This is how the process is launched, 
> > including the JVM memory arguments:
> >  
> > C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> > -Dctakes.umlsuser="XXXXXXX" -Dctakes.umlspw="XXXXXXXX"
> > -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*"
> > -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml
> > -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame
> >  
> > Also, just now I tried to process the file with the AE 
> > AggregatePlaintextFastUMLSProcessor but ran into a different 
> > problem: an authentication error, with the same username and 
> > password that are used with AggregatePlainTextProcessor.
> >  
> > I can run it with AggregatePlaintextFastUMLSProcessor, increasing 
> > the heap to -Xms5g and -Xmx5g, if you could please let me know how 
> > it is possible that the AE AggregatePlainTextProcessor runs fine 
> > with the above username and password, while 
> > AggregatePlaintextFastUMLSProcessor gives the exception below with 
> > the same username and password.
> >  
> > Exception:
> >  
> > C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> > -Dctakes.umlsuser="XXXXXXX" -Dctakes.umlspw="XXXXXX"
> > -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*"
> > -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml
> > -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame
> > Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
> > WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
> > log4j: attributes....
> > 13 Dec 2017 21:04:58  INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
> > 13 Dec 2017 21:05:00  INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> > 13 Dec 2017 21:05:00  INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
> > 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using dictionary lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
> > 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
> > 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
> > 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using Dictionary Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> > 13 Dec 2017 21:05:00  INFO DictionaryDescriptorParser - Parsing dictionary specifications:
> > 13 Dec 2017 21:05:00  INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234-ß:
> > ....13 Dec 2017 21:05:02 ERROR UmlsUserApprover -   UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for user XXXXXXX-ß with XXXXXXX
> > org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
> >         at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
> >         at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
> >         at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
> >         at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
> >         at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
> >         at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
> > Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
> >         at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
> >         at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
> >         at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
> >         at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
> >         at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
> >         ... 5 more
> > Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
> >         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
> >         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
> >         at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
> >         at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
> >         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
> >         at
> > org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.j
> > av
> > a:407)         at
> > org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.ja
> > va
> > :256)         at
> > org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.i
> > ni
> > tASB(AggregateAnalysisEngine_impl.java:429)         at 
> > org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.i
> > ni
> > tializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:37
> > 3)
> >         at
> > org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.i
> > ni
> > tialize(AggregateAnalysisEngine_impl.java:186)         at 
> > org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(Ana
> > ly
> > sisEngineFactory_impl.java:94)         at 
> > org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(
> > Co
> > mpositeResourceFactory_impl.java:62)         at
> > org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:27
> > 9)
> >         at
> > org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:33
> > 1)
> >         at
> > org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.j
> > av
> > a:448)         at
> > org.apache.uima.collection.impl.cpm.container.CPEFactory.produceInt
> > eg
> > ratedCasProcessor(CPEFactory.java:1085)         ... 9 more Caused
> > by:
> > org.apache.uima.resource.ResourceInitializationException: MESSAGE 
> > LOCALIZATION FAILED: Can't find resource for bundle 
> > java.util.PropertyResourceBundle, key C ould not construct 
> > org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDic
> > ti
> > onary         at
> > org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.i
> > ni
> > tialize(AbstractJCasTermAnnotator.java:131)         at 
> > org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.i
> > ni
> > tializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
> >         ... 24 more Caused by:
> > org.apache.uima.analysis_engine.annotator.AnnotatorContextException
> > :
> > MESSAGE LOCALIZATION FAILED: Can't find resource for bundle 
> > java.util.PropertyResourceBu ndle, key Could not construct 
> > org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDic
> > ti
> > onary         at
> > org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescripto
> > rP
> > arser.parseDictionary(DictionaryDescriptorParser.java:199)
> > at
> > org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescripto
> > rP
> > arser.parseDictionaries(DictionaryDescriptorParser.java:156)
> > at
> > org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescripto
> > rP
> > arser.parseDescriptor(DictionaryDescriptorParser.java:128)
> > at
> > org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.i
> > ni
> > tialize(AbstractJCasTermAnnotator.java:129)         ... 25 more 
> > Caused
> > by: java.lang.reflect.InvocationTargetException         at 
> > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> > Method)
> >         at
> > sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown
> > Source)
> >         at
> > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown
> > Source)         at
> > java.lang.reflect.Constructor.newInstance(Unknown
> > Source)         at
> > org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescripto
> > rP
> > arser.parseDictionary(DictionaryDescriptorParser.java:196)
> > ... 28 more Caused by: java.sql.SQLException: Invalid User for UMLS 
> > dictionary sno_rx_16abTerms         at 
> > org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDic
> > ti
> > onary.<init>(UmlsJdbcRareWordDictionary.java:29)         ... 33 more
> >  
> >  
> >  
> > From: James Masanz [mailto:masanz.james@gmail.com]
> > Sent: Wednesday, December 13, 2017 8:56 PM
> > To: user@ctakes.apache.org
> > Subject: Re: Slowness in processing files
> >  
> > Using AggregatePlaintextFastUMLSProcessor  is much faster than 
> > AggregatePlainTextProcessor, so I suggest that to start with you 
> > just use AggregatePlaintextFastUMLSProcessor.
> >  
> > Do you mean it is taking ~5 hours for a single file to be processed 
> > at times, or is that for a set of files?
> >  
> > If your JVM heap space is not set large enough, you can get very 
> > slow results.
> > Try increasing it to 5G (or more) using the JVM parameter -Xmx5G.
> > For faster start up, you can also set -Xms to the same value as
> > -Xmx, or to something close to it.
> >  
> >  -- James
> >  
> > On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <hyadav@live.unc.edu
> > >
> > wrote:
> > Hi All,
> >  
> > When the medical records are run with the AE as 
> > AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor 
> > the processing is very slow. It is pretty fast when the smaller 
> > files
> > (~2 kb) are fed as input but when I am processing with bigger files 
> > say, 2Mb, it is very slow and the files are taking ~5 hours to 
> > process. Any pointer will be of great help.
> >  
> > Regards,
> > Harish.
> >  

Re: Slowness in processing files [EXTERNAL]

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.
You missed the most important part of my message:
> Do not try to use AggregatePlainTextProcessor, it is just slow.

Use AggregatePlaintextFastUMLSProcessor

Tim
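
To see where the time goes in a log like the one quoted below, one option is to compute the gap between consecutive timestamped lines. The following is only a rough Python sketch, not part of cTAKES: the function name stage_gaps and the regular expression are made up for illustration, and it assumes the log4j pattern shown in the log (dd MMM yyyy HH:mm:ss, level, component) with each entry on a single unwrapped line and English month names. The three sample lines are taken from the quoted log.

```python
from datetime import datetime
import re

# Assumed line shape: "14 Dec 2017 09:42:33  INFO SentenceDetector - ..."
LINE = re.compile(r"^(\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2})\s+\w+\s+(\S+) -")


def stage_gaps(lines):
    """Return (component, seconds until the next timestamped line) pairs."""
    stamped = []
    for line in lines:
        match = LINE.match(line)
        if match:  # skip lines without a leading timestamp
            when = datetime.strptime(match.group(1), "%d %b %Y %H:%M:%S")
            stamped.append((when, match.group(2)))
    return [
        (name, int((nxt - when).total_seconds()))
        for (when, name), (nxt, _) in zip(stamped, stamped[1:])
    ]


# Three lines from the quoted log, unwrapped by hand.
sample = [
    "14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)",
    "14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing: idd_secondTrial.txt",
    "14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing: idd_secondTrial.txt",
]
gaps = stage_gaps(sample)
for name, seconds in gaps:
    print(f"{name:25s} {seconds:6d} s")
```

On these lines the gap between "Started processing" and "Done parsing" from MaxentParserWrapper is about 31.5 minutes, which is the kind of step worth looking at first.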


On Thu, 2017-12-14 at 16:15 +0000, Yadav, Harish wrote:
> Hi Timothy,
> 
> I fixed the password issues and ran with AE
> AggregatePlainTextProcessor with -Xms6g -Xmx6g, but it still takes a
> long time (more than 2 hours) for a single 2 MB file. I have checked
> the memory consumption of the process and it never goes above
> 4.5 GB, so I am not sure memory is the issue. AE
> AggregatePlainTextProcessor processes a 2 KB file in ~11 seconds, but
> most of our files are several MB, so more than 2 hours per file is
> not feasible.
> 
> Could you please suggest something that may improve the performance?
> Below are the logs for the run on a 2 MB file with
> AggregatePlainTextProcessor:
> 
> 
> 
> Logs:
> 
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -cp
> "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-
> 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms6g -Xmx6g
> org.apache.uima.tools.cpm.CpmFrame
> Dec 14, 2017 9:40:25 AM java.util.prefs.WindowsPreferences <init>
> WARNING: Could not open/create prefs root node
> Software\JavaSoft\Prefs at root 0x80000002. Windows
> RegCreateKeyEx(...) returned error code 5.
> log4j: reset attribute= "false".
> log4j: Threshold ="null".
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressAppender] additivity to [false].
> log4j: Level value for ProgressAppender is  [INFO].
> log4j: ProgressAppender level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m].
> log4j: Adding appender named [noEolAppender] to category
> [ProgressAppender].
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressDone] additivity to [false].
> log4j: Level value for ProgressDone is  [INFO].
> log4j: ProgressDone level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m%n].
> log4j: Adding appender named [eolAppender] to category
> [ProgressDone].
> log4j: Level value for root is  [INFO].
> log4j: root level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy
> HH:mm:ss} %5p %c{1} - %m%n].
> log4j: Adding appender named [consoleAppender] to category [root].
> 14 Dec 2017 09:42:09  INFO Chunker - Chunker model file:
> org/apache/ctakes/chunker/models/chunker-model.zip
> 14 Dec 2017 09:42:10  INFO TokenizerAnnotatorPTB - Initializing
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 14 Dec 2017 09:42:10  INFO ContextDependentTokenizerAnnotator -
> Finite state machines loaded.
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
> dictionary lookup window type:
> org.apache.ctakes.typesystem.type.textspan.Sentence
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Exclusion
> tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB
> VBD VBG VBN VBP VBZ WDT WP WPS WRB
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using minimum
> term text span: 3
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
> Dictionary Descriptor:
> org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> 14 Dec 2017 09:42:10  INFO DictionaryDescriptorParser - Parsing
> dictionary specifications:
> 14 Dec 2017 09:42:10  INFO UmlsUserApprover - Checking UMLS Account
> at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user
> harish1234:
> .14 Dec 2017 09:42:11  INFO UmlsUserApprover -   UMLS Account at
> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user
> harish1234 has been validated
> 
> 14 Dec 2017 09:42:11  INFO JdbcConnectionFactory - Connecting to
> jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/s
> no_rx_16ab/sno_rx_16ab:
> 14 Dec 2017 09:42:11  INFO ENGINE - open start - state not modified
> ..................
> 14 Dec 2017 09:42:17  INFO JdbcConnectionFactory -  Database
> connected
> 14 Dec 2017 09:42:17  INFO JdbcRareWordDictionary - Connected to cui
> and term table CUI_TERMS
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
> table TUI with class TUI
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
> table RXNORM with class LONG
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
> table PREFTERM with class PREFTERM
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
> table SNOMEDCT_US with class LONG
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right
> scope sizes: 10 , 10
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
> LEFT,RIGHT
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
> org.apache.ctakes.necontexts.status.StatusContextAnalyzer
> 14 Dec 2017 09:42:17  INFO StatusContextAnalyzer - initBoundaryData()
> called for ContextInitializer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
> org.apache.ctakes.necontexts.status.StatusContextHitConsumer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
> type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
> org.apache.ctakes.typesystem.type.syntax.BaseToken
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right
> scope sizes: 7 , 7
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
> LEFT,RIGHT
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
> org.apache.ctakes.necontexts.negation.NegationContextAnalyzer
> 14 Dec 2017 09:42:17  INFO NegationContextAnalyzer -
> initBoundaryData() called for ContextInitializer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
> org.apache.ctakes.necontexts.negation.NegationContextHitConsumer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
> type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
> org.apache.ctakes.typesystem.type.syntax.BaseToken
> 14 Dec 2017 09:42:17  INFO SentenceDetector - Sentence detector model
> file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
> 14 Dec 2017 09:42:17  INFO POSTagger - POS tagger model file:
> org/apache/ctakes/postagger/models/mayo-pos.zip
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - Loading NLM Norm
> and Lvg with config file = C:\New_Drive\apache-ctakes-4.0.0-
> bin\apache-ctakes-
> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl -   config file
> absolute path = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cwd =
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\resources\org\apache\ctakes\lvg\
> 14 Dec 2017 09:42:18  INFO ENGINE - open start - state not modified
> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open start
> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open end
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
> 14 Dec 2017 09:42:18  INFO DrugMentionAnnotator - Finite state
> machines loaded.
> 14 Dec 2017 09:42:23  INFO ClearNLPDependencyParserAE - using Morphy
> analysis? true
> Loading configuration.
> Loading feature templates.
> Loading lexica.
> Loading model:
> .....................................................................
> ...................
> Loading configuration.
> Loading feature templates.
> Loading model:
> .
> Loading configuration.
> Loading feature templates.
> Loading lexica.
> Loading model:
> ...
> <various Loading model>
> .
> Loading configuration.
> Loading feature templates.
> Loading lexica.
> Loading model:
> ................................
> Loading model:
> .............................
> 14 Dec 2017 09:42:32  INFO ConstituencyParser - Initializing
> parser...
> 14 Dec 2017 09:42:33  INFO SentenceDetector - Starting processing.
> 14 Dec 2017 09:42:34  INFO TokenizerAnnotatorPTB - process(JCas) in
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 14 Dec 2017 09:42:36  INFO LvgAnnotator - process(JCas)
> 14 Dec 2017 09:42:55  INFO ContextDependentTokenizerAnnotator -
> process(JCas)
> 14 Dec 2017 09:42:58  INFO POSTagger - process(JCas)
> 14 Dec 2017 09:43:10  INFO Chunker -  process(JCas)
> 14 Dec 2017 09:43:46  INFO ChunkAdjuster -  process(JCas)
> 14 Dec 2017 09:43:47  INFO ChunkAdjuster -  process(JCas)
> 14 Dec 2017 09:43:48  INFO AbstractJCasTermAnnotator - Starting
> processing
> 14 Dec 2017 09:43:54  INFO AbstractJCasTermAnnotator - Finished
> processing
> 14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:34  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:38  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:39  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:42  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:43  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:45  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:53  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:59  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:00  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:05  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:06  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:08  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:11  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:16  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:24  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:27  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:30  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:35  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:38  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:45  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:53  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:02  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:22  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:24  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:28  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:29  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:34  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:38  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:46  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:49  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:58  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing:
> idd_secondTrial.txt
> 14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing:
> idd_secondTrial.txt
> 
> 
> 
> 
> 
> 
> 
> Regards,
> Harish.
> 
> 
> -----Original Message-----
> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu] 
> Sent: Thursday, December 14, 2017 9:16 AM
> To: user@ctakes.apache.org
> Subject: Re: Slowness in processing files [EXTERNAL]
> 
> Do not try to use AggregatePlainTextProcessor, it is just slow.
> Use the fast version and debug the password issues.
> Make sure you have your UMLS credentials set in:
> $CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_r
> x_
> 16ab.xml
> 
> in two different places.
> 
> Tim
> 
> 
> 
> On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:
> > 
> > Hi James,
> >  
> > Thanks for responding.
> >  
> > Single file is taking ~5 hours to process with 
> > AggregatePlainTextProcessor of size 2 Mb. This is how the process 
> > looks like for JVM arguments regarding memory:
> >  
> > C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> > -Dctakes.umlsuser="XXXXXXX"┬á -Dctakes.umlspw="XXXXXXXX" -cp 
> > "C:\New_Drive\apache-ctakes-4.0.0-bi
> > apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-
> > bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-
> > 4.0.0-
> > bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.
> > nfiguration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-
> > ctakes-
> > 4.0.0\config\log4j.xml -Xms512M -Xmx3g 
> > org.apache.uima.tools.cpm.CpmFrame
> >  
> > Also, just now I tried to process the file with AE
> >  AggregatePlaintextFastUMLSProcessor but ran into different problem
> > of 
> > not getting authentication error with same username password being 
> > used in AggregatePlainTextProcessor.
> >  
> > I can run it with AggregatePlaintextFastUMLSProcessor by
> > increasing 
> > Xms 5g and Xmx5g,  if you could please let me know how can it be 
> > possible that with one AE AggregatePlainTextProcessor it is
> > running 
> > fine with above username and password but giving below exception
> > with 
> > same username, password with AggregatePlaintextFastUMLSProcessor.
> >  
> > Exception:
> >  
> >  C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> > -Dctakes.umlsuser="XXXXXXX"┬á -Dctakes.umlspw="XXXXXX" -cp 
> > "C:\New_Drive\apache-ctakes-4.0.0-bin\ apache-ctakes-
> > 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> > 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-
> > ctakes-
> > 4.0.0\lib\*" -Dlog4j.co nfiguration=file:\C:\New_Drive\apache-
> > ctakes-
> > 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g 
> > org.apache.uima.tools.cpm.CpmFrame Dec 13, 2017 9:01:20 PM 
> > java.util.prefs.WindowsPreferences <init> WARNING: Could not 
> > open/create prefs root node Software\JavaSoft\Prefs at root 
> > 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
> > log4j:
> > attributes.... 13 Dec 2017 21:04:58  INFO Chunker - Chunker model
> > file: org/apache/ctakes/chunker/models/chunker-model.zip 13 Dec
> > 2017
> > 21:05:00  INFO TokenizerAnnotatorPTB - Initializing 
> > org.apache.ctakes.core.ae.TokenizerAnnotatorPTB 13 Dec 2017
> > 21:05:00 
> > INFO ContextDependentTokenizerAnnotator - Finite state machines 
> > loaded. 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator -
> > Using 
> > dictionary lookup window type:
> > org.apache.ctakes.typesystem.type.textspan.Sentence 13 Dec 2017
> > 21:05:00  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded:
> > CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN
> > VBP 
> > VBZ WDT WP WPS WRB   13 Dec 2017 21:05:00  INFO 
> > AbstractJCasTermAnnotator - Using minimum term text span: 3 13 Dec
> > 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using Dictionary
> > Descriptor:
> > org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> > 13 Dec 2017 21:05:00  INFO DictionaryDescriptorParser - Parsing 
> > dictionary specifications: 13 Dec 2017 21:05:00  INFO
> > UmlsUserApprover 
> > - Checking UMLS Account at https://urldefense.proofpoint.com/v2/url
> > ?u=https-3A__uts-
> > 2Dws.nlm.nih.go&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppx
> > eFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-
> > OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=gE2jjaOVTYONTnzEtl8mA4LBRUcQ
> > vCEkIgDc6DU1Nbw&s=v_ivdTVH9oojQd-0bohfzxVCl5UxJlSZ5FUfi7qnmxo&e= 
> > v/restful/isValidUMLSUser for user harish1234-ß: ....13 Dec 2017
> > 21:05:02 ERROR UmlsUserApprover -   UMLS Account at https://urldefe
> > nse.proofpoint.com/v2/url?u=https-3A__uts-
> > 2Dws.nl&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=He
> > up-IbsIg9Q1TPOylpP9FE4GTK-
> > OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=gE2jjaOVTYONTnzEtl8mA4LBRUcQ
> > vCEkIgDc6DU1Nbw&s=U8OuKgmE0YMDPABaTm39jDFIXG4tnVEeE1rrCS03cbM&e= 
> > m.nih.gov/restful/isValidUMLSUser is not valid for user XXXXXXX-ß
> > with 
> > XXXXXXX
> > org.apache.uima.resource.ResourceInitializationException:
> > Initialization of CAS Processor with name 
> > "AggregatePlaintextFastUMLSProcessor" failed.         at 
> > org.apache.uima.collection.impl.CollectionProcessingEngine_impl.ini
> > ti
> > alize(CollectionProcessingEngine_impl.java:81)         at 
> > org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessin
> > gE
> > ngine(UIMAFramework_impl.java:420)         at 
> > org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIM
> > AF
> > ramework.java:918)         at
> > org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:57
> > 3)
> >         at
> > org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
> >         at
> > org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713) Caused
> > by: org.apache.uima.resource.ResourceConfigurationException:
> > Initialization of CAS Processor with name 
> > "AggregatePlaintextFastUMLSProcessor" failed.         at 
> > org.apache.uima.collection.impl.cpm.container.CPEFactory.produceInt
> > eg
> > ratedCasProcessor(CPEFactory.java:1101)         at 
> > org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProc
> > es
> > sors(CPEFactory.java:547)         at
> > org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.ja
> > va
> > :253)         at
> > org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.
> > ja
> > va:127)         at
> > org.apache.uima.collection.impl.CollectionProcessingEngine_impl.ini
> > ti
> > alize(CollectionProcessingEngine_impl.java:73)         ... 5 more 
> > Caused by:
> > org.apache.uima.resource.ResourceInitializationException:
> > Initialization of annotator class
> > "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator "
> > failed.  (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-
> > bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-
> > fast/desc/analysis_engine/UmlsLookupAnnotator.xml)         at 
> > org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.i
> > ni
> > tializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
> >         at
> > org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.i
> > ni
> > tialize(PrimitiveAnalysisEngine_impl.java:170)         at 
> > org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(Ana
> > ly
> > sisEngineFactory_impl.java:94)         at 
> > org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(
> > Co
> > mpositeResourceFactory_impl.java:62)         at
> > org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:27
> > 9)
> >         at
> > org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.j
> > av
> > a:407)         at
> > org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.ja
> > va
> > :256)         at
> > org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.i
> > ni
> > tASB(AggregateAnalysisEngine_impl.java:429)         at 
> > org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.i
> > ni
> > tializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:37
> > 3)
> >         at
> > org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.i
> > ni
> > tialize(AggregateAnalysisEngine_impl.java:186)         at 
> > org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(Ana
> > ly
> > sisEngineFactory_impl.java:94)         at 
> > org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(
> > Co
> > mpositeResourceFactory_impl.java:62)         at
> > org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:27
> > 9)
> >         at
> > org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:33
> > 1)
> >         at
> > org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.j
> > av
> > a:448)         at
> > org.apache.uima.collection.impl.cpm.container.CPEFactory.produceInt
> > eg
> > ratedCasProcessor(CPEFactory.java:1085)         ... 9 more Caused
> > by:
> > org.apache.uima.resource.ResourceInitializationException: MESSAGE 
> > LOCALIZATION FAILED: Can't find resource for bundle 
> > java.util.PropertyResourceBundle, key C ould not construct 
> > org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDic
> > ti
> > onary         at
> > org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.i
> > ni
> > tialize(AbstractJCasTermAnnotator.java:131)         at 
> > org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.i
> > ni
> > tializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
> >         ... 24 more Caused by:
> > org.apache.uima.analysis_engine.annotator.AnnotatorContextException
> > :
> > MESSAGE LOCALIZATION FAILED: Can't find resource for bundle 
> > java.util.PropertyResourceBu ndle, key Could not construct 
> > org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDic
> > ti
> > onary         at
> > org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescripto
> > rP
> > arser.parseDictionary(DictionaryDescriptorParser.java:199)        
> > at 
> > org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescripto
> > rP
> > arser.parseDictionaries(DictionaryDescriptorParser.java:156)
> > at
> > org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescripto
> > rP
> > arser.parseDescriptor(DictionaryDescriptorParser.java:128)        
> > at 
> > org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.i
> > ni
> > tialize(AbstractJCasTermAnnotator.java:129)         ... 25 more
> > Caused 
> > by: java.lang.reflect.InvocationTargetException         at 
> > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> > Method)
> >         at
> > sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown
> > Source)
> >         at
> > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown
> > Source)         at
> > java.lang.reflect.Constructor.newInstance(Unknown
> > Source)         at
> > org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescripto
> > rP
> > arser.parseDictionary(DictionaryDescriptorParser.java:196)
> > ... 28 more Caused by: java.sql.SQLException: Invalid User for
> > UMLS 
> > dictionary sno_rx_16abTerms         at 
> > org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDic
> > ti
> > onary.<init>(UmlsJdbcRareWordDictionary.java:29)         ... 33
> > more
> >  
> >  
> >  
> > From: James Masanz [mailto:masanz.james@gmail.com]
> > Sent: Wednesday, December 13, 2017 8:56 PM
> > To: user@ctakes.apache.org
> > Subject: Re: Slowness in processing files
> >  
> > Using AggregatePlaintextFastUMLSProcessor  is much faster than 
> > AggregatePlainTextProcessor, so I suggest that to start with you
> > just 
> > use AggregatePlaintextFastUMLSProcessor.
> >  
> > Do you mean it is taking ~5 hours for a single file to be processed
> > at 
> > times, or is that for a set of files?
> >  
> > If your JVM heap space is not set large enough, you can get very
> > slow results.
> > Try increasing to 5G (or more) using the JVM parameter   -Xmx5G
> > For faster start up, you can also set the -Xms to the same or
> > something close to -Xmx value.
> >  
> >  -- James
> >  
> > On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <hyadav@live.unc.edu>
> > wrote:
> > Hi All,
> >  
> > When the medical records are run with the AE as
> > AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor
> > the processing is very slow. It is pretty fast when the smaller files
> > (~2 kb) are fed as input but when I am processing with bigger
> > files say, 2Mb, it is very slow and the files are taking ~5 hours to
> > process. Any pointer will be of great help.
> >  
> > Regards,
> > Harish.
> >  

RE: Slowness in processing files [EXTERNAL]

Posted by "Yadav, Harish" <hy...@live.unc.edu>.
Hi Timothy,

I fixed the password issues and ran the AggregatePlainTextProcessor AE with -Xms6g -Xmx6g, but it still takes a long time (more than 2 hours) for a single 2 MB file. I checked the memory consumption of the process and it never goes above 4.5 GB, so I am not sure that memory is the issue. AggregatePlainTextProcessor processes a 2 KB file in ~11 seconds, but most of our files are in the MB range, so a processing time of more than 2 hours per file is not feasible.
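
As a rough sanity check on these numbers, a linear extrapolation from the 2 KB timing can be sketched in Python. This assumes processing time grows in proportion to file size, which the pipeline may not satisfy, and that 2 MB is 1024 times 2 KB. `extrapolate_seconds` is a hypothetical helper name, not part of cTAKES.

```python
def extrapolate_seconds(small_seconds, small_bytes, large_bytes):
    """Scale a measured time linearly by the ratio of file sizes."""
    return small_seconds * (large_bytes / small_bytes)


small_bytes = 2 * 1024          # the ~2 KB file
large_bytes = 2 * 1024 * 1024   # the ~2 MB file

# 11 seconds is the timing reported for the 2 KB file.
estimate = extrapolate_seconds(11.0, small_bytes, large_bytes)
hours = estimate / 3600
print(f"linear estimate: {estimate:.0f} s (~{hours:.1f} h)")
```

Under that assumption the estimate comes out at roughly 3 hours, the same order of magnitude as the 2+ hours observed. The 11 seconds may also include one-time start-up cost, so the true per-byte cost could be lower than this suggests.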

Could you please suggest something that might improve the performance? Below are the logs from processing the 2 MB file with AggregatePlainTextProcessor:



Logs:

C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms6g -Xmx6g org.apache.uima.tools.cpm.CpmFrame
Dec 14, 2017 9:40:25 AM java.util.prefs.WindowsPreferences <init>
WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
log4j: reset attribute= "false".
log4j: Threshold ="null".
log4j: Retreiving an instance of org.apache.log4j.Logger.
log4j: Setting [ProgressAppender] additivity to [false].
log4j: Level value for ProgressAppender is  [INFO].
log4j: ProgressAppender level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%m].
log4j: Adding appender named [noEolAppender] to category [ProgressAppender].
log4j: Retreiving an instance of org.apache.log4j.Logger.
log4j: Setting [ProgressDone] additivity to [false].
log4j: Level value for ProgressDone is  [INFO].
log4j: ProgressDone level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%m%n].
log4j: Adding appender named [eolAppender] to category [ProgressDone].
log4j: Level value for root is  [INFO].
log4j: root level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy HH:mm:ss} %5p %c{1} - %m%n].
log4j: Adding appender named [consoleAppender] to category [root].
14 Dec 2017 09:42:09  INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
14 Dec 2017 09:42:10  INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
14 Dec 2017 09:42:10  INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using dictionary lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using Dictionary Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
14 Dec 2017 09:42:10  INFO DictionaryDescriptorParser - Parsing dictionary specifications:
14 Dec 2017 09:42:10  INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234:
.14 Dec 2017 09:42:11  INFO UmlsUserApprover -   UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234 has been validated

14 Dec 2017 09:42:11  INFO JdbcConnectionFactory - Connecting to jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab:
14 Dec 2017 09:42:11  INFO ENGINE - open start - state not modified
..................
14 Dec 2017 09:42:17  INFO JdbcConnectionFactory -  Database connected
14 Dec 2017 09:42:17  INFO JdbcRareWordDictionary - Connected to cui and term table CUI_TERMS
14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept table TUI with class TUI
14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept table RXNORM with class LONG
14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept table PREFTERM with class PREFTERM
14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept table SNOMEDCT_US with class LONG
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope sizes: 10 , 10
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order: LEFT,RIGHT
14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer: org.apache.ctakes.necontexts.status.StatusContextAnalyzer
14 Dec 2017 09:42:17  INFO StatusContextAnalyzer - initBoundaryData() called for ContextInitializer
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer: org.apache.ctakes.necontexts.status.StatusContextHitConsumer
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type: org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type: org.apache.ctakes.typesystem.type.syntax.BaseToken
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope sizes: 7 , 7
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order: LEFT,RIGHT
14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer: org.apache.ctakes.necontexts.negation.NegationContextAnalyzer
14 Dec 2017 09:42:17  INFO NegationContextAnalyzer - initBoundaryData() called for ContextInitializer
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer: org.apache.ctakes.necontexts.negation.NegationContextHitConsumer
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type: org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type: org.apache.ctakes.typesystem.type.syntax.BaseToken
14 Dec 2017 09:42:17  INFO SentenceDetector - Sentence detector model file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
14 Dec 2017 09:42:17  INFO POSTagger - POS tagger model file: org/apache/ctakes/postagger/models/mayo-pos.zip
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - Loading NLM Norm and Lvg with config file = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl -   config file absolute path = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cwd = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\org\apache\ctakes\lvg\
14 Dec 2017 09:42:18  INFO ENGINE - open start - state not modified
14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open start
14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open end
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
14 Dec 2017 09:42:18  INFO DrugMentionAnnotator - Finite state machines loaded.
14 Dec 2017 09:42:23  INFO ClearNLPDependencyParserAE - using Morphy analysis? true
Loading configuration.
Loading feature templates.
Loading lexica.
Loading model:
........................................................................................
Loading configuration.
Loading feature templates.
Loading model:
.
Loading configuration.
Loading feature templates.
Loading lexica.
Loading model:
...
<various Loading model>
.
Loading configuration.
Loading feature templates.
Loading lexica.
Loading model:
................................
Loading model:
.............................
14 Dec 2017 09:42:32  INFO ConstituencyParser - Initializing parser...
14 Dec 2017 09:42:33  INFO SentenceDetector - Starting processing.
14 Dec 2017 09:42:34  INFO TokenizerAnnotatorPTB - process(JCas) in org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
14 Dec 2017 09:42:36  INFO LvgAnnotator - process(JCas)
14 Dec 2017 09:42:55  INFO ContextDependentTokenizerAnnotator - process(JCas)
14 Dec 2017 09:42:58  INFO POSTagger - process(JCas)
14 Dec 2017 09:43:10  INFO Chunker -  process(JCas)
14 Dec 2017 09:43:46  INFO ChunkAdjuster -  process(JCas)
14 Dec 2017 09:43:47  INFO ChunkAdjuster -  process(JCas)
14 Dec 2017 09:43:48  INFO AbstractJCasTermAnnotator - Starting processing
14 Dec 2017 09:43:54  INFO AbstractJCasTermAnnotator - Finished processing
14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)
14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:34  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:38  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:39  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:42  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:43  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:45  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:53  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:54  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:59  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:00  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:05  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:06  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:08  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:11  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:16  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:24  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:27  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:30  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:35  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:38  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:45  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:53  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:54  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:02  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:22  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:24  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:28  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:29  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:34  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:38  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:46  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:49  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:58  INFO DrugMentionAnnotator -
14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing: idd_secondTrial.txt
14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing: idd_secondTrial.txt
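
To see where the time goes in a log like the one above, the gaps between consecutive timestamped lines can be measured. The following is a minimal Python sketch, assuming the `dd MMM yyyy HH:mm:ss` layout shown in the log and an English locale for month names. `parse_events` and `slowest_gaps` are hypothetical helper names, and each gap is attributed to the component that wrote the earlier of the two lines, which is only an approximation.

```python
import re
from datetime import datetime

# Matches lines such as:
#   14 Dec 2017 09:42:36  INFO LvgAnnotator - process(JCas)
# Leading progress dots (".14 Dec 2017 ...") are tolerated.
LINE_RE = re.compile(
    r"^\.*(\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2})\s+(\w+)\s+(\S+)\s+-"
)


def parse_events(lines):
    """Return (timestamp, component) for every line that matches the layout."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%d %b %Y %H:%M:%S")
            events.append((ts, m.group(3)))
    return events


def slowest_gaps(events, top=3):
    """Return the largest (seconds, component) gaps between consecutive lines."""
    gaps = []
    for (t0, name), (t1, _) in zip(events, events[1:]):
        gaps.append(((t1 - t0).total_seconds(), name))
    return sorted(gaps, reverse=True)[:top]


# A few lines copied from the log above.
sample = [
    "14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)",
    "14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -",
    "14 Dec 2017 09:47:58  INFO DrugMentionAnnotator -",
    "14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing: idd_secondTrial.txt",
    "14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing: idd_secondTrial.txt",
]

for seconds, component in slowest_gaps(parse_events(sample)):
    print(f"{seconds:7.0f} s after a line from {component}")
```

On these lines the largest gap is the 1894 seconds (about 31.5 minutes) between the MaxentParserWrapper "Started processing" and "Done parsing" entries.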

Regards,
Harish.



Re: Slowness in processing files [EXTERNAL]

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.
Do not try to use AggregatePlainTextProcessor; it is just slow.
Use the fast version and debug the password issues.
Make sure you have your UMLS credentials set in two different places in:
$CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
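
One way to check that both places are filled in is to list every UMLS-related property in the descriptor. The Python sketch below assumes the credentials are stored as `<property key="..." value="..."/>` elements whose keys contain "umls", and that unset values are empty or a placeholder such as CHANGE_ME. Both are assumptions about the file layout and are not confirmed in this thread. `find_umls_properties` is a hypothetical helper, and the XML fragment is made up for illustration.

```python
import xml.etree.ElementTree as ET


def find_umls_properties(xml_text, marker="umls"):
    """Return (key, value) pairs for <property> elements whose key contains marker."""
    root = ET.fromstring(xml_text)
    found = []
    for prop in root.iter("property"):
        key = prop.get("key", "")
        if marker in key.lower():
            found.append((key, prop.get("value", "")))
    return found


# Made-up descriptor fragment; the real sno_rx_16ab.xml may be laid out differently.
example = """
<lookupSpecification>
  <dictionaries><dictionary><properties>
    <property key="umlsUser" value="CHANGE_ME"/>
    <property key="umlsPass" value="CHANGE_ME"/>
  </properties></dictionary></dictionaries>
  <conceptFactories><conceptFactory><properties>
    <property key="umlsUser" value="my_user"/>
    <property key="umlsPass" value="my_pass"/>
  </properties></conceptFactory></conceptFactories>
</lookupSpecification>
"""

for key, value in find_umls_properties(example):
    # Report only whether a value is set; never print the password itself.
    status = "NOT SET" if value in ("", "CHANGE_ME") else "set"
    print(f"{key}: {status}")
```

To run it against a real descriptor, read the file's text and pass it to `find_umls_properties`. If the file uses XML namespaces, the tag match would need adjusting.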

Tim



On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:
> Hi James,
>  
> Thanks for responding.
>  
> A single 2 MB file is taking ~5 hours to process with
> AggregatePlainTextProcessor. This is how the process is launched,
> including the JVM memory arguments:
>  
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXXXX" -cp
> "C:\New_Drive\apache-ctakes-4.0.0-bi
> apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-
> bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-
> bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.
> nfiguration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\config\log4j.xml -Xms512M -Xmx3g
> org.apache.uima.tools.cpm.CpmFrame
>  
> Also, just now I tried to process the file with the AE
> AggregatePlaintextFastUMLSProcessor but ran into a different problem:
> an authentication error with the same username and password that are
> used with AggregatePlainTextProcessor.
>  
> I can run it with AggregatePlaintextFastUMLSProcessor after increasing
> -Xms to 5g and -Xmx to 5g, if you could please let me know how it is
> possible that the AE AggregatePlainTextProcessor runs fine with the
> above username and password, but AggregatePlaintextFastUMLSProcessor
> gives the exception below with the same username and password.
>  
> Exception:
>  
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXX" -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame
> Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
> log4j: attributes....
> 13 Dec 2017 21:04:58  INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
> 13 Dec 2017 21:05:00  INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 13 Dec 2017 21:05:00  INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using dictionary lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using Dictionary Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> 13 Dec 2017 21:05:00  INFO DictionaryDescriptorParser - Parsing dictionary specifications:
> 13 Dec 2017 21:05:00  INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234-ß:
> ....13 Dec 2017 21:05:02 ERROR UmlsUserApprover -   UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for user XXXXXXX-ß with XXXXXXX
> org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>         at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
>         at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
>         at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
>         at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
>         at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
>         at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
> Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
>         at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
>         at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
>         at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
>         ... 5 more
> Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator " failed.  (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
>         at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>         at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>         at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
>         at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
>         at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>         at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
>         at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
>         ... 9 more
> Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>         at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
>         ... 24 more
> Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
>         at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
>         ... 25 more
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>         at java.lang.reflect.Constructor.newInstance(Unknown Source)
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
>         ... 28 more
> Caused by: java.sql.SQLException: Invalid User for UMLS dictionary sno_rx_16abTerms
>         at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
>         ... 33 more
>  
>  
>  
> From: James Masanz [mailto:masanz.james@gmail.com] 
> Sent: Wednesday, December 13, 2017 8:56 PM
> To: user@ctakes.apache.org
> Subject: Re: Slowness in processing files
>  
> Using AggregatePlaintextFastUMLSProcessor  is much faster than
> AggregatePlainTextProcessor, so I suggest that to start with you just
> use AggregatePlaintextFastUMLSProcessor.
>  
> Do you mean it is taking ~5 hours for a single file to be processed
> at times, or is that for a set of files?
>  
> If your JVM heap space is not set large enough, you can get very slow
> results.
> Try increasing to 5G (or more) using the JVM parameter   -Xmx5G
> For faster start up, you can also set the -Xms to the same or
> something close to -Xmx value.
>  
>  -- James
>  
> On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <hy...@live.unc.edu>
> wrote:
> Hi All,
>  
> When the medical records are run with the AE as
> AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor
> the processing is very slow. It is pretty fast when the smaller files
> (~2 kb) are fed as input but when I am processing with bigger files
> say, 2Mb, it is very slow and the files are taking ~5 hours to
> process. Any pointer will be of great help.
>  
> Regards,
> Harish.
>  

RE: Slowness in processing files

Posted by "Yadav, Harish" <hy...@live.unc.edu>.
Hi James,

Thanks for responding.

A single 2 MB file is taking ~5 hours to process with AggregatePlainTextProcessor. Here is the command line, including the JVM memory arguments:

C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXXXX" -cp "C:\New_Drive\apache-ctakes-4.0.0-bi
apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.
nfiguration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame

Also, just now I tried to process the file with the AE AggregatePlaintextFastUMLSProcessor, but ran into a different problem: an authentication error, even though I am using the same username and password that work with AggregatePlainTextProcessor.

I can rerun AggregatePlaintextFastUMLSProcessor with -Xms5g and -Xmx5g, but could you please let me know how it is possible that AggregatePlainTextProcessor runs fine with the above username and password, while AggregatePlaintextFastUMLSProcessor throws the exception below with the same credentials?

Exception:

C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXX" -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\
apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.co
nfiguration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame
Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
log4j: attributes....
13 Dec 2017 21:04:58  INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
13 Dec 2017 21:05:00  INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
13 Dec 2017 21:05:00  INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using dictionary lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB

13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using Dictionary Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
13 Dec 2017 21:05:00  INFO DictionaryDescriptorParser - Parsing dictionary specifications:
13 Dec 2017 21:05:00  INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234-ß:
....13 Dec 2017 21:05:02 ERROR UmlsUserApprover -   UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for user XXXXXXX-ß with XXXXXXX

org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
        at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
        at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
        at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
        at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
        at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
        at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
        at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
        at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
        at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
        at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
        at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
        ... 5 more
Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator
" failed.  (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
        at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
        at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
        at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
        at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
        at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
        at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
        at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
        at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
        at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
        at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
        ... 9 more
Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key C
ould not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
        at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
        ... 24 more
Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBu
ndle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
        at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
        ... 25 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
        ... 28 more
Caused by: java.sql.SQLException: Invalid User for UMLS dictionary sno_rx_16abTerms
        at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
        ... 33 more
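
One thing that may be worth ruling out: the UmlsUserApprover log lines above show the user name followed by "-ß". That suffix could simply be part of the log formatting or a console-encoding artifact, but it could also mean a stray character is reaching the JVM. A minimal sketch, assuming only the two system properties from the command line above (ctakes.umlsuser and ctakes.umlspw); the class name UmlsPropertyCheck and its helper method are hypothetical and not part of cTAKES. It reports the length of each property and the position of any character outside printable, non-space ASCII, without printing the values themselves:

```java
public class UmlsPropertyCheck {

    // Describe every character that is not printable, non-space ASCII (0x21..0x7E).
    static String describeSuspectChars(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c < 0x21 || c > 0x7E) {
                sb.append(String.format("index %d: U+%04X%n", i, (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Property names are taken from the -D arguments in the command line above.
        for (String key : new String[] {"ctakes.umlsuser", "ctakes.umlspw"}) {
            String value = System.getProperty(key);
            if (value == null) {
                System.out.println(key + " is not set");
                continue;
            }
            String suspects = describeSuspectChars(value);
            System.out.println(key + ": length " + value.length()
                    + (suspects.isEmpty() ? ", no suspect characters" : ", suspect characters:"));
            System.out.print(suspects);
        }
    }
}
```

Run it from the same console with the same -Dctakes.umlsuser and -Dctakes.umlspw arguments as the cTAKES command. If the reported length does not match what was typed, or a suspect character is listed, the problem is probably in how the shell passes the arguments rather than in the UMLS account itself.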




From: James Masanz [mailto:masanz.james@gmail.com]
Sent: Wednesday, December 13, 2017 8:56 PM
To: user@ctakes.apache.org
Subject: Re: Slowness in processing files

Using AggregatePlaintextFastUMLSProcessor  is much faster than AggregatePlainTextProcessor, so I suggest that to start with you just use AggregatePlaintextFastUMLSProcessor.

Do you mean it is taking ~5 hours for a single file to be processed at times, or is that for a set of files?

If your JVM heap space is not set large enough, you can get very slow results.
Try increasing to 5G (or more) using the JVM parameter   -Xmx5G
For faster start up, you can also set the -Xms to the same or something close to -Xmx value.

 -- James

On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <hy...@live.unc.edu> wrote:
Hi All,

When the medical records are run with the AE as AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor the processing is very slow. It is pretty fast when the smaller files (~2 kb) are fed as input but when I am processing with bigger files say, 2Mb, it is very slow and the files are taking ~5 hours to process. Any pointer will be of great help.

Regards,
Harish.


Re: Slowness in processing files

Posted by James Masanz <ma...@gmail.com>.
Using AggregatePlaintextFastUMLSProcessor  is much faster than
AggregatePlainTextProcessor, so I suggest that to start with you just use
AggregatePlaintextFastUMLSProcessor.

Do you mean it is taking ~5 hours for a single file to be processed at
times, or is that for a set of files?

If your JVM heap space is not set large enough, you can get very slow
results.
Try increasing it to 5G (or more) using the JVM parameter -Xmx5G.
For faster start up, you can also set -Xms to the same value as -Xmx, or
something close to it.
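
To confirm that the heap settings actually reach the JVM, a small check like the following may help. It is only a sketch: the class name HeapCheck is made up for illustration and is not part of cTAKES or UIMA. It uses the standard java.lang.Runtime methods to print the heap limits the running JVM sees.

```java
public class HeapCheck {

    // Convert a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the ceiling the heap may grow to (governed by -Xmx);
        // totalMemory() is what the JVM has reserved so far (initially governed by -Xms).
        System.out.println("max heap (MiB):   " + toMiB(rt.maxMemory()));
        System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("free heap (MiB):  " + toMiB(rt.freeMemory()));
    }
}
```

Compile it with javac HeapCheck.java and run it with the same memory flags you pass to cTAKES, for example java -Xms5g -Xmx5g -cp . HeapCheck. The reported maximum can be somewhat below the requested figure, depending on the JVM and garbage collector, but it should be in the right range.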

 -- James

On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <hy...@live.unc.edu> wrote:

> Hi All,
>
>
>
> When the medical records are run with the AE as
> AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor the
> processing is very slow. It is pretty fast when the smaller files (~2 kb)
> are fed as input but when I am processing with bigger files say, 2Mb, it is
> very slow and the files are taking ~5 hours to process. Any pointer will be
> of great help.
>
>
>
> Regards,
>
> Harish.
>