Posted to dev@ctakes.apache.org by Zakir Saifi <za...@raxa.com> on 2019/02/21 05:18:06 UTC

Making Ctakes Faster after Changing default lookup span value

Hi Everyone,

I am using cTAKES to structure some clinical text. In my clinical text,
there are single-character words like *P 90 (Pulse 90)*, etc. I want cTAKES
to detect those. Since the default minimum span detected by cTAKES is 3,
I was not able to detect these concepts. Therefore I changed the value
of _minimumLookupSpan to 1. Now I am able to detect the one-character
words with cTAKES after adding them to my custom dictionary.

My problem is that after changing the value of _minimumLookupSpan, cTAKES
has become slow.
I am using ctakes-web-rest (a REST service using cTAKES). With the default
lookup span value of 3, a REST call for text like (Systolic blood pressure
180) was taking less than 2s; it is now taking around 5s.

How can I make cTAKES faster? Is there any configuration that helps to
improve the performance without losing the current detection rate?

Here is the content of my current Piper file.

load DefaultFastPipeline
add org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaDefaultJCasTermAnnotator LookupXml=org/apache/ctakes/dictionary/lookup/fast/tinyDictSpec.xml
add LabValueFinder
add org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator LookupXml=org/apache/ctakes/dictionary/lookup/fast/drugConcept.xml
add org.apache.ctakes.drugner.ae.DrugMentionAnnotator STATUS_BOUNDARY_ANN_TYPE="org.apache.ctakes.typesystem.type.textsem.MedicationMention"
add org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator LookupXml=org/apache/ctakes/dictionary/lookup/fast/personName.xml
add org.apache.ctakes.raxactakes.core.ae.PersonNameFinder

addDescription EventAnnotator
addLogged BackwardsTimeAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/timeannotator/model.jar
addLogged DocTimeRelAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/doctimerel/model.jar
addLogged EventTimeRelationAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/eventtime/model.jar
addLogged EventEventRelationAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/eventevent/model.jar
addLogged ContextualModalityAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/contextualmodality/model.jar
addLogged EventAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/eventannotator/model.jar

-- 
Regards
Zakir Saifi
(Software Developer at Raxa)

Re: Making Ctakes Faster after Changing default lookup span value [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
> These are 3 tables in the database aiunstructured.
-- Ok.  It might be beneficial to combine them into a single table and just use 1 annotator, especially since that number of total rows (140k) is relatively small.

> I would try to use exclusionTags and minimum span in my piper file
-- Sounds good.
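
To make the single-table idea concrete, here is a minimal, untested sketch of what one lookup spec could look like. The element names, class names and connection properties are taken from the tinyDictSpec.xml quoted further down in this thread. The names CombinedDict, CombinedConcepts and CombinedPair and the table name all_terms are hypothetical placeholders, and the sketch assumes the merged table has the same layout as the existing rareword table.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lookupSpecification>
    <dictionaries>
        <dictionary>
            <!-- Hypothetical name, chosen for this sketch. -->
            <name>CombinedDict</name>
            <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary</implementationName>
            <properties>
                <property key="jdbcDriver" value="com.mysql.jdbc.Driver"/>
                <property key="jdbcUrl" value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
                <property key="jdbcUser" value="root"/>
                <property key="jdbcPass" value=""/>
                <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
                <property key="umlsVendor" value="NLM-6515182895"/>
                <property key="umlsUser" value=""/>
                <property key="umlsPass" value=""/>
                <!-- Assumed: one table holding the concept, drug and person rows. -->
                <property key="rareWordTable" value="all_terms"/>
            </properties>
        </dictionary>
    </dictionaries>

    <conceptFactories>
        <conceptFactory>
            <!-- Hypothetical name, chosen for this sketch. -->
            <name>CombinedConcepts</name>
            <implementationName>org.apache.ctakes.dictionary.lookup2.concept.UmlsJdbcConceptFactory</implementationName>
            <properties>
                <property key="jdbcDriver" value="com.mysql.jdbc.Driver"/>
                <property key="jdbcUrl" value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
                <property key="jdbcUser" value="root"/>
                <property key="jdbcPass" value=""/>
                <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
                <property key="umlsVendor" value="NLM-6515182895"/>
                <property key="umlsUser" value=""/>
                <property key="umlsPass" value=""/>
                <property key="tuiTable" value="tui"/>
            </properties>
        </conceptFactory>
    </conceptFactories>

    <dictionaryConceptPairs>
        <dictionaryConceptPair>
            <name>CombinedPair</name>
            <dictionaryName>CombinedDict</dictionaryName>
            <conceptFactoryName>CombinedConcepts</conceptFactoryName>
        </dictionaryConceptPair>
    </dictionaryConceptPairs>

    <rareWordConsumer>
        <name>Term Consumer</name>
        <implementationName>org.apache.ctakes.dictionary.lookup2.consumer.DefaultTermConsumer</implementationName>
        <properties>
            <property key="codingScheme" value="custom"/>
        </properties>
    </rareWordConsumer>
</lookupSpecification>
```

Whether the drug and person rows can go through the stock dictionary and concept factory classes depends on what the custom Raxa classes do, which is not shown in this thread.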

Sean





On Fri, Feb 22, 2019 at 6:52 PM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Zakir,
>
> Thank you for the information.  Just out of curiosity, why did you decide
> to go with 3 xml files instead of 1?  You can combine all of those specs
> into 1 xml and a single instance of the dictionary lookup class will handle
> it.
>
> Without a little more knowledge of the dictionaries that you reference I
> still can't say much.  If they are huge then that is obviously going to
> impact the run time.
>
> I just realized that part of your problem detecting things like "P 90" is
> most likely part of speech tagging.  There is a parameter named
> "exclusionTags" that prevents certain parts of speech such as Verb from
> being used in lookup.  When using the ctakes dictionary lookup you might
> want to change your piper to something like:
>
> //  Do not exclude words of any part of speech tag for dictionary lookup.
> set exclusionTags=""
> //  Use span of 1 for dictionary lookup.
> set minimumSpan=1
> //  Set the path to the xml file containing information for dictionary lookup configuration.
> set LookupXml=org/apache/ctakes/dictionary/lookup/fast/tinyDictSpec.xml
> //  Annotate concepts based upon default algorithms.
> add DefaultJCasTermAnnotator
>
> --  though again, you are using something named
> RaxaDefaultJCasTermAnnotator , and I have no idea what that is.
>
> Sean
>
> ________________________________________
> From: Zakir Saifi <za...@raxa.com>
> Sent: Thursday, February 21, 2019 1:54 AM
> To: dev@ctakes.apache.org
> Subject: Re: Making Ctakes Faster after Changing default lookup span value
> [EXTERNAL]
>
> Thanks Sean for early reply,
>
> Here are the contents of the files you are looking for
>
> *1. tinyDictSpec.xml*
>
> ============
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <lookupSpecification>
>     <dictionaries>
>                 <dictionary>
>                     <name>LabAnnotatorTestDict</name>
>
>
> <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary</implementationName>
>                     <properties>
>                        <property key="jdbcDriver"
> value="com.mysql.jdbc.Driver"/>
>                         <property key="jdbcUrl"
>
> value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>                       <property key="jdbcUser" value="root"/>
>                       <property key="jdbcPass" value=""/>
>                        <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
>                        <property key="umlsVendor" value="NLM-6515182895"/>
>                        <property key="umlsUser" value=""/>
>                        <property key="umlsPass" value=""/>
>                        <property key="rareWordTable" value="rareword"/>
>                     </properties>
>                 </dictionary>
>     </dictionaries>
>
>             <conceptFactories>
>                 <conceptFactory>
>                     <name>LabAnnotatorTestConcepts</name>
>
>
> <implementationName>org.apache.ctakes.dictionary.lookup2.concept.UmlsJdbcConceptFactory</implementationName>
>                     <properties>
>                         <property key="jdbcDriver"
> value="com.mysql.jdbc.Driver"/>
>                           <property key="jdbcUrl"
>
> value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>                       <property key="jdbcUser" value="root"/>
>                       <property key="jdbcPass" value=""/>
>                         <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
>                         <property key="umlsVendor" value="NLM-6515182895"/>
>                         <property key="umlsUser" value=""/>
>                         <property key="umlsPass" value=""/>
>                         <property key="tuiTable" value="tui"/>
>                     </properties>
>                 </conceptFactory>
>             </conceptFactories>
>
>
>             <dictionaryConceptPairs>
>                 <dictionaryConceptPair>
>                     <name>LabAnnotatorPair</name>
>                     <dictionaryName>LabAnnotatorTestDict</dictionaryName>
>
> <conceptFactoryName>LabAnnotatorTestConcepts</conceptFactoryName>
>                 </dictionaryConceptPair>
>             </dictionaryConceptPairs>
>
>             <rareWordConsumer>
>                 <name>Term Consumer</name>
>
>
> <implementationName>org.apache.ctakes.dictionary.lookup2.consumer.DefaultTermConsumer</implementationName>
>                 <properties>
>                     <property key="codingScheme" value="custom"/>
>                 </properties>
>             </rareWordConsumer>
>
> </lookupSpecification>
>
> ===========
>
> *2.  drugConcept.xml*
> <?xml version="1.0" encoding="UTF-8"?>
>
> <lookupSpecification>
>     <dictionaries>
>                 <dictionary>
>                     <name>LabAnnotatorTestDict</name>
>
>
> <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.dictionary.UmlsJdbcDrugTermsDictonary</implementationName>
>                     <properties>
>                       <property key="jdbcDriver"
> value="com.mysql.jdbc.Driver"/>
>                        <property key="jdbcUrl"
>
> value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>                       <property key="jdbcUser" value="root"/>
>                       <property key="jdbcPass" value=""/>
>                       <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
>                       <property key="umlsVendor" value="NLM-6515182895"/>
>                       <property key="umlsUser" value=""/>
>                       <property key="umlsPass" value=""/>
>                       <property key="rareWordTable" value="drug"/>
>                     </properties>
>                 </dictionary>
>     </dictionaries>
>
>             <conceptFactories>
>                 <conceptFactory>
>                     <name>LabAnnotatorTestConcepts</name>
>
>
> <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.concept.UmlsJdbcDrugNameConceptFactory</implementationName>
>                     <properties>
>                        <property key="jdbcDriver"
> value="com.mysql.jdbc.Driver"/>
>                        <property key="jdbcUrl"
>
> value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>                       <property key="jdbcUser" value="root"/>
>                       <property key="jdbcPass" value=""/>
>                        <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
>                        <property key="umlsVendor" value="NLM-6515182895"/>
>                        <property key="umlsUser" value=""/>
>                        <property key="umlsPass" value=""/>
>                        <property key="tuiTable" value="tui"/>
>                     </properties>
>                 </conceptFactory>
>             </conceptFactories>
>
>
>             <dictionaryConceptPairs>
>                 <dictionaryConceptPair>
>                     <name>LabAnnotatorPair</name>
>                     <dictionaryName>LabAnnotatorTestDict</dictionaryName>
>
> <conceptFactoryName>LabAnnotatorTestConcepts</conceptFactoryName>
>                 </dictionaryConceptPair>
>             </dictionaryConceptPairs>
>
>             <rareWordConsumer>
>                 <name>Term Consumer</name>
>
>
> <implementationName>org.apache.ctakes.dictionary.lookup2.consumer.DefaultTermConsumer</implementationName>
>                 <properties>
>                     <property key="codingScheme" value="custom"/>
>                 </properties>
>             </rareWordConsumer>
> </lookupSpecification>
>
> *=======*
>
> *3. personName.xml*
>
> <?xml version="1.0" encoding="UTF-8"?>
> <lookupSpecification>
>     <dictionaries>
>                 <dictionary>
>                     <name>LabAnnotatorTestDict</name>
>
>
> <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.dictionary.UmlsJdbcPersonDictionary</implementationName>
>                     <properties>
>                       <property key="jdbcDriver"
> value="com.mysql.jdbc.Driver"/>
>                         <property key="jdbcUrl"
>
> value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>                        <property key="jdbcUser" value="root"/>
>                        <property key="jdbcPass" value=""/>
>                       <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
>                        <property key="umlsVendor" value="NLM-6515182895"/>
>                        <property key="umlsUser" value=""/>
>                        <property key="umlsPass" value=""/>
>                        <property key="rareWordTable" value="person_name"/>
>                     </properties>
>                 </dictionary>
>     </dictionaries>
>
>             <conceptFactories>
>                 <conceptFactory>
>                     <name>LabAnnotatorTestConcepts</name>
>
>
> <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.concept.UmlsJdbcPersonNameConceptFactory</implementationName>
>                     <properties>
>                       <property key="jdbcDriver"
> value="com.mysql.jdbc.Driver"/>
>                        <property key="jdbcUrl"
>
> value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>                        <property key="jdbcUser" value="root"/>
>                        <property key="jdbcPass" value=""/>
>                       <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
>                       <property key="umlsVendor" value="NLM-6515182895"/>
>                       <property key="umlsUser" value=""/>
>                       <property key="umlsPass" value=""/>
>                       <property key="tuiTable" value="tui"/>
>                     </properties>
>                 </conceptFactory>
>             </conceptFactories>
>
>             <dictionaryConceptPairs>
>                 <dictionaryConceptPair>
>                     <name>LabAnnotatorPair</name>
>                     <dictionaryName>LabAnnotatorTestDict</dictionaryName>
>
> <conceptFactoryName>LabAnnotatorTestConcepts</conceptFactoryName>
>                 </dictionaryConceptPair>
>             </dictionaryConceptPairs>
>
>             <rareWordConsumer>
>                 <name>Term Consumer</name>
>
>
> <implementationName>org.apache.ctakes.dictionary.lookup2.consumer.DefaultTermConsumer</implementationName>
>                 <properties>
>                     <property key="codingScheme" value="custom"/>
>                 </properties>
>             </rareWordConsumer>
> </lookupSpecification>
>
>
>  *RaxaDefaultJcasTermAnnotator* is similar to the
> org.apache.ctakes.dictionary.lookup2.ae.*DefaultJCasTermAnnotator* , I have
> only changed the value of   _minimumLookupSpan (to 1) variable
> of AbstractJCasTermAnnotator.
>
> On Thu, Feb 21, 2019 at 11:41 AM Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi Zakir,
> >
> > In order for me to help you, I need to know more about:
> > Your primary dictionary:
> > LookupXml=org/apache/ctakes/dictionary/lookup/fast/tinyDictSpec.xml
> >
> > Your custom dictionary lookup #1:
> > add
> > org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator
> > LookupXml=org/apache/ctakes/dictionary/lookup/fast/drugConcept.xml
> >
> > Your custom dictionary lookup #2:
> > add
> > org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator
> > LookupXml=org/apache/ctakes/dictionary/lookup/fast/personName.xml
> >
> >
> > As for your metrics,
> > >For lookup span
> > value of 3 (default), rest call was taking less than 2s for text like (
> > Systolic blood pressure 180 ) is now taking around 5s.
> >
> > Does this mean that a document containing such text took 2 seconds, or
> > that averaging over discovered annotations per took 2 seconds?
> >
> > I realize that moving from 3 characters to 1 means that every "a" "to"
> > "in" "of" "an" "1" "2" ... is used for lookup.  However, that should not
> > multiply the processing time *2.5
> >
> >
> > I have to wonder if the non-ctakes
> > org.apache.ctakes.raxactakes.dictionary.lookup2.ae
> > .RaxaDefaultJCasTermAnnotator
> > is doing something suspect.
> >
> >
> > Sean
> >
> >
> > ________________________________________
> > From: Zakir Saifi <za...@raxa.com>
> > Sent: Thursday, February 21, 2019 12:18 AM
> > To: dev@ctakes.apache.org
> > Subject: Making Ctakes Faster after Changing default lookup span value
> > [EXTERNAL]
> >
> > Hi Everyone,
> >
> > I am using Ctakes for Structuring some clinical Text. In my clinical
> text,
> > there are single characters word like *P 90 (Pulse 90) *etc. I want
> Ctakes
> > to detect those. Since the default minimum span detected by Ctakes is 3.
> > I was not able to detect these concepts. Therefore I have changed the
> Value
> > of the _minimumLookupSpan to 1. Now I am able to detect the one character
> > word using Ctakes after adding them to my Custom Dictionary.
> >
> > My Problem is that after changing the value of _minimumLookupSpan, ctakes
> > has become slow.
> > I am using Ctakes-web-Rest (Rest Service using Ctakes). For lookup span
> > value of 3 (default), rest call was taking less than 2s for text like (
> > Systolic blood pressure 180 ) is now taking around 5s.
> >
> > How can I make Ctakes faster?. Any configuration which helps to improve
> the
> > performance without losing the current detection rate.
> >
> > Here is the content of my current Piper file.
> >
> > load DefaultFastPipeline
> > add
> > org.apache.ctakes.raxactakes.dictionary.lookup2.ae
> > .RaxaDefaultJCasTermAnnotator
> > LookupXml=org/apache/ctakes/dictionary/lookup/fast/tinyDictSpec.xml
> > add LabValueFinder
> > add
> > org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator
> > LookupXml=org/apache/ctakes/dictionary/lookup/fast/drugConcept.xml
> > add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
> >
> >
> STATUS_BOUNDARY_ANN_TYPE="org.apache.ctakes.typesystem.type.textsem.MedicationMention"
> > add
> > org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator
> > LookupXml=org/apache/ctakes/dictionary/lookup/fast/personName.xml
> > add org.apache.ctakes.raxactakes.core.ae.PersonNameFinder
> >
> > addDescription EventAnnotator
> > addLogged BackwardsTimeAnnotator
> > classifierJarPath=/org/apache/ctakes/temporal/ae/timeannotator/model.jar
> > addLogged DocTimeRelAnnotator
> > classifierJarPath=/org/apache/ctakes/temporal/ae/doctimerel/model.jar
> > addLogged EventTimeRelationAnnotator
> > classifierJarPath=/org/apache/ctakes/temporal/ae/eventtime/model.jar
> > addLogged EventEventRelationAnnotator
> > classifierJarPath=/org/apache/ctakes/temporal/ae/eventevent/model.jar
> > addLogged ContextualModalityAnnotator
> >
> >
> classifierJarPath=/org/apache/ctakes/temporal/ae/contextualmodality/model.jar
> > addLogged EventAnnotator
> > classifierJarPath=/org/apache/ctakes/temporal/ae/eventannotator/model.jar
> >
> > --
> > Regards
> > Zakir Saifi
> > (Software Developer at Raxa)
> >
>
>
> --
> Regards
> Zakir Saifi
> (Software Developer at Raxa)
>


--
Regards
Zakir Saifi
(Software Developer at Raxa)

Re: Making Ctakes Faster after Changing default lookup span value [EXTERNAL]

Posted by Zakir Saifi <za...@raxa.com>.
Thank you for the suggestion. I am using 3 XML files because they refer
to different tables in the database.
The first one is for *concepts*, the second is for *drugs*, and the third
is for *persons*. These are 3 tables in the database aiunstructured.

I am not sure what counts as a large dictionary for cTAKES. Currently the
numbers of rows in the above tables are:
concepts: 70,000 rows
drug: 30,000 rows
persons: 40,000 rows

As for *RaxaDefaultJCasTermAnnotator*: it is similar to
org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator. I have
only changed the value of the protected variable _minimumLookupSpan (to 1)
in the AbstractJCasTermAnnotator class.

I would try to use exclusionTags and minimum span in my piper file and
analyse the result. If there is any better way to implement the above
scenario using a *single dictionary instance*, please let me know.
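
For reference, a combined spec along the lines Sean describes in the quoted message below might look roughly like this. It is an untested sketch: the element names, class names and table names are copied from the three spec files quoted in this thread, while the name values (ConceptDict, DrugDict, PersonDict and the matching factory and pair names) are made up here so that every entry is unique within the one file. The repeated jdbc* and umls* properties are elided as comments and would have to be copied over from the original files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lookupSpecification>
    <dictionaries>
        <dictionary>
            <name>ConceptDict</name>
            <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary</implementationName>
            <properties>
                <!-- jdbc* and umls* properties copied unchanged from tinyDictSpec.xml -->
                <property key="rareWordTable" value="rareword"/>
            </properties>
        </dictionary>
        <dictionary>
            <name>DrugDict</name>
            <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.dictionary.UmlsJdbcDrugTermsDictonary</implementationName>
            <properties>
                <!-- jdbc* and umls* properties copied unchanged from drugConcept.xml -->
                <property key="rareWordTable" value="drug"/>
            </properties>
        </dictionary>
        <dictionary>
            <name>PersonDict</name>
            <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.dictionary.UmlsJdbcPersonDictionary</implementationName>
            <properties>
                <!-- jdbc* and umls* properties copied unchanged from personName.xml -->
                <property key="rareWordTable" value="person_name"/>
            </properties>
        </dictionary>
    </dictionaries>

    <conceptFactories>
        <conceptFactory>
            <name>ConceptTerms</name>
            <implementationName>org.apache.ctakes.dictionary.lookup2.concept.UmlsJdbcConceptFactory</implementationName>
            <properties>
                <!-- jdbc* and umls* properties copied unchanged from tinyDictSpec.xml -->
                <property key="tuiTable" value="tui"/>
            </properties>
        </conceptFactory>
        <conceptFactory>
            <name>DrugTerms</name>
            <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.concept.UmlsJdbcDrugNameConceptFactory</implementationName>
            <properties>
                <!-- jdbc* and umls* properties copied unchanged from drugConcept.xml -->
                <property key="tuiTable" value="tui"/>
            </properties>
        </conceptFactory>
        <conceptFactory>
            <name>PersonTerms</name>
            <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.concept.UmlsJdbcPersonNameConceptFactory</implementationName>
            <properties>
                <!-- jdbc* and umls* properties copied unchanged from personName.xml -->
                <property key="tuiTable" value="tui"/>
            </properties>
        </conceptFactory>
    </conceptFactories>

    <!-- Each pair ties one dictionary to the concept factory that describes its terms. -->
    <dictionaryConceptPairs>
        <dictionaryConceptPair>
            <name>ConceptPair</name>
            <dictionaryName>ConceptDict</dictionaryName>
            <conceptFactoryName>ConceptTerms</conceptFactoryName>
        </dictionaryConceptPair>
        <dictionaryConceptPair>
            <name>DrugPair</name>
            <dictionaryName>DrugDict</dictionaryName>
            <conceptFactoryName>DrugTerms</conceptFactoryName>
        </dictionaryConceptPair>
        <dictionaryConceptPair>
            <name>PersonPair</name>
            <dictionaryName>PersonDict</dictionaryName>
            <conceptFactoryName>PersonTerms</conceptFactoryName>
        </dictionaryConceptPair>
    </dictionaryConceptPairs>

    <rareWordConsumer>
        <name>Term Consumer</name>
        <implementationName>org.apache.ctakes.dictionary.lookup2.consumer.DefaultTermConsumer</implementationName>
        <properties>
            <property key="codingScheme" value="custom"/>
        </properties>
    </rareWordConsumer>
</lookupSpecification>
```

With a file like this, the piper file would add one term annotator whose LookupXml points at the combined spec, instead of three separate annotators.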

On Fri, Feb 22, 2019 at 6:52 PM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Zakir,
>
> Thank you for the information.  Just out of curiosity, why did you decide
> to go with 3 xml files instead of 1?  You can combine all of those specs
> into 1 xml and a single instance of the dictionary lookup class will handle
> it.
>
> Without a little more knowledge of the dictionaries that you reference I
> still can't say much.  If they are huge then that is obviously going to
> impact the run time.
>
> I just realized that part of your problem detecting things like "P 90" is
> most likely part of speech tagging.  There is a parameter named
> "exclusionTags" that prevents certain parts of speech such as Verb from
> being used in lookup.  When using the ctakes dictionary lookup you might
> want to change your piper to something like:
>
> //  Do not exclude words of any part of speech tag for dictionary lookup.
> set exclusionTags=""
> //  Use span of 1 for dictionary lookup.
> set minimumSpan=1
> //  Set the path to the xml file containing information for dictionary lookup configuration.
> set LookupXml=org/apache/ctakes/dictionary/lookup/fast/tinyDictSpec.xml
> //  Annotate concepts based upon default algorithms.
> add DefaultJCasTermAnnotator
>
> --  though again, you are using something named
> RaxaDefaultJCasTermAnnotator , and I have no idea what that is.
>
> Sean
>
> ________________________________________
> From: Zakir Saifi <za...@raxa.com>
> Sent: Thursday, February 21, 2019 1:54 AM
> To: dev@ctakes.apache.org
> Subject: Re: Making Ctakes Faster after Changing default lookup span value
> [EXTERNAL]
>
> Thanks Sean for early reply,
>
> Here are the contents of the files you are looking for
>
> *1. tinyDictSpec.xml*
>
> ============
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <lookupSpecification>
>     <dictionaries>
>                 <dictionary>
>                     <name>LabAnnotatorTestDict</name>
>
>
> <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary</implementationName>
>                     <properties>
>                        <property key="jdbcDriver"
> value="com.mysql.jdbc.Driver"/>
>                         <property key="jdbcUrl"
>
> value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>                       <property key="jdbcUser" value="root"/>
>                       <property key="jdbcPass" value=""/>
>                        <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
>                        <property key="umlsVendor" value="NLM-6515182895"/>
>                        <property key="umlsUser" value=""/>
>                        <property key="umlsPass" value=""/>
>                        <property key="rareWordTable" value="rareword"/>
>                     </properties>
>                 </dictionary>
>     </dictionaries>
>
>             <conceptFactories>
>                 <conceptFactory>
>                     <name>LabAnnotatorTestConcepts</name>
>
>
> <implementationName>org.apache.ctakes.dictionary.lookup2.concept.UmlsJdbcConceptFactory</implementationName>
>                     <properties>
>                         <property key="jdbcDriver"
> value="com.mysql.jdbc.Driver"/>
>                           <property key="jdbcUrl"
>
> value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>                       <property key="jdbcUser" value="root"/>
>                       <property key="jdbcPass" value=""/>
>                         <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
>                         <property key="umlsVendor" value="NLM-6515182895"/>
>                         <property key="umlsUser" value=""/>
>                         <property key="umlsPass" value=""/>
>                         <property key="tuiTable" value="tui"/>
>                     </properties>
>                 </conceptFactory>
>             </conceptFactories>
>
>
>             <dictionaryConceptPairs>
>                 <dictionaryConceptPair>
>                     <name>LabAnnotatorPair</name>
>                     <dictionaryName>LabAnnotatorTestDict</dictionaryName>
>
> <conceptFactoryName>LabAnnotatorTestConcepts</conceptFactoryName>
>                 </dictionaryConceptPair>
>             </dictionaryConceptPairs>
>
>             <rareWordConsumer>
>                 <name>Term Consumer</name>
>
>
> <implementationName>org.apache.ctakes.dictionary.lookup2.consumer.DefaultTermConsumer</implementationName>
>                 <properties>
>                     <property key="codingScheme" value="custom"/>
>                 </properties>
>             </rareWordConsumer>
>
> </lookupSpecification>
>
> ===========
>
> *2.  drugConcept.xml*
> <?xml version="1.0" encoding="UTF-8"?>
>
> <lookupSpecification>
>     <dictionaries>
>                 <dictionary>
>                     <name>LabAnnotatorTestDict</name>
>
>
> <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.dictionary.UmlsJdbcDrugTermsDictonary</implementationName>
>                     <properties>
>                       <property key="jdbcDriver"
> value="com.mysql.jdbc.Driver"/>
>                        <property key="jdbcUrl"
>
> value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>                       <property key="jdbcUser" value="root"/>
>                       <property key="jdbcPass" value=""/>
>                       <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
>                       <property key="umlsVendor" value="NLM-6515182895"/>
>                       <property key="umlsUser" value=""/>
>                       <property key="umlsPass" value=""/>
>                       <property key="rareWordTable" value="drug"/>
>                     </properties>
>                 </dictionary>
>     </dictionaries>
>
>             <conceptFactories>
>                 <conceptFactory>
>                     <name>LabAnnotatorTestConcepts</name>
>
>
> <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.concept.UmlsJdbcDrugNameConceptFactory</implementationName>
>                     <properties>
>                        <property key="jdbcDriver"
> value="com.mysql.jdbc.Driver"/>
>                        <property key="jdbcUrl"
>
> value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>                       <property key="jdbcUser" value="root"/>
>                       <property key="jdbcPass" value=""/>
>                        <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
>                        <property key="umlsVendor" value="NLM-6515182895"/>
>                        <property key="umlsUser" value=""/>
>                        <property key="umlsPass" value=""/>
>                        <property key="tuiTable" value="tui"/>
>                     </properties>
>                 </conceptFactory>
>             </conceptFactories>
>
>
>             <dictionaryConceptPairs>
>                 <dictionaryConceptPair>
>                     <name>LabAnnotatorPair</name>
>                     <dictionaryName>LabAnnotatorTestDict</dictionaryName>
>
> <conceptFactoryName>LabAnnotatorTestConcepts</conceptFactoryName>
>                 </dictionaryConceptPair>
>             </dictionaryConceptPairs>
>
>             <rareWordConsumer>
>                 <name>Term Consumer</name>
>
>
> <implementationName>org.apache.ctakes.dictionary.lookup2.consumer.DefaultTermConsumer</implementationName>
>                 <properties>
>                     <property key="codingScheme" value="custom"/>
>                 </properties>
>             </rareWordConsumer>
> </lookupSpecification>
>
> *=======*
>
> *3. personName.xml*
>
> <?xml version="1.0" encoding="UTF-8"?>
> <lookupSpecification>
>     <dictionaries>
>                 <dictionary>
>                     <name>LabAnnotatorTestDict</name>
>
>
> <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.dictionary.UmlsJdbcPersonDictionary</implementationName>
>                     <properties>
>                       <property key="jdbcDriver"
> value="com.mysql.jdbc.Driver"/>
>                         <property key="jdbcUrl"
>
> value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>                        <property key="jdbcUser" value="root"/>
>                        <property key="jdbcPass" value=""/>
>                       <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
>                        <property key="umlsVendor" value="NLM-6515182895"/>
>                        <property key="umlsUser" value=""/>
>                        <property key="umlsPass" value=""/>
>                        <property key="rareWordTable" value="person_name"/>
>                     </properties>
>                 </dictionary>
>     </dictionaries>
>
>             <conceptFactories>
>                 <conceptFactory>
>                     <name>LabAnnotatorTestConcepts</name>
>
>
> <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.concept.UmlsJdbcPersonNameConceptFactory</implementationName>
>                     <properties>
>                       <property key="jdbcDriver"
> value="com.mysql.jdbc.Driver"/>
>                        <property key="jdbcUrl"
>
> value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>                        <property key="jdbcUser" value="root"/>
>                        <property key="jdbcPass" value=""/>
>                       <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
>                       <property key="umlsVendor" value="NLM-6515182895"/>
>                       <property key="umlsUser" value=""/>
>                       <property key="umlsPass" value=""/>
>                       <property key="tuiTable" value="tui"/>
>                     </properties>
>                 </conceptFactory>
>             </conceptFactories>
>
>             <dictionaryConceptPairs>
>                 <dictionaryConceptPair>
>                     <name>LabAnnotatorPair</name>
>                     <dictionaryName>LabAnnotatorTestDict</dictionaryName>
>
> <conceptFactoryName>LabAnnotatorTestConcepts</conceptFactoryName>
>                 </dictionaryConceptPair>
>             </dictionaryConceptPairs>
>
>             <rareWordConsumer>
>                 <name>Term Consumer</name>
>
>
> <implementationName>org.apache.ctakes.dictionary.lookup2.consumer.DefaultTermConsumer</implementationName>
>                 <properties>
>                     <property key="codingScheme" value="custom"/>
>                 </properties>
>             </rareWordConsumer>
> </lookupSpecification>
>
>
>  *RaxaDefaultJcasTermAnnotator* is similar to the
> org.apache.ctakes.dictionary.lookup2.ae.*DefaultJCasTermAnnotator* , I have
> only changed the value of   _minimumLookupSpan (to 1) variable
> of AbstractJCasTermAnnotator.
>
> On Thu, Feb 21, 2019 at 11:41 AM Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi Zakir,
> >
> > In order for me to help you, I need to know more about:
> > Your primary dictionary:
> > LookupXml=org/apache/ctakes/dictionary/lookup/fast/tinyDictSpec.xml
> >
> > Your custom dictionary lookup #1:
> > add
> > org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator
> > LookupXml=org/apache/ctakes/dictionary/lookup/fast/drugConcept.xml
> >
> > Your custom dictionary lookup #2:
> > add
> > org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator
> > LookupXml=org/apache/ctakes/dictionary/lookup/fast/personName.xml
> >
> >
> > As for your metrics,
> > >For lookup span
> > value of 3 (default), rest call was taking less than 2s for text like (
> > Systolic blood pressure 180 ) is now taking around 5s.
> >
> > Does this mean that a document containing such text took 2 seconds, or
> > that averaging over discovered annotations per took 2 seconds?
> >
> > I realize that moving from 3 characters to 1 means that every "a" "to"
> > "in" "of" "an" "1" "2" ... is used for lookup.  However, that should not
> > multiply the processing time *2.5
> >
> >
> > I have to wonder if the non-ctakes
> > org.apache.ctakes.raxactakes.dictionary.lookup2.ae
> > .RaxaDefaultJCasTermAnnotator
> > is doing something suspect.
> >
> >
> > Sean
> >
> >
> > ________________________________________
> > From: Zakir Saifi <za...@raxa.com>
> > Sent: Thursday, February 21, 2019 12:18 AM
> > To: dev@ctakes.apache.org
> > Subject: Making Ctakes Faster after Changing default lookup span value
> > [EXTERNAL]
> >
> > Hi Everyone,
> >
> > I am using Ctakes for Structuring some clinical Text. In my clinical
> text,
> > there are single characters word like *P 90 (Pulse 90) *etc. I want
> Ctakes
> > to detect those. Since the default minimum span detected by Ctakes is 3.
> > I was not able to detect these concepts. Therefore I have changed the
> Value
> > of the _minimumLookupSpan to 1. Now I am able to detect the one character
> > word using Ctakes after adding them to my Custom Dictionary.
> >
> > My Problem is that after changing the value of _minimumLookupSpan, ctakes
> > has become slow.
> > I am using Ctakes-web-Rest (Rest Service using Ctakes). For lookup span
> > value of 3 (default), rest call was taking less than 2s for text like (
> > Systolic blood pressure 180 ) is now taking around 5s.
> >
> > How can I make Ctakes faster?. Any configuration which helps to improve
> the
> > performance without losing the current detection rate.
> >
> > Here is the content of my current Piper file.
> >
> > load DefaultFastPipeline
> > add
> > org.apache.ctakes.raxactakes.dictionary.lookup2.ae
> > .RaxaDefaultJCasTermAnnotator
> > LookupXml=org/apache/ctakes/dictionary/lookup/fast/tinyDictSpec.xml
> > add LabValueFinder
> > add
> > org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator
> > LookupXml=org/apache/ctakes/dictionary/lookup/fast/drugConcept.xml
> > add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
> >
> >
> STATUS_BOUNDARY_ANN_TYPE="org.apache.ctakes.typesystem.type.textsem.MedicationMention"
> > add
> > org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator
> > LookupXml=org/apache/ctakes/dictionary/lookup/fast/personName.xml
> > add org.apache.ctakes.raxactakes.core.ae.PersonNameFinder
> >
> > addDescription EventAnnotator
> > addLogged BackwardsTimeAnnotator
> > classifierJarPath=/org/apache/ctakes/temporal/ae/timeannotator/model.jar
> > addLogged DocTimeRelAnnotator
> > classifierJarPath=/org/apache/ctakes/temporal/ae/doctimerel/model.jar
> > addLogged EventTimeRelationAnnotator
> > classifierJarPath=/org/apache/ctakes/temporal/ae/eventtime/model.jar
> > addLogged EventEventRelationAnnotator
> > classifierJarPath=/org/apache/ctakes/temporal/ae/eventevent/model.jar
> > addLogged ContextualModalityAnnotator
> >
> >
> classifierJarPath=/org/apache/ctakes/temporal/ae/contextualmodality/model.jar
> > addLogged EventAnnotator
> > classifierJarPath=/org/apache/ctakes/temporal/ae/eventannotator/model.jar
> >
> > --
> > Regards
> > Zakir Saifi
> > (Software Developer at Raxa)
> >
>
>
> --
> Regards
> Zakir Saifi
> (Software Developer at Raxa)
>


-- 
Regards
Zakir Saifi
(Software Developer at Raxa)

Re: Making Ctakes Faster after Changing default lookup span value [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Zakir,

Thank you for the information.  Just out of curiosity, why did you decide to go with 3 xml files instead of 1?  You can combine all of those specs into 1 xml and a single instance of the dictionary lookup class will handle it.
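
For illustration, a combined spec might look roughly like the sketch below. It reuses the element names and implementation classes from the three files posted in this thread. The dictionary, concept factory, and pair names (RareWordDict, DrugDict, and so on) are hypothetical, chosen only so that each entry is unique, and the JDBC/UMLS connection properties are left as comments to be copied from the original files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of one combined lookup spec. All <name> values are hypothetical. -->
<lookupSpecification>
    <dictionaries>
        <dictionary>
            <name>RareWordDict</name>
            <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary</implementationName>
            <properties>
                <!-- Copy the jdbc* and umls* properties from tinyDictSpec.xml here. -->
                <property key="rareWordTable" value="rareword"/>
            </properties>
        </dictionary>
        <dictionary>
            <name>DrugDict</name>
            <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.dictionary.UmlsJdbcDrugTermsDictonary</implementationName>
            <properties>
                <!-- Copy the jdbc* and umls* properties from drugConcept.xml here. -->
                <property key="rareWordTable" value="drug"/>
            </properties>
        </dictionary>
        <dictionary>
            <name>PersonDict</name>
            <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.dictionary.UmlsJdbcPersonDictionary</implementationName>
            <properties>
                <!-- Copy the jdbc* and umls* properties from personName.xml here. -->
                <property key="rareWordTable" value="person_name"/>
            </properties>
        </dictionary>
    </dictionaries>

    <conceptFactories>
        <conceptFactory>
            <name>RareWordConcepts</name>
            <implementationName>org.apache.ctakes.dictionary.lookup2.concept.UmlsJdbcConceptFactory</implementationName>
            <properties>
                <!-- Copy the jdbc* and umls* properties from tinyDictSpec.xml here. -->
                <property key="tuiTable" value="tui"/>
            </properties>
        </conceptFactory>
        <conceptFactory>
            <name>DrugConcepts</name>
            <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.concept.UmlsJdbcDrugNameConceptFactory</implementationName>
            <properties>
                <!-- Copy the jdbc* and umls* properties from drugConcept.xml here. -->
                <property key="tuiTable" value="tui"/>
            </properties>
        </conceptFactory>
        <conceptFactory>
            <name>PersonConcepts</name>
            <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.concept.UmlsJdbcPersonNameConceptFactory</implementationName>
            <properties>
                <!-- Copy the jdbc* and umls* properties from personName.xml here. -->
                <property key="tuiTable" value="tui"/>
            </properties>
        </conceptFactory>
    </conceptFactories>

    <!-- Each pair ties one dictionary to the concept factory that resolves its hits. -->
    <dictionaryConceptPairs>
        <dictionaryConceptPair>
            <name>RareWordPair</name>
            <dictionaryName>RareWordDict</dictionaryName>
            <conceptFactoryName>RareWordConcepts</conceptFactoryName>
        </dictionaryConceptPair>
        <dictionaryConceptPair>
            <name>DrugPair</name>
            <dictionaryName>DrugDict</dictionaryName>
            <conceptFactoryName>DrugConcepts</conceptFactoryName>
        </dictionaryConceptPair>
        <dictionaryConceptPair>
            <name>PersonPair</name>
            <dictionaryName>PersonDict</dictionaryName>
            <conceptFactoryName>PersonConcepts</conceptFactoryName>
        </dictionaryConceptPair>
    </dictionaryConceptPairs>

    <rareWordConsumer>
        <name>Term Consumer</name>
        <implementationName>org.apache.ctakes.dictionary.lookup2.consumer.DefaultTermConsumer</implementationName>
        <properties>
            <property key="codingScheme" value="custom"/>
        </properties>
    </rareWordConsumer>
</lookupSpecification>
```

With a spec like this, the piper file would need only one lookup annotator pointing LookupXml at the combined file. Whether the custom Raxa dictionary and concept factory classes behave correctly when loaded side by side is an assumption that would need testing.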

Without a little more knowledge of the dictionaries that you reference I still can't say much.  If they are huge then that is obviously going to impact the run time.

I just realized that part of your problem detecting things like "P 90" is most likely part of speech tagging.  There is a parameter named "exclusionTags" that prevents certain parts of speech such as Verb from being used in lookup.  When using the ctakes dictionary lookup you might want to change your piper to something like:

//  Do not exclude words of any part of speech tag for dictionary lookup.
set exclusionTags=""
//  Use span of 1 for dictionary lookup.
set minimumSpan=1
//  Set the path to the xml file containing information for dictionary lookup configuration.
set LookupXml=org/apache/ctakes/dictionary/lookup/fast/tinyDictSpec.xml
//  Annotate concepts based upon default algorithms.
add DefaultJCasTermAnnotator

--  though again, you are using something named RaxaDefaultJCasTermAnnotator , and I have no idea what that is.

Sean


Re: Making Ctakes Faster after Changing default lookup span value [EXTERNAL]

Posted by Zakir Saifi <za...@raxa.com>.
Thanks, Sean, for the quick reply.

Here are the contents of the files you asked about:

*1. tinyDictSpec.xml*

============

<?xml version="1.0" encoding="UTF-8"?>

<lookupSpecification>
    <dictionaries>
                <dictionary>
                    <name>LabAnnotatorTestDict</name>

<implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary</implementationName>
                    <properties>
                       <property key="jdbcDriver"
value="com.mysql.jdbc.Driver"/>
                        <property key="jdbcUrl"
value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
                      <property key="jdbcUser" value="root"/>
                      <property key="jdbcPass" value=""/>
                       <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
                       <property key="umlsVendor" value="NLM-6515182895"/>
                       <property key="umlsUser" value=""/>
                       <property key="umlsPass" value=""/>
                       <property key="rareWordTable" value="rareword"/>
                    </properties>
                </dictionary>
    </dictionaries>

            <conceptFactories>
                <conceptFactory>
                    <name>LabAnnotatorTestConcepts</name>

<implementationName>org.apache.ctakes.dictionary.lookup2.concept.UmlsJdbcConceptFactory</implementationName>
                    <properties>
                        <property key="jdbcDriver"
value="com.mysql.jdbc.Driver"/>
                          <property key="jdbcUrl"
value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
                      <property key="jdbcUser" value="root"/>
                      <property key="jdbcPass" value=""/>
                        <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
                        <property key="umlsVendor" value="NLM-6515182895"/>
                        <property key="umlsUser" value=""/>
                        <property key="umlsPass" value=""/>
                        <property key="tuiTable" value="tui"/>
                    </properties>
                </conceptFactory>
            </conceptFactories>


            <dictionaryConceptPairs>
                <dictionaryConceptPair>
                    <name>LabAnnotatorPair</name>
                    <dictionaryName>LabAnnotatorTestDict</dictionaryName>

<conceptFactoryName>LabAnnotatorTestConcepts</conceptFactoryName>
                </dictionaryConceptPair>
            </dictionaryConceptPairs>

            <rareWordConsumer>
                <name>Term Consumer</name>

<implementationName>org.apache.ctakes.dictionary.lookup2.consumer.DefaultTermConsumer</implementationName>
                <properties>
                    <property key="codingScheme" value="custom"/>
                </properties>
            </rareWordConsumer>

</lookupSpecification>

===========

*2.  drugConcept.xml*
<?xml version="1.0" encoding="UTF-8"?>

<lookupSpecification>
    <dictionaries>
                <dictionary>
                    <name>LabAnnotatorTestDict</name>

<implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.dictionary.UmlsJdbcDrugTermsDictonary</implementationName>
                    <properties>
                      <property key="jdbcDriver"
value="com.mysql.jdbc.Driver"/>
                       <property key="jdbcUrl"
value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
                      <property key="jdbcUser" value="root"/>
                      <property key="jdbcPass" value=""/>
                      <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
                      <property key="umlsVendor" value="NLM-6515182895"/>
                      <property key="umlsUser" value=""/>
                      <property key="umlsPass" value=""/>
                      <property key="rareWordTable" value="drug"/>
                    </properties>
                </dictionary>
    </dictionaries>

            <conceptFactories>
                <conceptFactory>
                    <name>LabAnnotatorTestConcepts</name>

<implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.concept.UmlsJdbcDrugNameConceptFactory</implementationName>
                    <properties>
                       <property key="jdbcDriver"
value="com.mysql.jdbc.Driver"/>
                       <property key="jdbcUrl"
value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
                      <property key="jdbcUser" value="root"/>
                      <property key="jdbcPass" value=""/>
                       <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
                       <property key="umlsVendor" value="NLM-6515182895"/>
                       <property key="umlsUser" value=""/>
                       <property key="umlsPass" value=""/>
                       <property key="tuiTable" value="tui"/>
                    </properties>
                </conceptFactory>
            </conceptFactories>


            <dictionaryConceptPairs>
                <dictionaryConceptPair>
                    <name>LabAnnotatorPair</name>
                    <dictionaryName>LabAnnotatorTestDict</dictionaryName>

<conceptFactoryName>LabAnnotatorTestConcepts</conceptFactoryName>
                </dictionaryConceptPair>
            </dictionaryConceptPairs>

            <rareWordConsumer>
                <name>Term Consumer</name>

<implementationName>org.apache.ctakes.dictionary.lookup2.consumer.DefaultTermConsumer</implementationName>
                <properties>
                    <property key="codingScheme" value="custom"/>
                </properties>
            </rareWordConsumer>
</lookupSpecification>

*=======*

*3. personName.xml*

<?xml version="1.0" encoding="UTF-8"?>
<lookupSpecification>
    <dictionaries>
                <dictionary>
                    <name>LabAnnotatorTestDict</name>

<implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.dictionary.UmlsJdbcPersonDictionary</implementationName>
                    <properties>
                      <property key="jdbcDriver"
value="com.mysql.jdbc.Driver"/>
                        <property key="jdbcUrl"
value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
                       <property key="jdbcUser" value="root"/>
                       <property key="jdbcPass" value=""/>
                      <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
                       <property key="umlsVendor" value="NLM-6515182895"/>
                       <property key="umlsUser" value=""/>
                       <property key="umlsPass" value=""/>
                       <property key="rareWordTable" value="person_name"/>
                    </properties>
                </dictionary>
    </dictionaries>

            <conceptFactories>
                <conceptFactory>
                    <name>LabAnnotatorTestConcepts</name>

<implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.concept.UmlsJdbcPersonNameConceptFactory</implementationName>
                    <properties>
                      <property key="jdbcDriver"
value="com.mysql.jdbc.Driver"/>
                       <property key="jdbcUrl"
value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
                       <property key="jdbcUser" value="root"/>
                       <property key="jdbcPass" value=""/>
                      <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
                      <property key="umlsVendor" value="NLM-6515182895"/>
                      <property key="umlsUser" value=""/>
                      <property key="umlsPass" value=""/>
                      <property key="tuiTable" value="tui"/>
                    </properties>
                </conceptFactory>
            </conceptFactories>

            <dictionaryConceptPairs>
                <dictionaryConceptPair>
                    <name>LabAnnotatorPair</name>
                    <dictionaryName>LabAnnotatorTestDict</dictionaryName>

                    <conceptFactoryName>LabAnnotatorTestConcepts</conceptFactoryName>
                </dictionaryConceptPair>
            </dictionaryConceptPairs>

            <rareWordConsumer>
                <name>Term Consumer</name>

                <implementationName>org.apache.ctakes.dictionary.lookup2.consumer.DefaultTermConsumer</implementationName>
                <properties>
                    <property key="codingScheme" value="custom"/>
                </properties>
            </rareWordConsumer>
</lookupSpecification>


*RaxaDefaultJCasTermAnnotator* is the same as
org.apache.ctakes.dictionary.lookup2.ae.*DefaultJCasTermAnnotator*, except
that I have changed the value of the _minimumLookupSpan variable of
AbstractJCasTermAnnotator to 1.

On Thu, Feb 21, 2019 at 11:41 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Zakir,
>
> In order for me to help you, I need to know more about:
> Your primary dictionary:
> LookupXml=org/apache/ctakes/dictionary/lookup/fast/tinyDictSpec.xml
>
> Your custom dictionary lookup #1:
> add
> org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator
> LookupXml=org/apache/ctakes/dictionary/lookup/fast/drugConcept.xml
>
> Your custom dictionary lookup #2:
> add
> org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator
> LookupXml=org/apache/ctakes/dictionary/lookup/fast/personName.xml
>
>
> As for your metrics,
> > For lookup span value of 3 (default), rest call was taking less than 2s
> > for text like ( Systolic blood pressure 180 ) is now taking around 5s.
>
> Does this mean that a document containing such text took 2 seconds, or
> that the time averaged over discovered annotations was 2 seconds each?
>
> I realize that moving from 3 characters to 1 means that every "a" "to"
> "in" "of" "an" "1" "2" ... is used for lookup.  However, that should not
> multiply the processing time by 2.5.
>
>
> I have to wonder if the non-ctakes
> org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaDefaultJCasTermAnnotator
> is doing something suspect.
>
>
> Sean


-- 
Regards
Zakir Saifi
(Software Developer at Raxa)

Re: Making Ctakes Faster after Changing default lookup span value [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Zakir,

In order for me to help you, I need to know more about:
Your primary dictionary:
LookupXml=org/apache/ctakes/dictionary/lookup/fast/tinyDictSpec.xml

Your custom dictionary lookup #1:
add
org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator
LookupXml=org/apache/ctakes/dictionary/lookup/fast/drugConcept.xml

Your custom dictionary lookup #2:
add
org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator
LookupXml=org/apache/ctakes/dictionary/lookup/fast/personName.xml


As for your metrics,
> For lookup span value of 3 (default), rest call was taking less than 2s
> for text like ( Systolic blood pressure 180 ) is now taking around 5s.

Does this mean that a document containing such text took 2 seconds, or that the time averaged over discovered annotations was 2 seconds each?

I realize that moving from 3 characters to 1 means that every "a" "to" "in" "of" "an" "1" "2" ... is used for lookup.  However, that should not multiply the processing time by 2.5.
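
To make the point about short tokens concrete, here is a minimal, self-contained sketch. It is not cTAKES code: the class name, the method name and the sample sentence are made up for illustration, and it assumes that the minimum span simply filters candidate lookup tokens by character length after whitespace tokenization. It only shows how the number of candidate tokens grows when the threshold drops from 3 to 1.

```java
import java.util.Arrays;

public class LookupSpanDemo {

    // Hypothetical helper: counts whitespace-delimited tokens whose
    // character length is at least minSpan. This mimics the idea of a
    // minimum lookup span; it is an assumption, not the cTAKES implementation.
    static long countLookupCandidates(String text, int minSpan) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(token -> !token.isEmpty())
                .filter(token -> token.length() >= minSpan)
                .count();
    }

    public static void main(String[] args) {
        // Made-up sample text containing a one-character abbreviation ("P").
        String text = "P 90 pain in the left arm of 2 days duration";
        System.out.println("minSpan=3 candidates: " + countLookupCandidates(text, 3));
        System.out.println("minSpan=1 candidates: " + countLookupCandidates(text, 1));
    }
}
```

In this sample the candidate count roughly doubles at a span of 1, which by itself would add dictionary queries but would not obviously account for the jump from 2s to 5s.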


I have to wonder if the non-ctakes
org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaDefaultJCasTermAnnotator
is doing something suspect.


Sean

