Posted to dev@ctakes.apache.org by "Miller, Timothy" <Ti...@childrens.harvard.edu> on 2018/01/04 12:41:19 UTC

Re: How to use external CSV or BSV in addition to FastUMLS attention Sean [EXTERNAL]

Peter, I know Sean is busy this week and he may not see this for a while. But I tried this method over the summer and got it to work so I'm fairly confident that's the right approach still. Some of the details may have changed from two years ago, so I would also check out this directory as a starting point:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-dictionary-lookup-fast-res/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/example/bsv/

Tim

________________________________________
From: Abramowitsch, Peter <pa...@hearst.com>
Sent: Thursday, January 4, 2018 7:28 AM
To: dev@ctakes.apache.org
Subject: Re: How to use external CSV or BSV in addition to FastUMLS  attention Sean [EXTERNAL]

Further to my previous message, Sean, I was wondering if you could tell me whether the answer you gave in 2015 is still the right way to do things in cTAKES 4.x.

permalink:  http://markmail.org/message/s3ztinppusvsciss

Subject:        RE: How to update cTAKES so that new top level categories come out based on local dictionary?
From:   Finan, Sean (Sean...@childrens.harvard.edu)
Date:   Oct 6, 2015 2:04:56 pm
List:   org.apache.incubator.ctakes-dev


Regards
Peter

From: Abramowitsch, Peter <pa...@hearst.com>
Date: Thursday, January 4, 2018 at 12:50 PM
To: dev@ctakes.apache.org
Subject: How to use external CSV or BSV in addition to FastUMLS

Can someone point me to any up-to-date how-tos on including external CSV/BSV type resources, to add synonyms and other terms for dictionary lookup that augment the FAST UMLS resources that come out of the box?  Perhaps I have missed something, but the CTakesDictionaryCreator UI looks as if it is designed only to choose subsets of the UMLS data set rather than to bring in completely new information sources.  I scoured the Marklogic ctakes user archive, but so many of the entries are old that I'm not sure they describe the current way of doing things.

The only approach I could see would be to take the AggregateEngine description and have it point to the CSV annotator, creating a completely new AE.  But this would build other types of annotation, whereas what I'm thinking about is creating identified mentions such as a DiseaseDisorderMention based on finding an acronym that the UMLS resource doesn't know about, even though the concept in its full textual form is there.

I'm sure this is not a unique request, and I apologize in advance if it has already been answered somewhere.

- Peter

Re: How to use external CSV or BSV in addition to FastUMLS attention Sean [EXTERNAL]

Posted by "Abramowitsch, Peter" <pa...@hearst.com>.
Great, thanks.  I'll give it a spin.



Re: How to use external CSV or BSV in addition to FastUMLS attention Sean [EXTERNAL]

Posted by "Abramowitsch, Peter" <pa...@hearst.com>.
Thanks, Tim.  OK, I did get custom dictionaries to work.

I was interested to see what it did when I overloaded an existing SNOMED
term with new text while keeping the same Preferred Text.  I like the
results:

So for instance in the bsv file:
C1956346|T47|grwz|Coronary Artery Disease

grwz is now linked with the same CUI/TUI as its SNOMED cousin.


And when running it through cTAKES I see this:

"168": {
      "_type": "UmlsConcept",
      "codingScheme": "my-scheme",
      "score": 0.0,
      "disambiguated": false,
      "cui": "C1956346",
      "tui": "T047",
      "preferredText": "Coronary Artery Disease"
}



So my concept can share the same CUI as the SNOMED concept for analysis
purposes, but I know it comes from a different dictionary.  Cool.
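
To make the column layout of that BSV line explicit, here is a minimal
Python sketch that splits it into its four fields.  The field names (cui,
tui, text, preferred_text) are my own labels inferred from the example
line and the output above, and the zero-padding of the TUI simply mirrors
the T47 -> T047 change visible in that output; none of this is taken from
the cTAKES code itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BsvEntry:
    """One row of a custom dictionary; field names are illustrative only."""
    cui: str
    tui: str
    text: str
    preferred_text: str


def normalize_tui(tui: str) -> str:
    """Zero-pad the numeric part of a TUI to three digits (T47 -> T047).

    Assumption: this imitates the change seen in the output above.
    """
    digits = tui.strip().lstrip("Tt")
    return f"T{int(digits):03d}"


def parse_bsv_line(line: str) -> BsvEntry:
    """Split one bar-separated line into the four assumed columns."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != 4:
        raise ValueError(
            f"expected 4 bar-separated fields, got {len(fields)}: {line!r}"
        )
    cui, tui, text, preferred_text = fields
    return BsvEntry(cui, normalize_tui(tui), text, preferred_text)


entry = parse_bsv_line("C1956346|T47|grwz|Coronary Artery Disease")
print(entry)
```

A check like this can be run over a whole BSV file before handing it to
cTAKES, so that a row with a missing column is caught early.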

--------

Just a note: your instructions and Sean's from several years ago differ
slightly from each other and from this release.

The bits that have changed are:

Sean's refer to adding a BSV-based dictionary to the cTakesHsql.xml file,
which has become sno_rx_16ab.xml in cTAKES 4.

Yours refer to lookupDescriptors in the engine description file.  But in
dictionary-lookup-fast there are no longer any lookupDescriptors in its
engine description.  Those can only be found in the non-fast
dictionary-lookup module, where one finds examples of a CSV lookup that
look as if they are of a different vintage from the BSV ones.  I didn't try
it, but my take-away is that this would create a different kind of
annotation.  Using the "adjunct" approach one can get bona fide disease
disorder mentions, procedure mentions, etc. based on the TUIs one hijacks.

Peter




Re: How to use external CSV or BSV in addition to FastUMLS attention Sean [EXTERNAL]

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.
The UIMA Analysis Engine descriptor for the dictionary component has a parameter for what cTAKES calls a "lookup descriptor". By default the lookup descriptor describes a lookup in an HSQL engine. The XML files in that sample directory are lookup descriptors for a lookup using the BSV files they point to. If you want your BSV lookup to complement the default lookup, it's possible to just have two dictionaries running with different lookup descriptors. I think it's also possible for a single lookup descriptor to have multiple lookup types (i.e. multiple <dictionary> sections inside <dictionaries>), but I can't guarantee that works!
Tim
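
As a rough way to see which dictionaries a given lookup descriptor
declares, the Python sketch below lists the <dictionary> sections found
inside <dictionaries>.  Only those two element names come from the message
above; the root element name, the <name> child, and the sample values are
assumptions made for illustration, so check them against the XML files in
the sample directory before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptor: only <dictionaries> and <dictionary> are taken
# from the thread; the root element and <name> children are placeholders.
SAMPLE_DESCRIPTOR = """
<lookupSpecification>
  <dictionaries>
    <dictionary><name>default_hsql</name></dictionary>
    <dictionary><name>custom_bsv</name></dictionary>
  </dictionaries>
</lookupSpecification>
"""


def dictionary_names(descriptor_xml):
    """Return the name of every <dictionary> inside <dictionaries>."""
    root = ET.fromstring(descriptor_xml.strip())
    names = []
    # Search at any depth so the root element name does not matter.
    for dictionary in root.findall(".//dictionaries/dictionary"):
        names.append(dictionary.findtext("name", default="(unnamed)"))
    return names


print(dictionary_names(SAMPLE_DESCRIPTOR))
```

Whether cTAKES actually honours more than one section is exactly the part
that is not guaranteed above; the sketch only reports what the file
declares.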



Re: How to use external CSV or BSV in addition to FastUMLS attention Sean [EXTERNAL]

Posted by "Abramowitsch, Peter" <pa...@hearst.com>.
Thanks Tim,  

I did see that folder and its contents and it seemed the right place to
begin.  What I couldn't find was how/where to refer to one of those
CustomCuiTui.Xml files in an engine description.

Peter
