Posted to dev@ctakes.apache.org by Gandhi Rajan Natarajan <Ga...@arisglobal.com> on 2017/09/15 16:39:59 UTC

Enabling drugner pipeline and identifying dates

Hi All,

We are using the pipeline code as mentioned in https://github.com/healthnlp/examples/blob/master/ctakes-temporal-demo/src/main/java/org/apache/ctakes/web/client/servlet/Pipeline.java for the cTAKES web application we are building. But in our case, the measurements and quantities are identified as events as shown below:

SENTENCE:  The patient started study treatment of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg /m2  (days 1, 8, and 15) on 06/07/02 for the treatment of hepatocellular carcinoma.
           DT    NN      VBD    NN      NN     IN    NN     NNS   NNS         CC     NNP         NNS NNS  NNS        CC      IN          IN  DT     NN     IN       JJ          NN
                       |=====|       |=======|    |======| |===|                  |========|                                                     |=======|                   |=======|
                        Event        Procedure      Drug   Event                     Drug                                                        Procedure                   Disorder
                                     C0087111     C0723668                         C0014582                                                      C0087111                    C0007097
                                                                                                                                                              |======================|
                                                                                                                                                                      Disorder
                                                                                                                                                                      C2239176

From searching online, we understand that we need to use DrugMentionAnnotator to identify measurements and quantities. Is that right? If so, how do we enable DrugMentionAnnotator in our code? Could someone provide a sample code snippet to help us with this?

Also, the dates are not being identified in our case. We get the following error in our console even after using the latest temporal resources (model.jar), as per Sean's suggestion:

"Null value found in Feature(<Time-Class->, <NULL>) from [Feature(<mention1>, <take>), Feature(<mention1_FirstCovered_0_1_0>, <take>)"

Could someone shed some light on this as well?

Thanks in advance.

Regards,
Gandhi

This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender or system manager by email immediately if you have received this e-mail by mistake and delete this e-mail from your system. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited and against the law.

RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Gandhi,   (Hi Tim, find below the best coref chain I have ever seen),

Unfortunately, it looks like the drug-ner module has not been kept up to date. I just checked the CPE XML files and they contain invalid pointers. Anyway, you should be able to add the DrugMentionAnnotator by using:

AggregateBuilder (code):
    aggregateBuilder.add( AnalysisEngineFactory.createEngineDescription( DrugMentionAnnotator.class ) );

Piper file:
add org.apache.ctakes.drugner.ae.DrugMentionAnnotator

Unfortunately, the drug attribute types all extend the type Annotation. The PrettyTextWriter that you are using only marks IdentifiedAnnotation subtypes, so you will not see the drug attributes without writing some extra code. On that matter, I recommend that you use HtmlTextWriter for output, as it provides more information in a nicer format, though it still does not show the drug-ner attributes.
One nice feature is the markup of coreferences.  Using your example sentence:
"The patient started study treatment of Thalomid 200mg ( days 1 - 21 ) , and Epirubicin , 20 mg / m2 ( days 1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment of hepatocellular carcinoma." 
It marks a superscript '1' (coreference chain #1) after "200mg" and "carcinoma" because Tim's excellent coreference model connected:
"study treatment of Thalomid 200mg"  with "the treatment of hepatocellular carcinoma"!
If you click one of the superscripts, it will display the coreference chain in the margin.
I am still working on that writer in my spare time, so if you have suggestions please let me know.
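
To illustrate the point above about PrettyTextWriter only marking IdentifiedAnnotation subtypes, here is a minimal, self-contained Java sketch. It does not use the real UIMA or cTAKES classes: ToyAnnotation, ToyIdentifiedAnnotation, ToyStrengthAnnotation and WriterFilterSketch are hypothetical stand-ins that only mirror the inheritance relationship, and the offsets are those of "Thalomid" and "200mg" in the example sentence. It shows why a writer that filters on the identified-annotation type skips the drug attributes, and roughly what the "extra code" would have to do, namely select the attribute type explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for the base annotation type (NOT the real UIMA class).
class ToyAnnotation {
    final int begin;
    final int end;

    ToyAnnotation(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

// Toy stand-in for entity/event mentions such as Drug, Procedure, Disorder.
class ToyIdentifiedAnnotation extends ToyAnnotation {
    final String label;

    ToyIdentifiedAnnotation(int begin, int end, String label) {
        super(begin, end);
        this.label = label;
    }
}

// Hypothetical drug attribute that extends the base type directly,
// so it is not an identified annotation.
class ToyStrengthAnnotation extends ToyAnnotation {
    ToyStrengthAnnotation(int begin, int end) {
        super(begin, end);
    }
}

class WriterFilterSketch {
    // Mimics a writer that only renders identified-annotation subtypes.
    static List<ToyAnnotation> identifiedOnly(List<ToyAnnotation> all) {
        List<ToyAnnotation> kept = new ArrayList<>();
        for (ToyAnnotation a : all) {
            if (a instanceof ToyIdentifiedAnnotation) {
                kept.add(a);
            }
        }
        return kept;
    }

    // The "extra code": explicitly select annotations of a requested type.
    static <T extends ToyAnnotation> List<T> ofType(List<ToyAnnotation> all, Class<T> type) {
        List<T> kept = new ArrayList<>();
        for (ToyAnnotation a : all) {
            if (type.isInstance(a)) {
                kept.add(type.cast(a));
            }
        }
        return kept;
    }

    // Two annotations over the example sentence: "Thalomid" and "200mg".
    static List<ToyAnnotation> sample() {
        List<ToyAnnotation> all = new ArrayList<>();
        all.add(new ToyIdentifiedAnnotation(39, 47, "Drug"));
        all.add(new ToyStrengthAnnotation(48, 53));
        return all;
    }

    public static void main(String[] args) {
        List<ToyAnnotation> all = sample();
        System.out.println("Rendered by the filtering writer: " + identifiedOnly(all).size());
        System.out.println("Strength attributes found explicitly: "
                + ofType(all, ToyStrengthAnnotation.class).size());
    }
}
```

In a real pipeline the equivalent selection would presumably go through the JCas (for example with uimaFIT's JCasUtil.select) using the actual drug-ner type classes, which are not shown here.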

As for the missing times, I don't know what you are witnessing.  When I run your sentence I get the times:
"days"
"days 1,8"
"06/07/02"    (contains treatment)
The "days" aren't perfect, but the "06/07/02" date and its "contains treatment" relation are pretty good.

Sean

