Posted to dev@ctakes.apache.org by Ryan Young <ro...@buffalo.edu> on 2020/03/23 15:02:21 UTC

Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI)

Hello,

I am a medical student who happened to come across cTAKES for a project I
am working on. What I would like to do is take a list of surgeries in a
text file and have cTAKES output what it determines to be the best UMLS
code (CUI) for that particular line.

Each line of the text file is independent of the others (i.e., each line
should be read and analyzed separately). For example, here's my list of the
surgeries (Surgery_List.txt):
Colonoscopy with Polypectomy
Esophagogastroduodenoscopy Colonoscopy
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration

When I run the piper file (see below), I get the following output:
Colonoscopy with Polypectomy
"Colonoscopy"
  Procedure
  C0009378 colonoscopy
"Polypectomy"
  Procedure
  C0521210 Resection of polyp

Esophagogastroduodenoscopy Colonoscopy
"Esophagogastroduodenoscopy"
  Procedure
  C0079304 Esophagogastroduodenoscopy
"Colonoscopy"
  Procedure
  C0009378 colonoscopy

Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
"Esophagogastroduodenoscopy"
  Procedure
  C0079304 Esophagogastroduodenoscopy
"Endoscopic ultrasound"
  Procedure
  C0376443 Endoscopic Ultrasound
"Endoscopic"
  Procedure
  C0014245 Endoscopy (procedure)
"ultrasound"
  Procedure
  C0041618 Ultrasonography
"Fine needle aspiration"
  Procedure
  C1510483 Fine needle aspiration biopsy
"aspiration"
  Procedure
  C0349707 Aspiration-action

Here's the piper file I have been using:
reader org.apache.ctakes.core.cr.FileTreeReader
InputDirectory="C:\path\to\input\folder"
load DefaultTokenizerPipeline.piper
SentenceModelFile=C:\cTAKES_4.0.0\desc\ctakes-core\desc\analysis_engine\SentenceDetectorAnnotatorBIO.xml
add ContextDependentTokenizerAnnotator
add org.apache.ctakes.necontexts.ContextAnnotator
addDescription POSTagger
load ChunkerSubPipe.piper
set ctakes.umlsuser=my_username ctakes.umlspw=my_password
add org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator
DictionaryDescriptor=C:\cTAKES_4.0.0\desc\ctakes-dictionary-lookup-fast\desc\analysis_engine\UmlsLookupAnnotator.xml
LookupXml=C:\cTAKES_4.0.0\resources\org\apache\ctakes\dictionary\lookup\fast\sno_rx_16ab.xml
add property.plaintext.PropertyTextWriterFit
OutputDirectory="C:\path\to\output\folder"

The workaround I have developed is as follows...
1.) Save each line of Surgery_List.txt to separate text files
2.) Use a Python script to parse each individual text file to extract the
first UMLS code (CUI) given in the text file

The above method works fine when there's only 10 lines, but not so well
when there's 40,000 lines in Surgery_List.txt.
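
The second step of the workaround above can be sketched in code. This is only an illustration, assuming the per-line outputs have been concatenated into one text with a blank line between them, as in the sample output shown earlier. The names `FirstCuiPerBlock` and `firstCuis` are made up for this example and are not part of cTAKES.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstCuiPerBlock {

    // A UMLS CUI is the letter C followed by seven digits, here at the start of a line.
    private static final Pattern CUI_LINE = Pattern.compile("^\\s*(C\\d{7})\\b");

    /** Returns the first CUI of each blank-line-separated block, or "" if a block has none. */
    static List<String> firstCuis(String output) {
        List<String> result = new ArrayList<>();
        String first = null; // null means "not currently inside a block"
        for (String line : output.split("\\R")) {
            if (line.trim().isEmpty()) {
                // A blank line closes the current block, if there is one.
                if (first != null) {
                    result.add(first);
                    first = null;
                }
                continue;
            }
            if (first == null) {
                first = ""; // a new block starts; no CUI seen yet
            }
            if (first.isEmpty()) {
                Matcher m = CUI_LINE.matcher(line);
                if (m.find()) {
                    first = m.group(1);
                }
            }
        }
        if (first != null) {
            result.add(first); // close the last block
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "Colonoscopy with Polypectomy",
                "\"Colonoscopy\"",
                "  Procedure",
                "  C0009378 colonoscopy",
                "\"Polypectomy\"",
                "  Procedure",
                "  C0521210 Resection of polyp",
                "",
                "Esophagogastroduodenoscopy Colonoscopy",
                "\"Esophagogastroduodenoscopy\"",
                "  Procedure",
                "  C0079304 Esophagogastroduodenoscopy");
        firstCuis(sample).forEach(System.out::println);
    }
}
```

A block with no CUI yields an empty string, so the result keeps one entry per block.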

Ideally, I would like for Fast Dictionary Lookup to just return the top
result for each line of Surgery_List.txt. For example, Output.txt would
look just like this:
C0009378
C0079304
C0079304

Just for reference here's how UMLS codes correspond between
Surgery_List.txt and Output.txt:
C0009378 --> Colonoscopy with Polypectomy
C0079304 --> Esophagogastroduodenoscopy Colonoscopy
C0079304 --> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine
needle aspiration

Is there something I can add to the piper file to make this happen?

Currently, I have the cTAKES user version installed, but I could install
the developer version if need be. I would just need a little guidance on
which Java source file I would need to modify to get the desired results.

Thank You,

Ryan Young
MD/MBA Candidate
Jacobs School of Medicine & Biomedical Sciences

Re: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Ryan,

Your piper has a lot of things that you don't need.

Try this:

// set the location of the input file and use a reader that treats each line like a different document.
set InputFileName=C:/path/to/input/file.txt
reader LinesFromFileCollectionReader

// Use the default section, sentence, token pipeline
load DefaultTokenizerPipeline

// Find Parts of Speech for dictionary lookup
add POSTagger

// Add Dictionary Lookup
load DictionarySubPipe

// This is a temporary writer ...
add CuiLookupLister


The CuiLookupLister doesn't do exactly what you want; you need a custom writer for that.
The LinesFromFileCollectionReader is not ideal, but it does do what you want.

Could you please tell me how you are running this?  Are you using the submitter gui, the PiperFileRunner class, a shell script or something else?
Also, how comfortable are you with java?

I will scribble up something more and send it in a minute ...

Sean

Re: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) [EXTERNAL]

Posted by Ryan Young <ro...@buffalo.edu>.
Hello Sean,

Below is the same list from my previous email. However, I manually
truncated the lines so that the email doesn't split them this time. There
should be 100 rows when you copy and paste now.

I ran the list below again and it returns 101 CUIs when it should only
return 100 CUIs. This is interesting because the "non-truncated" version
from my previous email had returned 103 CUIs. Is there a certain number of
characters in a line which causes cTAKES to change fast dictionary lookup's
output? That might be the missing conditional which would have to be
accounted for.
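
Since the writer is meant to emit exactly one output line per input line, a count mismatch like the one described above can be caught before any CUI is paired with the wrong surgery. The following is a minimal sketch of such a check, which also drops rows whose CUI is blank. The class and method names are invented for this example, and the blank CUI in the sample data is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PairSurgeriesWithCuis {

    /** Pairs line i of the input with line i of the CUI file, skipping rows whose CUI is blank. */
    static List<String[]> pair(List<String> surgeries, List<String> cuis) {
        if (surgeries.size() != cuis.size()) {
            // Refuse to pair anything when the files are not aligned line by line.
            throw new IllegalArgumentException(
                    "Line count mismatch: " + surgeries.size() + " input lines vs "
                    + cuis.size() + " output lines");
        }
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < surgeries.size(); i++) {
            String cui = cuis.get(i).trim();
            if (!cui.isEmpty()) {
                rows.add(new String[] { surgeries.get(i), cui });
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample data; the empty second CUI is a made-up "nothing found" case.
        List<String> surgeries = Arrays.asList("Colonoscopy with Polypectomy", "Some unmatched line");
        List<String> cuis = Arrays.asList("C0009378", "");
        for (String[] row : pair(surgeries, cuis)) {
            System.out.println(row[1] + " --> " + row[0]);
        }
    }
}
```

In practice the two lists could come from `Files.readAllLines` on the input file and on the `_cui.txt` file written by the writer.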

Input.txt (file name wasn't included in the actual file itself)
Colonoscopy with Polypectomy
Esophagogastroduodenoscopy Colonoscopy
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
Esophagogastroduodenoscopy with Endoscopic ultrasound
Esophagogastroduodenoscopy with Biopsy()
Linear Endobronchial Ultrasound (EBUS) with Nav Bronch
Bronchoscopy Endobronchial Ultrasound (EBUS) OR
Esophagogastroduodenoscopy with Dilation Savary
Esophagogastroduodenoscopy
Esophagogastroduodenoscopy Endoscopic retrograde cholangiopancreatography
Wide Local Excision Flap Local Cheek Skin Graft Full Thickness (FTSG)
Esophagogastroduodenoscopy with Dilation Balloon
Esophagogastroduodenoscopy with Biopsy()
Excision Soft Tissue Tumor
Axillary Node Dissection
Wide Local Excision w Removal of Radioactive Seed
Laparoscopic Partial Gastrectomy
ZLumpectomy, with   Sentinel lymph node Biopsy Sentinel Lymph Node Biopsy
Cysto with Pre-Op Ureteral Catheter Placement Diagnostic Laparoscopy
Laminectomy Cervical with Instrumentation
Transanal Endoscopic Microsurgery
Implantation Procedure Ommaya Reservoir Insertion with Axiem
Suprahyoid Lymphadenectomy Procedure Transcervical Extended Mediastinal
Video Assisted Thorascopic Surgery with Wedge Resection Nerve Block
Video Assisted Thorascopic Surgery with Decortication
Esophagogastroduodenoscopy Colonoscopy
Esophagogastroduodenoscopy with Endoscopic ultrasound
Colonoscopy with Biopsy()
Esophagogastroduodenoscopy with Endoscopic ultrasound
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
Esophagogastroduodenoscopy
Colonoscopy
Colonoscopy
Esophagogastroduodenoscopy with Biopsy()
Esophagogastroduodenoscopy with Esophageal Stent Placement
Colonoscopy with Biopsy()
Esophagogastroduodenoscopy with Biopsy()
Esophagogastroduodenoscopy Colonoscopy
Colonoscopy with Biopsy()
Wide local excision with Removal of Seeds Sentinel lymph node Biopsys
Wide Local Excision w Removal of Radioactive Seed
Abscess Drainage Empyema Rib Resection Flap Latissimus Dorsi Thoracoplasty
Laparoscopic Cholecystectomy Laparoscopic Liver Biopsy
Laparotomy Salpingo Oophorectomy Resection Pelvic Abcess Ruptured
Minimally Invasive Esophagectomy with Feeding J
Ileostomy
Diagnostic Hysteroscopy Dilation Curettage (D and C)
Exploratory Laparotomy Lysis of Adhesions Bowel Resection End to End
Robot Assisted Sigmoid Colon Resection Robot Assisted Right Colectomy
Esophagogastroduodenoscopy with Dilation Balloon
Segmentectomy (Thoracic) Nerve Block Intercostal, Multiple
Video Assisted Thorascopic Surgery with Wedge Resection Intubating
Bronchoscopy Nerve Block Intercostal, Multiple
Esophagogastroduodenoscopy Colonoscopy
Esophagogastroduodenoscopy Endoscopy Mucosal Resection
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
Colonoscopy
Esophagogastroduodenoscopy with Biopsy()
Colonoscopy with Biopsy()
Esophagogastroduodenoscopy
Esophagogastroduodenoscopy with Biopsy()
Craniectomy Frontal withStealth Drainage of Abscess
Craniotomy Excision Tumor Posterior Fossa Craniotomy Temporal
Thyroid Lobectomy with Isthmusectomy
Neck Exploration Mediastinal Exploration Video Assisted Thorascopic Surgery
Thyroid Lobectomy
Cytoreduction Debulking Hyperthermic Intraperitoneal Chemotherapy (HIPEC)
Dilation Curettage (D and C) with Hysteroscopy
Open Biopsy Lymph Node Biopsy Excision
Total abdominal hysterectomy Bilateral salpingo-oophorectomy with Radical
Laminectomy Lumbar
Craniotomy Occipital with Axiem
Cytoreduction Debulking Hyperthermic Intraperitoneal Chemotherapy (HIPEC)
Cystoscopy with Ureteral Stent Insertion Change Cystoscopy with Ureteral
Cystoscopy with TURP
Retrograde Pyelogram Ureteroscopy Cystoscopy with Ureteral Cathm
Colonoscopy with Polypectomy
Esophagogastroduodenoscopy with Dilation Balloon
Esophagogastroduodenoscopy with RFA (Halo)
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
Esophagogastroduodenoscopy with Endoscopic ultrasound
Esophagogastroduodenoscopy with Endoscopic ultrasound
Linear Endobronchial Ultrasound (EBUS) with Nav Bronch
Linear Endobronchial Ultrasound (EBUS) with Nav Bronch
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
Bronchoscopy with Endobronchial Ultrasound (EBUS)
Microsuspension Laryngoscopy
Selective Neck Dissection
Wide Local Excision w Removal of Radioactive Seed
Breast Re-Excision
Bronchoscopy
Robot Assisted Total Hysterectomy SO
Craniotomy Parietal with ioMRI
Bronchoscopy with Biopsy()
Ex Laparotomy Total abdominal hysterectomy with Salpingo Oophorectomy
Robot Assisted Prostatectomy Robot Assisted Pelvic Lymphadenectomy

Thank You,

Ryan Young

On Wed, Apr 1, 2020 at 8:51 AM Finan, Sean <Se...@childrens.harvard.edu>
wrote:

> Hi Ryan,
>
> That list didn't work for me as email added its own line endings,
> splitting intended lines, and it ends up being 120 rows.
> For instance:
>
> Esophagogastroduodenoscopy Endoscopic retrograde cholangiopancreatography
> with Ampullectomy
>
> is 2 lines instead of 1.
>
> I will write something new for you in a little bit and maybe we can figure
> this out.
>
> Sean
>
> ________________________________________
> From: Ryan Young <ro...@buffalo.edu>
> Sent: Tuesday, March 31, 2020 10:13 PM
> To: dev@ctakes.apache.org
> Subject: Re: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code
> (CUI) [EXTERNAL]
>
> * External Email - Caution *
>
>
> Hello Sean,
>
> I was able to get cTAKES packaged. However, the output text file isn't the
> same number of lines as the input text file. For example, if the input text
> file is 10,000 lines long then the output text file ends up being 10,630
> lines.
>
> This makes me think that there's another conditional statement (or two)
> which needs to be added to the end of SentenceFirstCuiWriter.java.
>
> Here's the current version of SentenceFirstCuiWriter.java I am using:
> public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {
>
>    public void writeFile( final JCas jCas, final String outputDir,
>                           final String documentId, final String fileName )
>          throws IOException {
>       File cuiFile = new File( outputDir, fileName + "_cui.txt" );
>       Map<Sentence, Collection<ProcedureMention>> sentenceMap
>             = JCasUtil.indexCovered( jCas, Sentence.class, ProcedureMention.class );
>       List<Collection<ProcedureMention>> sortedSentenceProcedures
>             = sentenceMap.entrySet()
>                          .stream()
>                          .sorted( Map.Entry.comparingByKey( DefaultAspanComparator.INSTANCE ) )
>                          .map( Map.Entry::getValue )
>                          .collect( Collectors.toList() );
>       try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile ) ) ) {
>          for ( Collection<ProcedureMention> procedures : sortedSentenceProcedures ) {
>             ProcedureMention firstProcedure
>                   = procedures.stream()
>                               .min( Comparator.comparingInt( ProcedureMention::getBegin ) )
>                               .orElse( null );
>             if ( firstProcedure == null ) {
>                writer.write( "\n" );
>             } else {
>                String cui
>                      = OntologyConceptUtil.getCuis( firstProcedure )
>                                           .stream()
>                                           .findFirst()
>                                           .orElse( "" );
>                if ( cui.isEmpty() ) {
>                   writer.write( "\n" );
>                } else {
>                   writer.write( cui + "\n" );
>                }
>             }
>          }
>       }
>    }
> }
>
> Below is the piper file I am using:
> // Piper
> reader org.apache.ctakes.core.cr.FileTreeReader
> InputDirectory="C:\path\to\input\folder"
> set ctakes.umlsuser=username ctakes.umlspw=password
> load DefaultTokenizerPipeline
> add POSTagger
> load DictionarySubPipe
> add SentenceFirstCuiWriter OutputDirectory="C:\path\to\output\folder"
>
> If it helps, I have listed the first 100 lines of the input text file.
> Again, the expected output text file should be 100 lines (i.e., 100 CUIs)
> as well. However, the output text file returns 103 lines (103 CUIs), which
> is 3 more CUIs than it should.
> Input.txt
> Colonoscopy with Polypectomy
> Esophagogastroduodenoscopy Colonoscopy
> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
> aspiration
> Esophagogastroduodenoscopy with Endoscopic ultrasound
> Esophagogastroduodenoscopy with Biopsy()
> Linear Endobronchial Ultrasound (EBUS) with Nav Bronch
> Bronchoscopy Endobronchial Ultrasound (EBUS) OR
> Esophagogastroduodenoscopy with Dilation Savary
> Esophagogastroduodenoscopy
> Esophagogastroduodenoscopy Endoscopic retrograde cholangiopancreatography
> with Ampullectomy
> Wide Local Excision Flap Local Cheek Skin Graft Full Thickness (FTSG)
> Esophagogastroduodenoscopy with Dilation Balloon
> Esophagogastroduodenoscopy with Biopsy()
> Excision Soft Tissue Tumor
> Axillary Node Dissection
> Wide Local Excision w Removal of Radioactive Seed
> Laparoscopic Partial Gastrectomy
> ZLumpectomy, with   Sentinel lymph node Biopsy Sentinel Lymph Node Biopsy
> Excision
> Cysto with Pre-Op Ureteral Catheter Placement Diagnostic Laparoscopy
> Sigmoid Colectomy Salpingo Oophorectomy
> Laminectomy Cervical with Instrumentation
> Transanal Endoscopic Microsurgery
> Implantation Procedure Ommaya Reservoir Insertion with Axiem
> Suprahyoid Lymphadenectomy Procedure Transcervical Extended Mediastinal
> Lymphadenectomy (Transcervcl Extndd Medstnl Lymphadenectmy) Video Assisted
> Thorascopic Surgery with Lobectomy Intubating Bronchoscopy Nerve Block
> Intercostal, Multiple
> Video Assisted Thorascopic Surgery with Wedge Resection Nerve Block
> Intercostal, Multiple
> Video Assisted Thorascopic Surgery with Decortication
> Esophagogastroduodenoscopy Colonoscopy
> Esophagogastroduodenoscopy with Endoscopic ultrasound
> Colonoscopy with Biopsy()
> Esophagogastroduodenoscopy with Endoscopic ultrasound
> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
> aspiration Endoscopic retrograde cholangiopancreatography with Ampullectomy
> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
> aspiration
> Esophagogastroduodenoscopy
> Colonoscopy
> Colonoscopy
> Esophagogastroduodenoscopy with Biopsy()
> Esophagogastroduodenoscopy with Esophageal Stent Placement
> Colonoscopy with Biopsy()
> Esophagogastroduodenoscopy with Biopsy()
> Esophagogastroduodenoscopy Colonoscopy
> Colonoscopy with Biopsy()
> Wide local excision with Removal of Seeds Sentinel lymph node Biopsys
> Wide Local Excision w Removal of Radioactive Seed
> Abscess Drainage Empyema Rib Resection Flap Latissimus Dorsi Thoracoplasty
> Removal of Foreign Body
> Laparoscopic Cholecystectomy Laparoscopic Liver Biopsy
> Laparotomy Salpingo Oophorectomy Resection Pelvic Abcess Ruptured
> Diverticulum
> Minimally Invasive Esophagectomy with Feeding J
> Ileostomy
> Diagnostic Hysteroscopy Dilation Curettage (D and C)
> Exploratory Laparotomy Lysis of Adhesions Bowel Resection End to End
>  Anastomosis Take Down of Ostomy
> Robot Assisted Sigmoid Colon Resection Robot Assisted Right Colectomy
> Esophagogastroduodenoscopy with Dilation Balloon
> Segmentectomy (Thoracic) Nerve Block Intercostal, Multiple
> Video Assisted Thorascopic Surgery with Wedge Resection Intubating
> Bronchoscopy Nerve Block Intercostal, Multiple
> Esophagogastroduodenoscopy Colonoscopy
> Esophagogastroduodenoscopy Endoscopy Mucosal Resection
> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
> aspiration
> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
> aspiration
> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
> aspiration Endoscopic retrograde cholangiopancreatography with Stent Change
> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
> aspiration Endoscopic retrograde cholangiopancreatography with Ampullectomy
> Esophagogastroduodenoscopy with Esophageal Stent Placement
> Colonoscopy
> Esophagogastroduodenoscopy with Biopsy()
> Colonoscopy with Biopsy()
> Esophagogastroduodenoscopy
> Esophagogastroduodenoscopy with Biopsy()
> Craniectomy Frontal withStealth Drainage of Abscess
> Craniotomy Excision Tumor Posterior Fossa Craniotomy Temporal
> Thyroid Lobectomy with Isthmusectomy
> Neck Exploration Mediastinal Exploration Video Assisted Thorascopic Surgery
> with Bullectomy
> Thyroid Lobectomy
> Cytoreduction Debulking Hyperthermic Intraperitoneal Chemotherapy (HIPEC)
> Dilation Curettage (D and C) with Hysteroscopy
> Open Biopsy Lymph Node Biopsy Excision
> Total abdominal hysterectomy Bilateral salpingo-oophorectomy with Radical
> Dissection For Debulking Exploratory Laparotomy Peritoneal Stripping
> Resection of Tumor Resection of Tumor Bowel Resection Omentectomy
> Laminectomy Lumbar
> Craniotomy Occipital with Axiem
> Cytoreduction Debulking Hyperthermic Intraperitoneal Chemotherapy (HIPEC)
> Exploratory Laparotomy Colectomy Partial Omentectomy
> Cystoscopy with Ureteral Stent Insertion Change Cystoscopy with Ureteral
> Cath  Retrograde Pyelogramm
> Cystoscopy with TURP
> Retrograde Pyelogram Ureteroscopy Cystoscopy with Ureteral Cathm
> Colonoscopy with Polypectomy
> Esophagogastroduodenoscopy with Dilation Balloon
> Esophagogastroduodenoscopy with RFA (Halo)
> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
> aspiration
> Esophagogastroduodenoscopy with Endoscopic ultrasound
> Esophagogastroduodenoscopy with Endoscopic ultrasound
> Linear Endobronchial Ultrasound (EBUS) with Nav Bronch
> Linear Endobronchial Ultrasound (EBUS) with Nav Bronch
> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
> aspiration Endoscopic retrograde cholangiopancreatography with Ampullectomy
> Bronchoscopy with Endobronchial Ultrasound (EBUS)
> Microsuspension Laryngoscopy
> Selective Neck Dissection
> Wide Local Excision w Removal of Radioactive Seed
> Breast Re-Excision
> Bronchoscopy
> Robot Assisted Total Hysterectomy SO
> Craniotomy Parietal with ioMRI
> Bronchoscopy with Biopsy()
> Ex Laparotomy Total abdominal hysterectomy with Salpingo Oophorectomy
> Robot Assisted Prostatectomy Robot Assisted Pelvic Lymphadenectomy
>
> Thank You,
>
> Ryan Young
>
> On Tue, Mar 31, 2020 at 10:44 AM Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi Ryan,
> >
> > You made some excellent progress.  ctakes is a little complicated for new
> > users - especially anybody that isn't familiar with Java.
> >
> > Since you are going to be running from a  command line (via python) and
> > have already done so successfully, we can just try to get you set up to
> > repeat that process.
> >
> > In Eclipse, you should be able to run the maven "package" configuration.
> >
> > That will compile and build an installation similar to what you were
> > using before.
> >
> > After you execute maven package,
> > open the directory ctakes-distribution/target/
> > There should be a .zip file named apache-ctakes-4.0.1-SNAPSHOT-bin
> > That zip file contains a ctakes installation for Windows.
> > Unzip the installation wherever you like - preferably without spaces in
> > directory names.
> >
> > You should be able to treat this new installation just like you did the
> > one downloaded from the ctakes website.
> >
> > Before you do all of that ...  We should change a couple of things in that
> > SentenceFirstCuiWriter to output blanks where procedures or cuis are not
> > discovered for your snippets.
> >
> >
> > >> public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {
> > >>
> > >>    public void writeFile( final JCas jCas, final String outputDir,
> > >>                           final String documentId, final String fileName )
> > >>          throws IOException {
> > >>       File cuiFile = new File( outputDir, fileName + "_cui.txt" );
> > >>       Map<Sentence, Collection<ProcedureMention>> sentenceMap
> > >>             = JCasUtil.indexCovered( jCas, Sentence.class, ProcedureMention.class );
> > >>       List<Collection<ProcedureMention>> sortedSentenceProcedures
> > >>             = sentenceMap.entrySet()
> > >>                          .stream()
> > >>                          .sorted( Map.Entry.comparingByKey( DefaultAspanComparator.INSTANCE ) )
> > >>                          .map( Map.Entry::getValue )
> > >>                          .collect( Collectors.toList() );
> > >>       try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile ) ) ) {
> > >>          for ( Collection<ProcedureMention> procedures : sortedSentenceProcedures ) {
> > >>             ProcedureMention firstProcedure
> > >>                   = procedures.stream()
> > >>                               .min( Comparator.comparingInt( ProcedureMention::getBegin ) )
> > >>                               .orElse( null );
> > >>             if ( firstProcedure != null ) {
> >
> > ---------- Change the above line to
> >
> > if ( firstProcedure == null ) {
> >    writer.write( "\n" );
> > } else {
> >
> > >>                String cui
> > >>                      = OntologyConceptUtil.getCuis( firstProcedure )
> > >>                                           .stream()
> > >>                                           .findFirst()
> > >>                                           .orElse( "" );
> > >>                if ( !cui.isEmpty() ) {
> >
> > --------- Change the above line to
> >
> > if ( cui.isEmpty() ) {
> >    writer.write( "\n" );
> > } else {
> >
> > >>                   writer.write( cui + "\n" );
> > >>                }
> > >>             }
> > >>          }
> > >>       }
> > >>    }
> > >> }
> >
> >
> > So, after
> > 1.  Editing the SentenceFirstCuiWriter
> > 2.  Running the maven package step
> > 3.  Unzipping your ctakes installation
> >
> > You should be able to
> > 1.  Run ctakes from command line like you did before
> > 2.  Use the custom piper file
> > 3.  Resolve the first procedure discovered for the snippet on each line
> > 4.  Write file(s) with corresponding line-by-line cuis or empty lines
> > where none are resolved
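
The selection logic behind the last two steps above can be illustrated without cTAKES on the classpath. This is a minimal sketch, assuming a hypothetical `Mention` class that stands in for `ProcedureMention` and carries only a begin offset and a CUI; the class and method names are invented for this example and the sample offsets are computed from the first surgery line.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class FirstMentionPerLine {

    /** Hypothetical stand-in for a ProcedureMention: start offset plus its CUI ("" if none). */
    static final class Mention {
        final int begin;
        final String cui;

        Mention(int begin, String cui) {
            this.begin = begin;
            this.cui = cui;
        }
    }

    /** One result per input line: the CUI of the earliest mention, or "" when nothing was found. */
    static List<String> firstCuiPerLine(List<List<Mention>> mentionsPerLine) {
        return mentionsPerLine.stream()
                .map(mentions -> mentions.stream()
                        .min(Comparator.comparingInt((Mention m) -> m.begin))
                        .map(m -> m.cui)
                        .orElse(""))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Mention>> lines = Arrays.asList(
                // "Colonoscopy with Polypectomy": two mentions, listed out of order on purpose
                Arrays.asList(new Mention(17, "C0521210"), new Mention(0, "C0009378")),
                // a line on which nothing was found
                Collections.<Mention>emptyList());
        for (String cui : firstCuiPerLine(lines)) {
            System.out.println(cui); // an empty line is printed when no CUI exists
        }
    }
}
```

Because every input line produces exactly one output line, even an empty one, the output stays aligned with the input.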
> >
> > Let me know if I missed anything.
> >
> > Sean
> >
> > ________________________________________
> > From: Ryan Young <ro...@buffalo.edu>
> > Sent: Monday, March 30, 2020 9:44 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code
> > (CUI) [EXTERNAL]
> >
> > Hello Sean,
> >
> > I have run into some difficulty actually running the script you wrote
> > (SentenceFirstCuiWriter.java). I spent the last week doing the following:
> > 1.) Installed cTAKES developer version using Eclipse IDE
> > 2.) Added the appropriate import statements at the beginning of
> > SentenceFirstCuiWriter.java
> > 3.) Placed SentenceFirstCuiWriter.java in this directory:
> > C:\Users\Ryan\eclipse-workspace\ctakes\ctakes-core\src\main\java\org\apache\ctakes\core\cc
> > 4.) Successfully built and compiled cTAKES developer version
> > 5.) Successfully run the test configurations which were already in cTAKES
> > in Eclipse (Run --> Run As --> Maven test)
> >
> > My main question is how do I run the cTAKES developer version from
> > command line without running Eclipse or Maven?
> >
> > I found a post you made last year (
> >
> >
> https://urldefense.proofpoint.com/v2/url?u=http-3A__mail-2Darchives.apache.org_mod-5Fmbox_ctakes-2Ddev_201907.mbox_-253C1563805239741.31947-2540childrens.harvard.edu-253E&d=DwIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=ilUJmT8axx_RhXR_47XCxeR_aqswpoVXkSF5HQAxASQ&s=dxIE3QRB6OI1CxljCVx7K9Lgih-ymSq-wou0LqCvkvk&e=
> > ).
> > You stated, *"You can put PipelineBuilder in any main(..) method and then
> > start that main(..) from a command line just as you would any other java
> > program.  Just like any other java program, you need to have your
> > $CLASSPATH set correctly and, for memory use, increase your maximum memory
> > with -Xmx .  These are VM options."*
> >
> > I think this is what I have to do. But, I am unsure of how to accomplish
> > this exactly. What I have tried already is:
> > 1.) Launch Command Prompt
> > 2.) Change directory to where PipelineBuilder.java is located
> > cd C:\Users\Ryan\eclipse-workspace\ctakes\ctakes-core\src\main\java\org\apache\ctakes\core\pipeline\PipelineBuilder.java
> > 3.) Enter the following into Command Prompt
> > java org.apache.ctakes.core.pipeline.PiperFileRunner -p
> > C:\Users\Ryan\SkyDrive\Desktop\Piper_File.piper -i
> > C:\Users\Ryan\SkyDrive\Desktop\Input_Folder --writeXmis
> > C:\Users\Ryan\SkyDrive\Desktop\Output_Folder
> >
> > I receive the following error in Command Prompt:
> > Error: Could not find or load main class
> > org.apache.ctakes.core.pipeline.PiperFileRunner
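
This error generally means the JVM could not find the compiled class on its classpath; changing into a source directory does not put compiled classes there. The following is a small diagnostic sketch, not part of cTAKES, that reports whether a class name can be resolved with the classpath the JVM was started with. The class name `ClasspathCheck` is invented for this example.

```java
final class ClasspathCheck {

    /** True if the named class can be located on the current classpath (without initializing it). */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the error message above.
        String name = args.length > 0
                ? args[0]
                : "org.apache.ctakes.core.pipeline.PiperFileRunner";
        System.out.println(name + (isOnClasspath(name) ? " is" : " is NOT") + " on the classpath");
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

Running it with the same `-cp` value intended for cTAKES shows whether that value actually exposes the class.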
> >
> > I am probably missing something. Just not sure what exactly. I'm not too
> > familiar with Java. The documentation I have been reading hasn't been as
> > helpful since cTAKES is a much more complex project than the simple
> > examples they provide.
> >
> > Lastly, I am using Windows 10.
> >
> > Thank You,
> >
> > Ryan Young
> > MD/MBA Candidate
> > Jacobs School of Medicine & Biomedical Sciences
> >
> > On Mon, Mar 23, 2020 at 3:28 PM Ryan Young <ro...@buffalo.edu> wrote:
> >
> > > Hello Sean,
> > >
> > > Wow! This was a lot more than I was anticipating! Thank you very much!
> > >
> > > To answer your questions...
> > > • I am using Windows 10
> > > • I have the Python script call a shell command to run a batch file.
> > > The batch file just contains the following line:
> > > "C:\cTAKES_4.0.0\bin\runPiperFile.bat"  -p "C:\path\to\piper.piper"
> > > • The Python script waits for the shell command to complete (i.e., when
> > > cTAKES is finished processing)
> > > • The Python script will then parse the output text files and then
> > > continue on with the code
> > >
> > > Prior to calling cTAKES, the surgery list is in a Pandas dataframe. The
> > > workaround I had created was to save each line of the surgery list column
> > > in the dataframe to a different text file to make it easier for when I had
> > > to parse the output cTAKES text file. As I had mentioned previously, I
> > > would like to have just 1 input text file and 1 output text file (as long
> > > as the output file can be easily parsed by Python).
> > >
> > > Regarding my coding background, I don't have much background in Java.
> > > However, a few years ago, I had no knowledge of Python either, but I was
> > > able to teach myself while in medical school.
> > >
> > > A few more questions for you...
> > > 1.) Should I save the code you posted in the following location as a
> > > .jar file?
> > > C:\cTAKES_4.0.0\lib\SentenceFirstCuiWriter.jar
> > >
> > > 2.) Should I replace "add CuiLookupLister" with "add
> > > SentenceFirstCuiWriter" in the piper file or do I need both?
> > >
> > > 3.) If the SentenceFirstCuiWriter is unable to find a valid CUI, will
> > > it leave a blank, N/A, or NaN value? Having any of these values would
> > > definitely help when I have Python parse the output text file. When I
> > > have Python read the output text file, I would have it delete any
> > > dataframe rows with NaN or N/A in the CUI column.
> > >
> > > Thank you very much for your assistance!
> > >
> > > Ryan Young
> > > MD/MBA Candidate
> > > Jacobs School of Medicine & Biomedical Sciences
> > >
> > > On Mon, Mar 23, 2020 at 1:01 PM Finan, Sean <
> > > Sean.Finan@childrens.harvard.edu> wrote:
> > >
> > >> Hi Ryan,
> > >>
> > >> Here is some code for a writer that will do what you want.
> > >> To use it, get rid of those first two lines in the piper that I sent
> > >> (set, reader).
> > >> The default reader will work just fine, and it will allow you to
> > >> process multiple surgery lists in one run.
> > >>
> > >> Then just add SentenceFirstCuiWriter to the end of your piper.
> > >>
> > >> Sean
> > >>
> > >>
> > >> public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {
> > >>
> > >>    public void writeFile( final JCas jCas, final String outputDir,
> > >>                           final String documentId, final String fileName )
> > >>          throws IOException {
> > >>       File cuiFile = new File( outputDir, fileName + "_cui.txt" );
> > >>       Map<Sentence, Collection<ProcedureMention>> sentenceMap
> > >>             = JCasUtil.indexCovered( jCas, Sentence.class, ProcedureMention.class );
> > >>       List<Collection<ProcedureMention>> sortedSentenceProcedures
> > >>             = sentenceMap.entrySet()
> > >>                          .stream()
> > >>                          .sorted( Map.Entry.comparingByKey( DefaultAspanComparator.INSTANCE ) )
> > >>                          .map( Map.Entry::getValue )
> > >>                          .collect( Collectors.toList() );
> > >>       try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile ) ) ) {
> > >>          for ( Collection<ProcedureMention> procedures : sortedSentenceProcedures ) {
> > >>             ProcedureMention firstProcedure
> > >>                   = procedures.stream()
> > >>                               .min( Comparator.comparingInt( ProcedureMention::getBegin ) )
> > >>                               .orElse( null );
> > >>             if ( firstProcedure != null ) {
> > >>                String cui
> > >>                      = OntologyConceptUtil.getCuis( firstProcedure )
> > >>                                           .stream()
> > >>                                           .findFirst()
> > >>                                           .orElse( "" );
> > >>                if ( !cui.isEmpty() ) {
> > >>                   writer.write( cui + "\n" );
> > >>                }
> > >>             }
> > >>          }
> > >>       }
> > >>    }
> > >> }
> > >>

Re: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Ryan,

That list didn't work for me: email added its own line endings, splitting the intended lines, so it ends up being 120 rows.
For instance:

Esophagogastroduodenoscopy Endoscopic retrograde cholangiopancreatography
with Ampullectomy

is 2 lines instead of 1.

I will write something new for you in a little bit and maybe we can figure this out.

Sean
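
A quick way to check a pasted list for this kind of damage is to count its rows before comparing them with the CUI rows. The following is only a minimal sketch in plain Java and is not part of cTAKES; the class name LineCountCheck, the method countRows and the two command-line arguments are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of cTAKES.
public class LineCountCheck {

    // Counts the rows in a block of text.
    // A single trailing line break does not add an extra row.
    static int countRows(String text) {
        if (text.isEmpty()) {
            return 0;
        }
        String[] rows = text.split("\r?\n", -1);
        return text.endsWith("\n") ? rows.length - 1 : rows.length;
    }

    // Reads a whole file as UTF-8 text (an assumption about the file encoding).
    static String read(String path) throws IOException {
        return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java LineCountCheck <input.txt> <output_cui.txt>");
            return;
        }
        int inputRows = countRows(read(args[0]));
        int outputRows = countRows(read(args[1]));
        System.out.println("input rows:  " + inputRows);
        System.out.println("output rows: " + outputRows);
        System.out.println(inputRows == outputRows
                ? "row counts match"
                : "row counts differ by " + (outputRows - inputRows));
    }
}
```

Run against the list as it arrived by email, countRows would report the wrapped entry above as 2 rows, which is the effect described in this message.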

________________________________________
From: Ryan Young <ro...@buffalo.edu>
Sent: Tuesday, March 31, 2020 10:13 PM
To: dev@ctakes.apache.org
Subject: Re: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) [EXTERNAL]

Hello Sean,

I was able to get cTAKES packaged. However, the output text file isn't the
same number of lines as the input text file. For example, if the input text
file is 10,000 lines long then the output text file ends up being 10,630
lines.

This makes me think that there's another conditional statement (or two)
which needs to be added to the end of SentenceFirstCuiWriter.java.

Here's the current version of SentenceFirstCuiWriter.java I am using:
public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {

   public void writeFile( final JCas jCas, final String outputDir,
                          final String documentId, final String fileName )
         throws IOException {
      File cuiFile = new File( outputDir, fileName + "_cui.txt" );
      // Map each Sentence to the ProcedureMentions that it covers.
      Map<Sentence, Collection<ProcedureMention>> sentenceMap
            = JCasUtil.indexCovered( jCas, Sentence.class, ProcedureMention.class );
      // Sort the entries by their Sentence key and keep the procedure collections.
      List<Collection<ProcedureMention>> sortedSentenceProcedures
            = sentenceMap.entrySet()
                         .stream()
                         .sorted( Map.Entry.comparingByKey( DefaultAspanComparator.INSTANCE ) )
                         .map( Map.Entry::getValue )
                         .collect( Collectors.toList() );
      try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile ) ) ) {
         // One output row per sentence entry.
         for ( Collection<ProcedureMention> procedures : sortedSentenceProcedures ) {
            // The first procedure is the one with the smallest begin offset.
            ProcedureMention firstProcedure
                  = procedures.stream()
                              .min( Comparator.comparingInt( ProcedureMention::getBegin ) )
                              .orElse( null );
            if ( firstProcedure == null ) {
               // No procedure found: write a blank row.
               writer.write( "\n" );
            } else {
               String cui
                     = OntologyConceptUtil.getCuis( firstProcedure )
                                          .stream()
                                          .findFirst()
                                          .orElse( "" );
               if ( cui.isEmpty() ) {
                  // Procedure without a CUI: write a blank row.
                  writer.write( "\n" );
               } else {
                  writer.write( cui + "\n" );
               }
            }
         }
      }
   }
}
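
Note that the loop above writes one row per Sentence entry, so the row count follows however many sentences are found rather than the number of lines in the input file. Whether that explains the extra rows has not been confirmed in this thread. If it does, one possible direction is to group mentions by the line offsets of the document text instead of by sentence. The sketch below shows only that grouping logic in plain Java, without any cTAKES or UIMA classes; Mention is a hypothetical stand-in for ProcedureMention (a begin offset plus its CUIs), and the second input line is an invented example with no match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not cTAKES code.
public class LineFirstCui {

    // Stand-in for a ProcedureMention: begin offset in the text plus its CUIs.
    static final class Mention {
        final int begin;
        final List<String> cuis;

        Mention(int begin, List<String> cuis) {
            this.begin = begin;
            this.cuis = cuis;
        }
    }

    // Returns exactly one row per line of text: the first CUI of the
    // earliest mention that begins on that line, or "" if there is none.
    static List<String> firstCuiPerLine(String text, List<Mention> mentions) {
        String[] lines = text.split("\n", -1);
        // A trailing line break does not start a new row.
        int count = text.endsWith("\n") ? lines.length - 1 : lines.length;
        List<String> rows = new ArrayList<>();
        int lineStart = 0;
        for (int i = 0; i < count; i++) {
            int lineEnd = lineStart + lines[i].length();
            Mention first = null;
            for (Mention m : mentions) {
                boolean onLine = m.begin >= lineStart && m.begin < lineEnd;
                if (onLine && (first == null || m.begin < first.begin)) {
                    first = m;
                }
            }
            rows.add(first == null || first.cuis.isEmpty() ? "" : first.cuis.get(0));
            lineStart = lineEnd + 1; // skip the '\n'
        }
        return rows;
    }

    public static void main(String[] args) {
        String text = "Colonoscopy with Polypectomy\n"
                + "Unknown thing\n"
                + "Esophagogastroduodenoscopy Colonoscopy";
        List<Mention> mentions = Arrays.asList(
                new Mention(text.indexOf("Polypectomy"), Collections.singletonList("C0521210")),
                new Mention(0, Collections.singletonList("C0009378")),
                new Mention(text.indexOf("Esophagogastroduodenoscopy"), Collections.singletonList("C0079304")),
                new Mention(text.lastIndexOf("Colonoscopy"), Collections.singletonList("C0009378")));
        // Prints: [C0009378, , C0079304]
        System.out.println(firstCuiPerLine(text, mentions));
    }
}
```

In a real writer the offsets would come from the annotations and the document text in the JCas; how best to wire that up is left open here.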

Below is the piper file I am using:
// Piper
reader org.apache.ctakes.core.cr.FileTreeReader
InputDirectory="C:\path\to\input\folder"
set ctakes.umlsuser=username ctakes.umlspw=password
load DefaultTokenizerPipeline
add POSTagger
load DictionarySubPipe
add SentenceFirstCuiWriter OutputDirectory="C:\path\to\output\folder"

If it helps, I have listed the first 100 lines of the input text file.
Again, the expected output text file should be 100 lines (i.e., 100 CUIs)
as well. However, the output text file contains 103 lines (103 CUIs), which
is 3 more than expected.
Input.txt
Colonoscopy with Polypectomy
Esophagogastroduodenoscopy Colonoscopy
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
Esophagogastroduodenoscopy with Endoscopic ultrasound
Esophagogastroduodenoscopy with Biopsy()
Linear Endobronchial Ultrasound (EBUS) with Nav Bronch
Bronchoscopy Endobronchial Ultrasound (EBUS) OR
Esophagogastroduodenoscopy with Dilation Savary
Esophagogastroduodenoscopy
Esophagogastroduodenoscopy Endoscopic retrograde cholangiopancreatography
with Ampullectomy
Wide Local Excision Flap Local Cheek Skin Graft Full Thickness (FTSG)
Esophagogastroduodenoscopy with Dilation Balloon
Esophagogastroduodenoscopy with Biopsy()
Excision Soft Tissue Tumor
Axillary Node Dissection
Wide Local Excision w Removal of Radioactive Seed
Laparoscopic Partial Gastrectomy
ZLumpectomy, with   Sentinel lymph node Biopsy Sentinel Lymph Node Biopsy
Excision
Cysto with Pre-Op Ureteral Catheter Placement Diagnostic Laparoscopy
Sigmoid Colectomy Salpingo Oophorectomy
Laminectomy Cervical with Instrumentation
Transanal Endoscopic Microsurgery
Implantation Procedure Ommaya Reservoir Insertion with Axiem
Suprahyoid Lymphadenectomy Procedure Transcervical Extended Mediastinal
Lymphadenectomy (Transcervcl Extndd Medstnl Lymphadenectmy) Video Assisted
Thorascopic Surgery with Lobectomy Intubating Bronchoscopy Nerve Block
Intercostal, Multiple
Video Assisted Thorascopic Surgery with Wedge Resection Nerve Block
Intercostal, Multiple
Video Assisted Thorascopic Surgery with Decortication
Esophagogastroduodenoscopy Colonoscopy
Esophagogastroduodenoscopy with Endoscopic ultrasound
Colonoscopy with Biopsy()
Esophagogastroduodenoscopy with Endoscopic ultrasound
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
aspiration Endoscopic retrograde cholangiopancreatography with Ampullectomy
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
Esophagogastroduodenoscopy
Colonoscopy
Colonoscopy
Esophagogastroduodenoscopy with Biopsy()
Esophagogastroduodenoscopy with Esophageal Stent Placement
Colonoscopy with Biopsy()
Esophagogastroduodenoscopy with Biopsy()
Esophagogastroduodenoscopy Colonoscopy
Colonoscopy with Biopsy()
Wide local excision with Removal of Seeds Sentinel lymph node Biopsys
Wide Local Excision w Removal of Radioactive Seed
Abscess Drainage Empyema Rib Resection Flap Latissimus Dorsi Thoracoplasty
Removal of Foreign Body
Laparoscopic Cholecystectomy Laparoscopic Liver Biopsy
Laparotomy Salpingo Oophorectomy Resection Pelvic Abcess Ruptured
Diverticulum
Minimally Invasive Esophagectomy with Feeding J
Ileostomy
Diagnostic Hysteroscopy Dilation Curettage (D and C)
Exploratory Laparotomy Lysis of Adhesions Bowel Resection End to End
 Anastomosis Take Down of Ostomy
Robot Assisted Sigmoid Colon Resection Robot Assisted Right Colectomy
Esophagogastroduodenoscopy with Dilation Balloon
Segmentectomy (Thoracic) Nerve Block Intercostal, Multiple
Video Assisted Thorascopic Surgery with Wedge Resection Intubating
Bronchoscopy Nerve Block Intercostal, Multiple
Esophagogastroduodenoscopy Colonoscopy
Esophagogastroduodenoscopy Endoscopy Mucosal Resection
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
aspiration Endoscopic retrograde cholangiopancreatography with Stent Change
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
aspiration Endoscopic retrograde cholangiopancreatography with Ampullectomy
Esophagogastroduodenoscopy with Esophageal Stent Placement
Colonoscopy
Esophagogastroduodenoscopy with Biopsy()
Colonoscopy with Biopsy()
Esophagogastroduodenoscopy
Esophagogastroduodenoscopy with Biopsy()
Craniectomy Frontal withStealth Drainage of Abscess
Craniotomy Excision Tumor Posterior Fossa Craniotomy Temporal
Thyroid Lobectomy with Isthmusectomy
Neck Exploration Mediastinal Exploration Video Assisted Thorascopic Surgery
with Bullectomy
Thyroid Lobectomy
Cytoreduction Debulking Hyperthermic Intraperitoneal Chemotherapy (HIPEC)
Dilation Curettage (D and C) with Hysteroscopy
Open Biopsy Lymph Node Biopsy Excision
Total abdominal hysterectomy Bilateral salpingo-oophorectomy with Radical
Dissection For Debulking Exploratory Laparotomy Peritoneal Stripping
Resection of Tumor Resection of Tumor Bowel Resection Omentectomy
Laminectomy Lumbar
Craniotomy Occipital with Axiem
Cytoreduction Debulking Hyperthermic Intraperitoneal Chemotherapy (HIPEC)
Exploratory Laparotomy Colectomy Partial Omentectomy
Cystoscopy with Ureteral Stent Insertion Change Cystoscopy with Ureteral
Cath  Retrograde Pyelogramm
Cystoscopy with TURP
Retrograde Pyelogram Ureteroscopy Cystoscopy with Ureteral Cathm
Colonoscopy with Polypectomy
Esophagogastroduodenoscopy with Dilation Balloon
Esophagogastroduodenoscopy with RFA (Halo)
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
Esophagogastroduodenoscopy with Endoscopic ultrasound
Esophagogastroduodenoscopy with Endoscopic ultrasound
Linear Endobronchial Ultrasound (EBUS) with Nav Bronch
Linear Endobronchial Ultrasound (EBUS) with Nav Bronch
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
aspiration Endoscopic retrograde cholangiopancreatography with Ampullectomy
Bronchoscopy with Endobronchial Ultrasound (EBUS)
Microsuspension Laryngoscopy
Selective Neck Dissection
Wide Local Excision w Removal of Radioactive Seed
Breast Re-Excision
Bronchoscopy
Robot Assisted Total Hysterectomy SO
Craniotomy Parietal with ioMRI
Bronchoscopy with Biopsy()
Ex Laparotomy Total abdominal hysterectomy with Salpingo Oophorectomy
Robot Assisted Prostatectomy Robot Assisted Pelvic Lymphadenectomy

Thank You,

Ryan Young

On Tue, Mar 31, 2020 at 10:44 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Ryan,
>
> You made some excellent progress.  ctakes is a little complicated for new
> users - especially anybody that isn't familiar with Java.
>
> Since you are going to be running from a  command line (via python) and
> have already done so successfully, we can just try to get you set up to
> repeat that process.
>
> In Eclipse, you should be able to run the maven "package" configuration.
>
> That will compile and build an installation similar to what you were using
> before.
>
> After you execute maven package,
> open the directory ctakes-distribution/target/
> There should be a .zip file named apache-ctakes-4.0.1-SNAPSHOT-bin
> That zip file contains a ctakes installation for Windows.
> Unzip the installation wherever you like - preferably without spaces in
> directory names.
>
> You should be able to treat this new installation just like you did the
> one downloaded from the ctakes website.
>
> Before you do all of that ...  We should change a couple of things in that
> SentenceFirstCuiWriter to output blanks where procedures or cuis are not
> discovered for your snippets.
>
>
> >> public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {
> >>
> >>    public void writeFile( final JCas jCas, final String outputDir,
> >>                           final String documentId, final String fileName )
> >>          throws IOException {
> >>       File cuiFile = new File( outputDir, fileName + "_cui.txt" );
> >>       Map<Sentence, Collection<ProcedureMention>> sentenceMap
> >>             = JCasUtil.indexCovered( jCas, Sentence.class, ProcedureMention.class );
> >>       List<Collection<ProcedureMention>> sortedSentenceProcedures
> >>             = sentenceMap.entrySet()
> >>                          .stream()
> >>                          .sorted( Map.Entry.comparingByKey( DefaultAspanComparator.INSTANCE ) )
> >>                          .map( Map.Entry::getValue )
> >>                          .collect( Collectors.toList() );
> >>       try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile ) ) ) {
> >>          for ( Collection<ProcedureMention> procedures : sortedSentenceProcedures ) {
> >>             ProcedureMention firstProcedure
> >>                   = procedures.stream()
> >>                               .min( Comparator.comparingInt( ProcedureMention::getBegin ) )
> >>                               .orElse( null );
> >>             if ( firstProcedure != null ) {
>
> ---------- Change the above line to
>
> if ( firstProcedure == null ) {
>    writer.write( "\n" );
> } else {
>
> >>                String cui
> >>                      = OntologyConceptUtil.getCuis( firstProcedure )
> >>                                           .stream()
> >>                                           .findFirst()
> >>                                           .orElse( "" );
> >>                if ( !cui.isEmpty() ) {
>
> --------- Change the above line to
>
> if ( cui.isEmpty() ) {
>    writer.write( "\n" );
> } else {
>
> >>                   writer.write( cui + "\n" );
> >>                }
> >>             }
> >>          }
> >>       }
> >>    }
> >> }
>
>
> So, after
> 1.  Editing the SentenceFirstCuiWriter
> 2.  Running the maven package step
> 3.  Unzipping your ctakes installation
>
> You should be able to
> 1.  Run ctakes from command line like you did before
> 2.  Use the custom piper file
> 3.  Resolve the first-discovered procedure for a snippet on each line
> 4.  Write file(s) with corresponding line-by-line cuis or empty lines
> where none are resolved
>
> Let me know if I missed anything.
>
> Sean
>
> ________________________________________
> From: Ryan Young <ro...@buffalo.edu>
> Sent: Monday, March 30, 2020 9:44 PM
> To: dev@ctakes.apache.org
> Subject: Re: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code
> (CUI) [EXTERNAL]
>
> Hello Sean,
>
> I have run into some difficulty actually running the script you wrote
> (SentenceFirstCuiWriter.java). I spent the last week doing the following:
> 1.) Installed cTAKES developer version using Eclipse IDE
> 2.) Added the appropriate import statements at the beginning of
> SentenceFirstCuiWriter.java
> 3.) Placed SentenceFirstCuiWriter.java in this directory:
>
> C:\Users\Ryan\eclipse-workspace\ctakes\ctakes-core\src\main\java\org\apache\ctakes\core\cc
> 4.) Successfully built and compiled cTAKES developer version
> 5.) Successfully run the test configurations which were already in cTAKES
> in Eclipse (Run --> Run As --> Maven test)
>
> My main question is how do I run the cTAKES developer version from command
> line without running Eclipse or Maven?
>
> I found a post you made last year (
> http://mail-archives.apache.org/mod_mbox/ctakes-dev/201907.mbox/%3C1563805239741.31947%40childrens.harvard.edu%3E
> ).
> You stated, *"You can put PipelineBuilder in any main(..) method and then
> start that main(..) from a command line just as you would any other java
> program.  Just like any other java program, you need to have your
> $CLASSPATH set correctly and, for memory use, increase your maximum memory
> with -Xmx .  These are VM options."*
>
> I think this is what I have to do. But, I am unsure of how to accomplish
> this exactly. What I have tried already is:
> 1.) Launch Command Prompt
> 2.) Change directory to where PipelineBuilder.java is located
> cd C:\Users\Ryan\eclipse-workspace\ctakes\ctakes-core\src\main\java\org\apache\ctakes\core\pipeline\PipelineBuilder.java
> 3.) Enter the following into Command Prompt
> java org.apache.ctakes.core.pipeline.PiperFileRunner -p
> C:\Users\Ryan\SkyDrive\Desktop\Piper_File.piper -i
> C:\Users\Ryan\SkyDrive\Desktop\Input_Folder --writeXmis
> C:\Users\Ryan\SkyDrive\Desktop\Output_Folder
>
> I receive the following error in Command Prompt:
> Error: Could not find or load main class
> org.apache.ctakes.core.pipeline.PiperFileRunner
>
> I am probably missing something. Just not sure what exactly. I'm not too
> familiar with Java. The documentation I have been reading hasn't been as
> helpful since cTAKES is a much more complex project than the simple
> examples they provide.
>
> Lastly, I am using Windows 10.
>
> Thank You,
>
> Ryan Young
> MD/MBA Candidate
> Jacobs School of Medicine & Biomedical Sciences
>
> On Mon, Mar 23, 2020 at 3:28 PM Ryan Young <ro...@buffalo.edu> wrote:
>
> > Hello Sean,
> >
> > Wow! This was a lot more than I was anticipating! Thank you very much!
> >
> > To answer your questions...
> > • I am using Windows 10
> > • I have the Python script call a shell command to run a batch file. The
> > batch file just contains the following line:
> > "C:\cTAKES_4.0.0\bin\runPiperFile.bat"  -p "C:\path\to\piper.piper"
> > • The Python script waits for the shell command to complete (i.e., when
> > cTAKES is finished processing)
> > • The Python script will then parse the output text files and then
> > continue on with the code
> >
> > Prior to calling cTAKES, the surgery list is in a Pandas dataframe. The
> > workaround I had created was to save each line of the surgery list column
> > in the dataframe to a different text file to make it easier for when I
> had
> > to parse the output cTAKES text file. As I had mentioned previously, I
> > would like to have just 1 input text file and 1 output text file (as long
> > as the output file can be easily parsed by Python).
> >
> > Regarding my coding background, I don't have much background in Java.
> > However, a few years ago, I had no knowledge of Python either, but I was
> > able to teach myself while in medical school.
> >
> > A few more questions for you...
> > 1.) Should I save the code you posted in the following location as a .jar
> > file?
> > C:\cTAKES_4.0.0\lib\SentenceFirstCuiWriter.jar
> >
> > 2.) Should I replace "add CuiLookupLister" with "add
> > SentenceFirstCuiWriter" in the piper file or do I need both?
> >
> > 3.) If the SentenceFirstCuiWriter is unable to find a valid CUI, will it
> > leave a blank, N/A, or NaN value? Having any of these values would
> > definitely help when I have Python parse the output text file. When I
> have
> > Python read the output text file, I would have it delete any dataframe
> rows
> > with NaN or N/A in the CUI column.
> >
> > Thank you very much for your assistance!
> >
> > Ryan Young
> > MD/MBA Candidate
> > Jacobs School of Medicine & Biomedical Sciences
> >
> > On Mon, Mar 23, 2020 at 1:01 PM Finan, Sean <
> > Sean.Finan@childrens.harvard.edu> wrote:
> >
> >> Hi Ryan,
> >>
> >> Here is some code for a writer that will do what you want.
> >> To use it, get rid of those first two lines in the piper that I sent
> >> (set, reader).
> >> The default reader will work just fine, and it will allow you to process
> >> multiple surgery lists in on run.
> >>
> >> Then just add SentenceFirstCuiWriter to the end of your piper.
> >>
> >> Sean
> >>
> >>
> >> public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {
> >>
> >>    public void writeFile( final JCas jCas, final String outputDir,
> >>                           final String documentId, final String fileName
> >> ) throws IOException {
> >>       File cuiFile = new File( outputDir, fileName + "_cui.txt" );
> >>       Map<Sentence, Collection<ProcedureMention>> sentenceMap
> >>             = JCasUtil.indexCovered( jCas, Sentence.class,
> >> ProcedureMention.class );
> >>       List<Collection<ProcedureMention>> sortedSentenceProcedures
> >>             = sentenceMap.entrySet()
> >>                          .stream()
> >>                          .sorted( Map.Entry.comparingByKey(
> >> DefaultAspanComparator.INSTANCE ) )
> >>                          .map( Map.Entry::getValue )
> >>                          .collect( Collectors.toList() );
> >>       try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile
> )
> >> ) ) {
> >>          for ( Collection<ProcedureMention> procedures :
> >> sortedSentenceProcedures ) {
> >>             ProcedureMention firstProcedure
> >>                   = procedures.stream()
> >>                               .min( Comparator.comparingInt(
> >> ProcedureMention::getBegin ) )
> >>                               .orElse( null );
> >>             if ( firstProcedure != null ) {
> >>                String cui
> >>                      = OntologyConceptUtil.getCuis( firstProcedure )
> >>                                           .stream()
> >>                                           .findFirst()
> >>                                           .orElse( "" );
> >>                if ( !cui.isEmpty() ) {
> >>                   writer.write( cui + "\n" );
> >>                }
> >>             }
> >>          }
> >>       }
> >>    }
> >> }
> >>
> >> ________________________________________
> >> From: Ryan Young <ro...@buffalo.edu>
> >> Sent: Monday, March 23, 2020 11:02 AM
> >> To: dev@ctakes.apache.org
> >> Subject: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code
> >> (CUI) [EXTERNAL]
> >>
> >> * External Email - Caution *
> >>
> >>
> >> Hello,
> >>
> >> I am a medical student who happened to come across cTAKES for a project
> I
> >> am working on. What I would like to do is take a list of surgeries in a
> >> text file and have cTAKES output what it determines to be the best UMLS
> >> code (CUI) for that particular line.
> >>
> >> Each line of the text file is independent of the others (i.e., each line
> >> should be read and analyzed separately). For example, here's my list of
> >> the
> >> surgeries (Surgery_List.txt):
> >> Colonoscopy with Polypectomy
> >> Esophagogastroduodenoscopy Colonoscopy
> >> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
> >> aspiration
> >>
> >> When I run the piper file (see below), I get the following output:
> >> Colonoscopy with Polypectomy
> >> "Colonoscopy"
> >>   Procedure
> >>   C0009378 colonoscopy
> >> "Polypectomy"
> >>   Procedure
> >>   C0521210 Resection of polyp
> >>
> >> Esophagogastroduodenoscopy Colonoscopy
> >> "Esophagogastroduodenoscopy"
> >>   Procedure
> >>   C0079304 Esophagogastroduodenoscopy
> >> "Colonoscopy"
> >>   Procedure
> >>   C0009378 colonoscopy
> >>
> >> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
> >> aspiration
> >> "Esophagogastroduodenoscopy"
> >>   Procedure
> >>   C0079304 Esophagogastroduodenoscopy
> >> "Endoscopic ultrasound"
> >>   Procedure
> >>   C0376443 Endoscopic Ultrasound
> >> "Endoscopic"
> >>   Procedure
> >>   C0014245 Endoscopy (procedure)
> >> "ultrasound"
> >>   Procedure
> >>   C0041618 Ultrasonography
> >> "Fine needle aspiration"
> >>   Procedure
> >>   C1510483 Fine needle aspiration biopsy
> >> "aspiration"
> >>   Procedure
> >>   C0349707 Aspiration-action
> >>
> >> Here's the piper file I have been using:
> >> reader org.apache.ctakes.core.cr.FileTreeReader
> >> InputDirectory="C:\path\to\input\folder"
> >> load DefaultTokenizerPipeline.piper
> >>
> >>
> SentenceModelFile=C:\cTAKES_4.0.0\desc\ctakes-core\desc\analysis_engine\SentenceDetectorAnnotatorBIO.xml
> >> add ContextDependentTokenizerAnnotator
> >> add org.apache.ctakes.necontexts.ContextAnnotator
> >> addDescription POSTagger
> >> load ChunkerSubPipe.piper
> >> set ctakes.umlsuser=my_username ctakes.umlspw=my_password
> >> add org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator
> >>
> >>
> DictionaryDescriptor=C:\cTAKES_4.0.0\desc\ctakes-dictionary-lookup-fast\desc\analysis_engine\UmlsLookupAnnotator.xml
> >>
> >>
> LookupXml=C:\cTAKES_4.0.0\resources\org\apache\ctakes\dictionary\lookup\fast\sno_rx_16ab.xml
> >> add property.plaintext.PropertyTextWriterFit
> >> OutputDirectory="C:\path\to\output\folder"
> >>
> >> The workaround I have developed is as follows...
> >> 1.) Save each line of Surgery_List.txt to separate text files
> >> 2.) Use a Python script to parse each individual text file to extract
> the
> >> first UMLS code (CUI) given in the text file
> >>
> >> The above method works fine when there's only 10 lines, but not so well
> >> when there's 40,000 lines in Surgery_List.txt.
> >>
> >> Ideally, I would like for Fast Dictionary Lookup to just return the top
> >> result for each line of Surgery_List.txt. For example, Output.txt would
> >> look just like this:
> >> C0009378
> >> C0079304
> >> C0079304
> >>
> >> Just for reference here's how UMLS codes correspond between
> >> Surgery_List.txt and Output.txt:
> >> C0009378 --> Colonoscopy with Polypectomy
> >> C0079304 --> Esophagogastroduodenoscopy Colonoscopy
> >> C0079304 --> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine
> >> needle aspiration
> >>
> >> Is there something I can add to the piper file to make this happen?
> >>
> >> Currently, I have the cTAKES user version installed, but I could install
> >> the developer version if need be. I would just need a little guidance on
> >> which Java file I would need to modify to get the desired results.
> >>
> >> Thank You,
> >>
> >> Ryan Young
> >> MD/MBA Candidate
> >> Jacobs School of Medicine & Biomedical Sciences
> >>
> >
>

Re: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) [EXTERNAL]

Posted by Ryan Young <ro...@buffalo.edu>.
Hello Sean,

I was able to get cTAKES packaged. However, the output text file does not
have the same number of lines as the input text file. For example, if the
input text file is 10,000 lines long, the output text file ends up being
10,630 lines.

This makes me think that another conditional statement (or two) needs to be
added to the end of SentenceFirstCuiWriter.java.

Here's the current version of SentenceFirstCuiWriter.java I am using:
public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {

   public void writeFile( final JCas jCas, final String outputDir,
                          final String documentId, final String fileName )
         throws IOException {
      File cuiFile = new File( outputDir, fileName + "_cui.txt" );
      Map<Sentence, Collection<ProcedureMention>> sentenceMap
            = JCasUtil.indexCovered( jCas, Sentence.class, ProcedureMention.class );
      List<Collection<ProcedureMention>> sortedSentenceProcedures
            = sentenceMap.entrySet()
                         .stream()
                         .sorted( Map.Entry.comparingByKey( DefaultAspanComparator.INSTANCE ) )
                         .map( Map.Entry::getValue )
                         .collect( Collectors.toList() );
      try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile ) ) ) {
         for ( Collection<ProcedureMention> procedures : sortedSentenceProcedures ) {
            ProcedureMention firstProcedure
                  = procedures.stream()
                              .min( Comparator.comparingInt( ProcedureMention::getBegin ) )
                              .orElse( null );
            if ( firstProcedure == null ) {
               writer.write( "\n" );
            } else {
               String cui
                     = OntologyConceptUtil.getCuis( firstProcedure )
                                          .stream()
                                          .findFirst()
                                          .orElse( "" );
               if ( cui.isEmpty() ) {
                  writer.write( "\n" );
               } else {
                  writer.write( cui + "\n" );
               }
            }
         }
      }
   }
}
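
One possible cause of the extra output lines, which this thread does not confirm, is that the writer emits one row per detected Sentence rather than one row per input line, so any line that the sentence detector splits in two would add a row. The sketch below is not cTAKES code: it is a dependency-free illustration of selecting the earliest mention per input line by character offset. The Mention class and the firstCuiPerLine method are hypothetical stand-ins (a real writer would read offsets and CUIs from ProcedureMention annotations).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FirstCuiPerLine {

    /** Hypothetical stand-in for a mention: a begin offset in the document plus one CUI. */
    public static final class Mention {
        final int begin;
        final String cui;

        public Mention(int begin, String cui) {
            this.begin = begin;
            this.cui = cui;
        }
    }

    /**
     * Returns exactly one entry per line of text: the CUI of the mention with the
     * smallest begin offset on that line, or "" when the line has no mention.
     * Assumes lines are separated by '\n' and the text has no trailing newline.
     */
    public static List<String> firstCuiPerLine(String text, List<Mention> mentions) {
        List<String> result = new ArrayList<>();
        int lineStart = 0;
        for (String line : text.split("\n", -1)) {
            int lineEnd = lineStart + line.length();
            String best = "";
            int bestBegin = Integer.MAX_VALUE;
            for (Mention m : mentions) {
                boolean onThisLine = m.begin >= lineStart && m.begin < lineEnd;
                if (onThisLine && m.begin < bestBegin) {
                    bestBegin = m.begin;
                    best = m.cui;
                }
            }
            result.add(best);
            lineStart = lineEnd + 1; // step over the '\n' separator
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Colonoscopy with Polypectomy\n"
                + "Removal of Foreign Body\n"
                + "Esophagogastroduodenoscopy Colonoscopy";
        List<Mention> mentions = Arrays.asList(
                new Mention(text.indexOf("Polypectomy"), "C0521210"),
                new Mention(0, "C0009378"),
                new Mention(text.indexOf("Esophagogastroduodenoscopy"), "C0079304"));
        // One row per input line; the middle line has no mention and yields an empty row.
        for (String cui : firstCuiPerLine(text, mentions)) {
            System.out.println(cui);
        }
    }
}
```

Keying the output on line offsets rather than on sentences means the row count always equals the input line count, whatever the sentence detector does.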

Below is the piper file I am using:
// Piper
reader org.apache.ctakes.core.cr.FileTreeReader
InputDirectory="C:\path\to\input\folder"
set ctakes.umlsuser=username ctakes.umlspw=password
load DefaultTokenizerPipeline
add POSTagger
load DictionarySubPipe
add SentenceFirstCuiWriter OutputDirectory="C:\path\to\output\folder"

If it helps, I have listed the first 100 lines of the input text file.
Again, the expected output text file should be 100 lines (i.e., 100 CUIs)
as well. However, the output text file has 103 lines (103 CUIs), which is 3
more than it should have.
Input.txt
Colonoscopy with Polypectomy
Esophagogastroduodenoscopy Colonoscopy
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
Esophagogastroduodenoscopy with Endoscopic ultrasound
Esophagogastroduodenoscopy with Biopsy()
Linear Endobronchial Ultrasound (EBUS) with Nav Bronch
Bronchoscopy Endobronchial Ultrasound (EBUS) OR
Esophagogastroduodenoscopy with Dilation Savary
Esophagogastroduodenoscopy
Esophagogastroduodenoscopy Endoscopic retrograde cholangiopancreatography
with Ampullectomy
Wide Local Excision Flap Local Cheek Skin Graft Full Thickness (FTSG)
Esophagogastroduodenoscopy with Dilation Balloon
Esophagogastroduodenoscopy with Biopsy()
Excision Soft Tissue Tumor
Axillary Node Dissection
Wide Local Excision w Removal of Radioactive Seed
Laparoscopic Partial Gastrectomy
ZLumpectomy, with   Sentinel lymph node Biopsy Sentinel Lymph Node Biopsy
Excision
Cysto with Pre-Op Ureteral Catheter Placement Diagnostic Laparoscopy
Sigmoid Colectomy Salpingo Oophorectomy
Laminectomy Cervical with Instrumentation
Transanal Endoscopic Microsurgery
Implantation Procedure Ommaya Reservoir Insertion with Axiem
Suprahyoid Lymphadenectomy Procedure Transcervical Extended Mediastinal
Lymphadenectomy (Transcervcl Extndd Medstnl Lymphadenectmy) Video Assisted
Thorascopic Surgery with Lobectomy Intubating Bronchoscopy Nerve Block
Intercostal, Multiple
Video Assisted Thorascopic Surgery with Wedge Resection Nerve Block
Intercostal, Multiple
Video Assisted Thorascopic Surgery with Decortication
Esophagogastroduodenoscopy Colonoscopy
Esophagogastroduodenoscopy with Endoscopic ultrasound
Colonoscopy with Biopsy()
Esophagogastroduodenoscopy with Endoscopic ultrasound
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
aspiration Endoscopic retrograde cholangiopancreatography with Ampullectomy
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
Esophagogastroduodenoscopy
Colonoscopy
Colonoscopy
Esophagogastroduodenoscopy with Biopsy()
Esophagogastroduodenoscopy with Esophageal Stent Placement
Colonoscopy with Biopsy()
Esophagogastroduodenoscopy with Biopsy()
Esophagogastroduodenoscopy Colonoscopy
Colonoscopy with Biopsy()
Wide local excision with Removal of Seeds Sentinel lymph node Biopsys
Wide Local Excision w Removal of Radioactive Seed
Abscess Drainage Empyema Rib Resection Flap Latissimus Dorsi Thoracoplasty
Removal of Foreign Body
Laparoscopic Cholecystectomy Laparoscopic Liver Biopsy
Laparotomy Salpingo Oophorectomy Resection Pelvic Abcess Ruptured
Diverticulum
Minimally Invasive Esophagectomy with Feeding J
Ileostomy
Diagnostic Hysteroscopy Dilation Curettage (D and C)
Exploratory Laparotomy Lysis of Adhesions Bowel Resection End to End
 Anastomosis Take Down of Ostomy
Robot Assisted Sigmoid Colon Resection Robot Assisted Right Colectomy
Esophagogastroduodenoscopy with Dilation Balloon
Segmentectomy (Thoracic) Nerve Block Intercostal, Multiple
Video Assisted Thorascopic Surgery with Wedge Resection Intubating
Bronchoscopy Nerve Block Intercostal, Multiple
Esophagogastroduodenoscopy Colonoscopy
Esophagogastroduodenoscopy Endoscopy Mucosal Resection
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
aspiration Endoscopic retrograde cholangiopancreatography with Stent Change
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
aspiration Endoscopic retrograde cholangiopancreatography with Ampullectomy
Esophagogastroduodenoscopy with Esophageal Stent Placement
Colonoscopy
Esophagogastroduodenoscopy with Biopsy()
Colonoscopy with Biopsy()
Esophagogastroduodenoscopy
Esophagogastroduodenoscopy with Biopsy()
Craniectomy Frontal withStealth Drainage of Abscess
Craniotomy Excision Tumor Posterior Fossa Craniotomy Temporal
Thyroid Lobectomy with Isthmusectomy
Neck Exploration Mediastinal Exploration Video Assisted Thorascopic Surgery
with Bullectomy
Thyroid Lobectomy
Cytoreduction Debulking Hyperthermic Intraperitoneal Chemotherapy (HIPEC)
Dilation Curettage (D and C) with Hysteroscopy
Open Biopsy Lymph Node Biopsy Excision
Total abdominal hysterectomy Bilateral salpingo-oophorectomy with Radical
Dissection For Debulking Exploratory Laparotomy Peritoneal Stripping
Resection of Tumor Resection of Tumor Bowel Resection Omentectomy
Laminectomy Lumbar
Craniotomy Occipital with Axiem
Cytoreduction Debulking Hyperthermic Intraperitoneal Chemotherapy (HIPEC)
Exploratory Laparotomy Colectomy Partial Omentectomy
Cystoscopy with Ureteral Stent Insertion Change Cystoscopy with Ureteral
Cath  Retrograde Pyelogramm
Cystoscopy with TURP
Retrograde Pyelogram Ureteroscopy Cystoscopy with Ureteral Cathm
Colonoscopy with Polypectomy
Esophagogastroduodenoscopy with Dilation Balloon
Esophagogastroduodenoscopy with RFA (Halo)
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
Esophagogastroduodenoscopy with Endoscopic ultrasound
Esophagogastroduodenoscopy with Endoscopic ultrasound
Linear Endobronchial Ultrasound (EBUS) with Nav Bronch
Linear Endobronchial Ultrasound (EBUS) with Nav Bronch
Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
aspiration Endoscopic retrograde cholangiopancreatography with Ampullectomy
Bronchoscopy with Endobronchial Ultrasound (EBUS)
Microsuspension Laryngoscopy
Selective Neck Dissection
Wide Local Excision w Removal of Radioactive Seed
Breast Re-Excision
Bronchoscopy
Robot Assisted Total Hysterectomy SO
Craniotomy Parietal with ioMRI
Bronchoscopy with Biopsy()
Ex Laparotomy Total abdominal hysterectomy with Salpingo Oophorectomy
Robot Assisted Prostatectomy Robot Assisted Pelvic Lymphadenectomy

Thank You,

Ryan Young

On Tue, Mar 31, 2020 at 10:44 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Ryan,
>
> You made some excellent progress.  ctakes is a little complicated for new
> users - especially anybody that isn't familiar with Java.
>
> Since you are going to be running from a  command line (via python) and
> have already done so successfully, we can just try to get you set up to
> repeat that process.
>
> In Eclipse, you should be able to run the maven "package" configuration.
>
> That will compile and build an installation similar to what you were using
> before.
>
> After you execute maven package,
> open the directory ctakes-distribution/target/
> There should be a .zip file named apache-ctakes-4.0.1-SNAPSHOT-bin
> That zip file contains a ctakes installation for Windows.
> Unzip the installation wherever you like - preferably without spaces in
> directory names.
>
> You should be able to treat this new installation just like you did the
> one downloaded from the ctakes website.
>
> Before you do all of that ...  We should change a couple of things in that
> SentenceFirstCuiWriter to output blanks where procedures or cuis are not
> discovered for your snippets.
>
>
> >> public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {
> >>
> >>    public void writeFile( final JCas jCas, final String outputDir,
> >>                           final String documentId, final String fileName
> >> ) throws IOException {
> >>       File cuiFile = new File( outputDir, fileName + "_cui.txt" );
> >>       Map<Sentence, Collection<ProcedureMention>> sentenceMap
> >>             = JCasUtil.indexCovered( jCas, Sentence.class,
> >> ProcedureMention.class );
> >>       List<Collection<ProcedureMention>> sortedSentenceProcedures
> >>             = sentenceMap.entrySet()
> >>                          .stream()
> >>                          .sorted( Map.Entry.comparingByKey(
> >> DefaultAspanComparator.INSTANCE ) )
> >>                          .map( Map.Entry::getValue )
> >>                          .collect( Collectors.toList() );
> >>       try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile )
> >> ) ) {
> >>          for ( Collection<ProcedureMention> procedures :
> >> sortedSentenceProcedures ) {
> >>             ProcedureMention firstProcedure
> >>                   = procedures.stream()
> >>                               .min( Comparator.comparingInt(
> >> ProcedureMention::getBegin ) )
> >>                               .orElse( null );
> >>             if ( firstProcedure != null ) {
>
> ---------- Change the above line to
>
> if ( firstProcedure == null ) {
>    writer.write( "\n" );
> } else {
>
> >>                String cui
> >>                      = OntologyConceptUtil.getCuis( firstProcedure )
> >>                                           .stream()
> >>                                           .findFirst()
> >>                                           .orElse( "" );
> >>                if ( !cui.isEmpty() ) {
>
> --------- Change the above line to
>
> if ( cui.isEmpty() ) {
>    writer.write( "\n" );
> } else {
>
> >>                   writer.write( cui + "\n" );
> >>                }
> >>             }
> >>          }
> >>       }
> >>    }
> >> }
>
>
> So, after
> 1.  Editing the SentenceFirstCuiWriter
> 2.  Running the maven package step
> 3.  Unzipping your ctakes installation
>
> You should be able to
> 1.  Run ctakes from command line like you did before
> 2.  Use the custom piper file
> 3.  Resolve the first-discovered procedure for a snippet on each line
> 4.  Write file(s) with corresponding line-by-line cuis or empty lines
> where none are resolved
>
> Let me know if I missed anything.
>
> Sean
>
> ________________________________________
> From: Ryan Young <ro...@buffalo.edu>
> Sent: Monday, March 30, 2020 9:44 PM
> To: dev@ctakes.apache.org
> Subject: Re: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code
> (CUI) [EXTERNAL]
>
> Hello Sean,
>
> I have run into some difficulty actually running the script you wrote
> (SentenceFirstCuiWriter.java). I spent the last week doing the following:
> 1.) Installed cTAKES developer version using Eclipse IDE
> 2.) Added the appropriate import statements at the beginning of
> SentenceFirstCuiWriter.java
> 3.) Placed SentenceFirstCuiWriter.java in this directory:
>
> C:\Users\Ryan\eclipse-workspace\ctakes\ctakes-core\src\main\java\org\apache\ctakes\core\cc
> 4.) Successfully built and compiled cTAKES developer version
> 5.) Successfully run the test configurations which were already in cTAKES
> in Eclipse (Run --> Run As --> Maven test)
>
> My main question is how do I run the cTAKES developer version from command
> line without running Eclipse or Maven?
>
> I found a post you made last year (
> http://mail-archives.apache.org/mod_mbox/ctakes-dev/201907.mbox/%3C1563805239741.31947%40childrens.harvard.edu%3E
> ).
> You stated, *"You can put PipelineBuilder in any main(..) method and then
> start that main(..) from a command line just as you would any other java
> program.  Just like any other java program, you need to have your
> $CLASSPATH set correctly and, for memory use, increase your maximum memory
> with -Xmx .  These are VM options."*
>
> I think this is what I have to do. But, I am unsure of how to accomplish
> this exactly. What I have tried already is:
> 1.) Launch Command Prompt
> 2.) Change directory to where PipelineBuilder.java is located
> cd
>
> C:\Users\Ryan\eclipse-workspace\ctakes\ctakes-core\src\main\java\org\apache\ctakes\core\pipeline\PipelineBuilder.java
> 3.) Enter the following into Command Prompt
> java org.apache.ctakes.core.pipeline.PiperFileRunner -p
> C:\Users\Ryan\SkyDrive\Desktop\Piper_File.piper -i
> C:\Users\Ryan\SkyDrive\Desktop\Input_Folder --writeXmis
> C:\Users\Ryan\SkyDrive\Desktop\Output_Folder
>
> I receive the following error in Command Prompt:
> Error: Could not find or load main class
> org.apache.ctakes.core.pipeline.PiperFileRunner
>
> I am probably missing something. Just not sure what exactly. I'm not too
> familiar with Java. The documentation I have been reading hasn't been as
> helpful since cTAKES is a much more complex project than the simple
> examples they provide.
>
> Lastly, I am using Windows 10.
>
> Thank You,
>
> Ryan Young
> MD/MBA Candidate
> Jacobs School of Medicine & Biomedical Sciences
>
> On Mon, Mar 23, 2020 at 3:28 PM Ryan Young <ro...@buffalo.edu> wrote:
>
> > Hello Sean,
> >
> > Wow! This was a lot more than I was anticipating! Thank you very much!
> >
> > To answer your questions...
> > • I am using Windows 10
> > • I have the Python script call a shell command to run a batch file. The
> > batch file just contains the following line:
> > "C:\cTAKES_4.0.0\bin\runPiperFile.bat"  -p "C:\path\to\piper.piper"
> > • The Python script waits for the shell command to complete (i.e., when
> > cTAKES is finished processing)
> > • The Python script will then parse the output text files and then
> > continue on with the code
> >
> > Prior to calling cTAKES, the surgery list is in a Pandas dataframe. The
> > workaround I had created was to save each line of the surgery list column
> > in the dataframe to a different text file to make it easier for when I had
> > to parse the output cTAKES text file. As I had mentioned previously, I
> > would like to have just 1 input text file and 1 output text file (as long
> > as the output file can be easily parsed by Python).
> >
> > Regarding my coding background, I don't have much background in Java.
> > However, a few years ago, I had no knowledge of Python either, but I was
> > able to teach myself while in medical school.
> >
> > A few more questions for you...
> > 1.) Should I save the code you posted in the following location as a .jar
> > file?
> > C:\cTAKES_4.0.0\lib\SentenceFirstCuiWriter.jar
> >
> > 2.) Should I replace "add CuiLookupLister" with "add
> > SentenceFirstCuiWriter" in the piper file or do I need both?
> >
> > 3.) If the SentenceFirstCuiWriter is unable to find a valid CUI, will it
> > leave a blank, N/A, or NaN value? Having any of these values would
> > definitely help when I have Python parse the output text file. When I have
> > Python read the output text file, I would have it delete any dataframe rows
> > with NaN or N/A in the CUI column.
> >
> > Thank you very much for your assistance!
> >
> > Ryan Young
> > MD/MBA Candidate
> > Jacobs School of Medicine & Biomedical Sciences
> >
> > On Mon, Mar 23, 2020 at 1:01 PM Finan, Sean <
> > Sean.Finan@childrens.harvard.edu> wrote:
> >
> >> Hi Ryan,
> >>
> >> Here is some code for a writer that will do what you want.
> >> To use it, get rid of those first two lines in the piper that I sent
> >> (set, reader).
> >> The default reader will work just fine, and it will allow you to process
> >> multiple surgery lists in one run.
> >>
> >> Then just add SentenceFirstCuiWriter to the end of your piper.
> >>
> >> Sean
> >>
> >>
> >> public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {
> >>
> >>    public void writeFile( final JCas jCas, final String outputDir,
> >>                           final String documentId, final String fileName
> >> ) throws IOException {
> >>       File cuiFile = new File( outputDir, fileName + "_cui.txt" );
> >>       Map<Sentence, Collection<ProcedureMention>> sentenceMap
> >>             = JCasUtil.indexCovered( jCas, Sentence.class,
> >> ProcedureMention.class );
> >>       List<Collection<ProcedureMention>> sortedSentenceProcedures
> >>             = sentenceMap.entrySet()
> >>                          .stream()
> >>                          .sorted( Map.Entry.comparingByKey(
> >> DefaultAspanComparator.INSTANCE ) )
> >>                          .map( Map.Entry::getValue )
> >>                          .collect( Collectors.toList() );
> >>       try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile )
> >> ) ) {
> >>          for ( Collection<ProcedureMention> procedures :
> >> sortedSentenceProcedures ) {
> >>             ProcedureMention firstProcedure
> >>                   = procedures.stream()
> >>                               .min( Comparator.comparingInt(
> >> ProcedureMention::getBegin ) )
> >>                               .orElse( null );
> >>             if ( firstProcedure != null ) {
> >>                String cui
> >>                      = OntologyConceptUtil.getCuis( firstProcedure )
> >>                                           .stream()
> >>                                           .findFirst()
> >>                                           .orElse( "" );
> >>                if ( !cui.isEmpty() ) {
> >>                   writer.write( cui + "\n" );
> >>                }
> >>             }
> >>          }
> >>       }
> >>    }
> >> }
> >>
> >> ________________________________________
> >> From: Ryan Young <ro...@buffalo.edu>
> >> Sent: Monday, March 23, 2020 11:02 AM
> >> To: dev@ctakes.apache.org
> >> Subject: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code
> >> (CUI) [EXTERNAL]
> >>
> >> Hello,
> >>
> >> I am a medical student who happened to come across cTAKES for a project I
> >> am working on. What I would like to do is take a list of surgeries in a
> >> text file and have cTAKES output what it determines to be the best UMLS
> >> code (CUI) for that particular line.
> >>
> >> Each line of the text file is independent of the others (i.e., each line
> >> should be read and analyzed separately). For example, here's my list of
> >> the
> >> surgeries (Surgery_List.txt):
> >> Colonoscopy with Polypectomy
> >> Esophagogastroduodenoscopy Colonoscopy
> >> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
> >> aspiration
> >>
> >> When I run the piper file (see below), I get the following output:
> >> Colonoscopy with Polypectomy
> >> "Colonoscopy"
> >>   Procedure
> >>   C0009378 colonoscopy
> >> "Polypectomy"
> >>   Procedure
> >>   C0521210 Resection of polyp
> >>
> >> Esophagogastroduodenoscopy Colonoscopy
> >> "Esophagogastroduodenoscopy"
> >>   Procedure
> >>   C0079304 Esophagogastroduodenoscopy
> >> "Colonoscopy"
> >>   Procedure
> >>   C0009378 colonoscopy
> >>
> >> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle
> >> aspiration
> >> "Esophagogastroduodenoscopy"
> >>   Procedure
> >>   C0079304 Esophagogastroduodenoscopy
> >> "Endoscopic ultrasound"
> >>   Procedure
> >>   C0376443 Endoscopic Ultrasound
> >> "Endoscopic"
> >>   Procedure
> >>   C0014245 Endoscopy (procedure)
> >> "ultrasound"
> >>   Procedure
> >>   C0041618 Ultrasonography
> >> "Fine needle aspiration"
> >>   Procedure
> >>   C1510483 Fine needle aspiration biopsy
> >> "aspiration"
> >>   Procedure
> >>   C0349707 Aspiration-action
> >>
> >> Here's the piper file I have been using:
> >> reader org.apache.ctakes.core.cr.FileTreeReader
> >> InputDirectory="C:\path\to\input\folder"
> >> load DefaultTokenizerPipeline.piper
> >> SentenceModelFile=C:\cTAKES_4.0.0\desc\ctakes-core\desc\analysis_engine\SentenceDetectorAnnotatorBIO.xml
> >> add ContextDependentTokenizerAnnotator
> >> add org.apache.ctakes.necontexts.ContextAnnotator
> >> addDescription POSTagger
> >> load ChunkerSubPipe.piper
> >> set ctakes.umlsuser=my_username ctakes.umlspw=my_password
> >> add org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator
> >> DictionaryDescriptor=C:\cTAKES_4.0.0\desc\ctakes-dictionary-lookup-fast\desc\analysis_engine\UmlsLookupAnnotator.xml
> >> LookupXml=C:\cTAKES_4.0.0\resources\org\apache\ctakes\dictionary\lookup\fast\sno_rx_16ab.xml
> >> add property.plaintext.PropertyTextWriterFit
> >> OutputDirectory="C:\path\to\output\folder"
> >>
> >> The workaround I have developed is as follows...
> >> 1.) Save each line of Surgery_List.txt to separate text files
> >> 2.) Use a Python script to parse each individual text file to extract the
> >> first UMLS code (CUI) given in the text file
> >>
> >> The above method works fine when there's only 10 lines, but not so well
> >> when there's 40,000 lines in Surgery_List.txt.
> >>
> >> Ideally, I would like for Fast Dictionary Lookup to just return the top
> >> result for each line of Surgery_List.txt. For example, Output.txt would
> >> look just like this:
> >> C0009378
> >> C0079304
> >> C0079304
> >>
> >> Just for reference here's how UMLS codes correspond between
> >> Surgery_List.txt and Output.txt:
> >> C0009378 --> Colonoscopy with Polypectomy
> >> C0079304 --> Esophagogastroduodenoscopy Colonoscopy
> >> C0079304 --> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine
> >> needle aspiration
> >>
> >> Is there something I can add to the piper file to make this happen?
> >>
> >> Currently, I have the cTAKES user version installed, but I could install
> >> the developer version if need be. I would just need a little guidance on
> >> which Java file I would need to modify to get the desired results.
> >>
> >> Thank You,
> >>
> >> Ryan Young
> >> MD/MBA Candidate
> >> Jacobs School of Medicine & Biomedical Sciences
> >>
> >
>

Re: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Ryan,

You made some excellent progress.  cTAKES is a little complicated for new users, especially anybody who isn't familiar with Java.

Since you are going to be running from a command line (via Python) and have already done so successfully, we can just try to get you set up to repeat that process.

In Eclipse, you should be able to run the maven "package" configuration.

That will compile and build an installation similar to what you were using before.

After you execute maven package, 
open the directory ctakes-distribution/target/
There should be a .zip file named apache-ctakes-4.0.1-SNAPSHOT-bin
That zip file contains a ctakes installation for Windows.
Unzip the installation wherever you like - preferably without spaces in directory names.

You should be able to treat this new installation just like you did the one downloaded from the ctakes website.

Before you do all of that ...  We should change a couple of things in that SentenceFirstCuiWriter to output blanks where procedures or cuis are not discovered for your snippets.


>> public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {
>>
>>    public void writeFile( final JCas jCas, final String outputDir,
>>                           final String documentId, final String fileName
>> ) throws IOException {
>>       File cuiFile = new File( outputDir, fileName + "_cui.txt" );
>>       Map<Sentence, Collection<ProcedureMention>> sentenceMap
>>             = JCasUtil.indexCovered( jCas, Sentence.class,
>> ProcedureMention.class );
>>       List<Collection<ProcedureMention>> sortedSentenceProcedures
>>             = sentenceMap.entrySet()
>>                          .stream()
>>                          .sorted( Map.Entry.comparingByKey(
>> DefaultAspanComparator.INSTANCE ) )
>>                          .map( Map.Entry::getValue )
>>                          .collect( Collectors.toList() );
>>       try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile )
>> ) ) {
>>          for ( Collection<ProcedureMention> procedures :
>> sortedSentenceProcedures ) {
>>             ProcedureMention firstProcedure
>>                   = procedures.stream()
>>                               .min( Comparator.comparingInt(
>> ProcedureMention::getBegin ) )
>>                               .orElse( null );
>>             if ( firstProcedure != null ) {

---------- Change the above line to

if ( firstProcedure == null ) {
   writer.write( "\n" );
} else {

>>                String cui
>>                      = OntologyConceptUtil.getCuis( firstProcedure )
>>                                           .stream()
>>                                           .findFirst()
>>                                           .orElse( "" );
>>                if ( !cui.isEmpty() ) {

--------- Change the above line to

if ( cui.isEmpty() ) {
   writer.write( "\n" );
} else {

>>                   writer.write( cui + "\n" );
>>                }
>>             }
>>          }
>>       }
>>    }
>> }


So, after
1.  Editing the SentenceFirstCuiWriter
2.  Running the maven package step
3.  Unzipping your ctakes installation

You should be able to
1.  Run ctakes from command line like you did before
2.  Use the custom piper file
3.  Resolve the first-discovered procedure for a snippet on each line
4.  Write file(s) with corresponding line-by-line cuis or empty lines where none are resolved
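
Because the output is meant to line up with the input row by row, it can help to verify the two line counts before pairing them. The following is a minimal sketch of that check in plain Java, not part of cTAKES; the class name CuiAligner and the method align are hypothetical, and the lines are passed in as lists (they could be read with java.nio.file.Files.readAllLines).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CuiAligner {

    /**
     * Pairs each input line with the CUI on the same row of the output.
     * Rows whose CUI is blank are dropped. Throws if the row counts differ,
     * since a mismatch means the rows can no longer be trusted to correspond.
     */
    public static List<String[]> align(List<String> inputLines, List<String> cuiLines) {
        if (inputLines.size() != cuiLines.size()) {
            throw new IllegalArgumentException(
                    "Line count mismatch: " + inputLines.size()
                    + " input vs " + cuiLines.size() + " output");
        }
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < inputLines.size(); i++) {
            String cui = cuiLines.get(i).trim();
            if (!cui.isEmpty()) {
                pairs.add(new String[] { inputLines.get(i), cui });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "Colonoscopy with Polypectomy",
                "Removal of Foreign Body",
                "Esophagogastroduodenoscopy Colonoscopy");
        // Illustrative output rows; the empty middle row stands for "no CUI resolved".
        List<String> cuis = Arrays.asList("C0009378", "", "C0079304");
        for (String[] pair : align(input, cuis)) {
            System.out.println(pair[1] + " --> " + pair[0]);
        }
    }
}
```

Failing fast on a count mismatch avoids silently assigning a CUI to the wrong surgery when an extra row shifts everything below it.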

Let me know if I missed anything.

Sean

________________________________________
From: Ryan Young <ro...@buffalo.edu>
Sent: Monday, March 30, 2020 9:44 PM
To: dev@ctakes.apache.org
Subject: Re: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) [EXTERNAL]


Hello Sean,

I have run into some difficulty actually running the script you wrote
(SentenceFirstCuiWriter.java). I spent the last week doing the following:
1.) Installed cTAKES developer version using Eclipse IDE
2.) Added the appropriate import statements at the beginning of
SentenceFirstCuiWriter.java
3.) Placed SentenceFirstCuiWriter.java in this directory:
C:\Users\Ryan\eclipse-workspace\ctakes\ctakes-core\src\main\java\org\apache\ctakes\core\cc
4.) Successfully built and compiled cTAKES developer version
5.) Successfully run the test configurations which were already in cTAKES
in Eclipse (Run --> Run As --> Maven test)

My main question is how do I run the cTAKES developer version from command
line without running Eclipse or Maven?

I found a post you made last year (
http://mail-archives.apache.org/mod_mbox/ctakes-dev/201907.mbox/%3C1563805239741.31947%40childrens.harvard.edu%3E ).
You stated, *"You can put PipelineBuilder in any main(..) method and then
start that main(..) from a command line just as you would any other java
program.  Just like any other java program, you need to have your
$CLASSPATH set correctly and, for memory use, increase your maximum memory
with -Xmx .  These are VM options."*

I think this is what I have to do. But, I am unsure of how to accomplish
this exactly. What I have tried already is:
1.) Launch Command Prompt
2.) Change directory to where PipelineBuilder.java is located
cd
C:\Users\Ryan\eclipse-workspace\ctakes\ctakes-core\src\main\java\org\apache\ctakes\core\pipeline\PipelineBuilder.java
3.) Enter the following into Command Prompt
java org.apache.ctakes.core.pipeline.PiperFileRunner -p
C:\Users\Ryan\SkyDrive\Desktop\Piper_File.piper -i
C:\Users\Ryan\SkyDrive\Desktop\Input_Folder --writeXmis
C:\Users\Ryan\SkyDrive\Desktop\Output_Folder

I receive the following error in Command Prompt:
Error: Could not find or load main class
org.apache.ctakes.core.pipeline.PiperFileRunner

I am probably missing something. Just not sure what exactly. I'm not too
familiar with Java. The documentation I have been reading hasn't been as
helpful since cTAKES is a much more complex project than the simple
examples they provide.
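
For context, "Could not find or load main class" generally means the JVM could not locate the compiled class on its classpath; changing into a source directory does not put compiled classes or jars there. The small diagnostic below is a sketch, not part of cTAKES (the class name ClasspathCheck is hypothetical). It prints the classpath the JVM actually received and reports whether a given class can be loaded from it.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be found by this class's class loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "org.apache.ctakes.core.pipeline.PiperFileRunner";
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        System.out.println(name + (isOnClasspath(name) ? " is" : " is NOT")
                + " on the classpath");
    }
}
```

Running it with the same -cp value intended for the real command shows whether that classpath setting is the problem before involving any piper file.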

Lastly, I am using Windows 10.

Thank You,

Ryan Young
MD/MBA Candidate
Jacobs School of Medicine & Biomedical Sciences


Re: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) [EXTERNAL]

Posted by Ryan Young <ro...@buffalo.edu>.
Hello Sean,

I have run into some difficulty actually running the script you wrote
(SentenceFirstCuiWriter.java). I spent the last week doing the following:
1.) Installed cTAKES developer version using Eclipse IDE
2.) Added the appropriate import statements at the beginning of
SentenceFirstCuiWriter.java
3.) Placed SentenceFirstCuiWriter.java in this directory:
C:\Users\Ryan\eclipse-workspace\ctakes\ctakes-core\src\main\java\org\apache\ctakes\core\cc
4.) Successfully built and compiled cTAKES developer version
5.) Successfully ran the test configurations which were already in cTAKES
in Eclipse (Run --> Run As --> Maven test)

My main question is how do I run the cTAKES developer version from command
line without running Eclipse or Maven?

I found a post you made last year (
http://mail-archives.apache.org/mod_mbox/ctakes-dev/201907.mbox/%3C1563805239741.31947%40childrens.harvard.edu%3E).
You stated, *"You can put PipelineBuilder in any main(..) method and then
start that main(..) from a command line just as you would any other java
program.  Just like any other java program, you need to have your
$CLASSPATH set correctly and, for memory use, increase your maximum memory
with -Xmx .  These are VM options."*

I think this is what I have to do, but I am unsure exactly how to accomplish
it. Here is what I have tried so far:
1.) Launch Command Prompt
2.) Change directory to where PipelineBuilder.java is located
cd C:\Users\Ryan\eclipse-workspace\ctakes\ctakes-core\src\main\java\org\apache\ctakes\core\pipeline\PipelineBuilder.java
3.) Enter the following into Command Prompt
java org.apache.ctakes.core.pipeline.PiperFileRunner -p C:\Users\Ryan\SkyDrive\Desktop\Piper_File.piper -i C:\Users\Ryan\SkyDrive\Desktop\Input_Folder --writeXmis C:\Users\Ryan\SkyDrive\Desktop\Output_Folder

I receive the following error in Command Prompt:
Error: Could not find or load main class org.apache.ctakes.core.pipeline.PiperFileRunner
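
Note on the error above: "Could not find or load main class" generally means
the JVM did not find the named class on its classpath. The java command loads
compiled .class files and jars, not .java sources, so changing into the source
folder does not make PiperFileRunner visible. Below is a minimal sketch of a
check, assuming it is started with the same -cp value as the failing command.
ClasspathCheck is a hypothetical helper and is not part of cTAKES.

```java
import java.io.File;

/** Hypothetical helper (not part of cTAKES): reports whether a class is visible on the classpath. */
final class ClasspathCheck {

   /** Returns true if the named class can be located by this class's class loader. */
   static boolean isLoadable( final String className ) {
      try {
         // initialize=false: only locate the class, do not run its static initializers.
         Class.forName( className, false, ClasspathCheck.class.getClassLoader() );
         return true;
      } catch ( ClassNotFoundException | LinkageError e ) {
         return false;
      }
   }

   public static void main( final String... args ) {
      // Default to the class named in the error message; another name can be passed as an argument.
      final String className = args.length > 0
                               ? args[ 0 ]
                               : "org.apache.ctakes.core.pipeline.PiperFileRunner";
      System.out.println( "Classpath entries seen by the JVM:" );
      for ( String entry : System.getProperty( "java.class.path" ).split( File.pathSeparator ) ) {
         System.out.println( "  " + entry );
      }
      System.out.println( className + ( isLoadable( className ) ? " was found" : " was NOT found" ) );
   }
}
```

If the class is reported as not found, the -cp value is missing the folder or
jar that holds the compiled class. Which folders and jars belong there depends
on how the project was built, so none are listed here.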

I am probably missing something, but I am not sure what exactly. I'm not too
familiar with Java, and the documentation I have been reading hasn't been
very helpful, since cTAKES is a much more complex project than the simple
examples they provide.

Lastly, I am using Windows 10.

Thank You,

Ryan Young
MD/MBA Candidate
Jacobs School of Medicine & Biomedical Sciences


Re: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) [EXTERNAL]

Posted by Ryan Young <ro...@buffalo.edu>.
Hello Sean,

Wow! This was a lot more than I was anticipating! Thank you very much!

To answer your questions...
• I am using Windows 10
• I have the Python script call a shell command to run a batch file. The
batch file just contains the following line:
"C:\cTAKES_4.0.0\bin\runPiperFile.bat"  -p "C:\path\to\piper.piper"
• The Python script waits for the shell command to complete (i.e., when
cTAKES is finished processing)
• The Python script will then parse the output text files and then continue
on with the code

Prior to calling cTAKES, the surgery list is in a Pandas dataframe. The
workaround I had created was to save each line of the surgery list column
in the dataframe to a different text file to make it easier for when I had
to parse the output cTAKES text file. As I had mentioned previously, I
would like to have just 1 input text file and 1 output text file (as long
as the output file can be easily parsed by Python).

Regarding my coding background, I don't have much background in Java.
However, a few years ago, I had no knowledge of Python either, but I was
able to teach myself while in medical school.

A few more questions for you...
1.) Should I save the code you posted in the following location as a .jar
file?
C:\cTAKES_4.0.0\lib\SentenceFirstCuiWriter.jar

2.) Should I replace "add CuiLookupLister" with "add
SentenceFirstCuiWriter" in the piper file or do I need both?

3.) If the SentenceFirstCuiWriter is unable to find a valid CUI, will it
leave a blank, N/A, or NaN value? Having any of these values would
definitely help when I have Python parse the output text file. When I have
Python read the output text file, I would have it delete any dataframe rows
with NaN or N/A in the CUI column.

Thank you very much for your assistance!

Ryan Young
MD/MBA Candidate
Jacobs School of Medicine & Biomedical Sciences


Re: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Ryan,

Here is some code for a writer that will do what you want.
To use it, get rid of those first two lines in the piper that I sent (set, reader).
The default reader will work just fine, and it will allow you to process multiple surgery lists in one run.

Then just add SentenceFirstCuiWriter to the end of your piper.

Sean


public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {

   // Writes one CUI per line to <fileName>_cui.txt: for each sentence, the first CUI
   // of the procedure mention that begins earliest in that sentence.
   public void writeFile( final JCas jCas, final String outputDir,
                          final String documentId, final String fileName ) throws IOException {
      File cuiFile = new File( outputDir, fileName + "_cui.txt" );
      // Map each sentence to the procedure mentions that it covers.
      Map<Sentence, Collection<ProcedureMention>> sentenceMap
            = JCasUtil.indexCovered( jCas, Sentence.class, ProcedureMention.class );
      // Sort the entries by sentence span, then keep only the mention collections.
      List<Collection<ProcedureMention>> sortedSentenceProcedures
            = sentenceMap.entrySet()
                         .stream()
                         .sorted( Map.Entry.comparingByKey( DefaultAspanComparator.INSTANCE ) )
                         .map( Map.Entry::getValue )
                         .collect( Collectors.toList() );
      try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile ) ) ) {
         for ( Collection<ProcedureMention> procedures : sortedSentenceProcedures ) {
            // Pick the procedure mention with the smallest begin offset.
            ProcedureMention firstProcedure
                  = procedures.stream()
                              .min( Comparator.comparingInt( ProcedureMention::getBegin ) )
                              .orElse( null );
            if ( firstProcedure != null ) {
               // Take the first CUI listed for that mention, or "" if it has none.
               String cui
                     = OntologyConceptUtil.getCuis( firstProcedure )
                                          .stream()
                                          .findFirst()
                                          .orElse( "" );
               // A sentence without a procedure CUI produces no output line.
               if ( !cui.isEmpty() ) {
                  writer.write( cui + "\n" );
               }
            }
         }
      }
   }
}
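
As posted, the writer above writes nothing for a sentence that has no procedure
or no CUI, so the output can have fewer lines than the input. If the output
needs to line up with the input line by line, one option is to write a
placeholder instead. The following is a minimal, self-contained sketch of that
selection logic only, assuming each input line ends up as one sentence. The
Mention class, the FirstCuiPerLine class and the "N/A" marker are hypothetical
stand-ins for illustration and are not cTAKES types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class FirstCuiPerLine {

   /** Hypothetical placeholder written when a sentence yields no CUI. */
   static final String NO_CUI = "N/A";

   /** Hypothetical stand-in for a procedure mention: its begin offset and its CUIs. */
   static final class Mention {
      final int begin;
      final List<String> cuis;

      Mention( final int begin, final String... cuis ) {
         this.begin = begin;
         this.cuis = Arrays.asList( cuis );
      }
   }

   /** Returns one entry per sentence: the first CUI of the earliest mention, or NO_CUI. */
   static List<String> firstCuis( final List<List<Mention>> sentences ) {
      final List<String> lines = new ArrayList<>();
      for ( List<Mention> mentions : sentences ) {
         final String cui = mentions.stream()
                                    .min( Comparator.comparingInt( (Mention m) -> m.begin ) )
                                    .flatMap( m -> m.cuis.stream().findFirst() )
                                    .orElse( NO_CUI );
         lines.add( cui );
      }
      return lines;
   }

   public static void main( final String... args ) {
      final List<List<Mention>> sentences = Arrays.asList(
            // "Colonoscopy with Polypectomy": Polypectomy at offset 17, Colonoscopy at offset 0.
            Arrays.asList( new Mention( 17, "C0521210" ), new Mention( 0, "C0009378" ) ),
            // A line in which no procedure was found.
            Collections.emptyList(),
            Arrays.asList( new Mention( 0, "C0079304" ) ) );
      firstCuis( sentences ).forEach( System.out::println );
   }
}
```

Running main prints C0009378, N/A and C0079304 on separate lines, so a script
reading the file can drop the N/A rows while keeping row alignment with the
input.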
