Posted to commits@ctakes.apache.org by bl...@apache.org on 2012/11/13 23:17:03 UTC

svn commit: r1408988 - /incubator/ctakes/site/trunk/content/ctakes/3.0.0/user-guide-3.0.mdtext

Author: bleeker
Date: Tue Nov 13 22:17:02 2012
New Revision: 1408988

URL: http://svn.apache.org/viewvc?rev=1408988&view=rev
Log:
CMS commit to ctakes by bleeker

Modified:
    incubator/ctakes/site/trunk/content/ctakes/3.0.0/user-guide-3.0.mdtext

Modified: incubator/ctakes/site/trunk/content/ctakes/3.0.0/user-guide-3.0.mdtext
URL: http://svn.apache.org/viewvc/incubator/ctakes/site/trunk/content/ctakes/3.0.0/user-guide-3.0.mdtext?rev=1408988&r1=1408987&r2=1408988&view=diff
==============================================================================
--- incubator/ctakes/site/trunk/content/ctakes/3.0.0/user-guide-3.0.mdtext (original)
+++ incubator/ctakes/site/trunk/content/ctakes/3.0.0/user-guide-3.0.mdtext Tue Nov 13 22:17:02 2012
@@ -17,128 +17,509 @@ Notice:    Licensed to the Apache Softwa
            under the License.
 
 #cTAKES 3.0 User Guide
-These instructions are for end users. With these instructions you can
-install cTAKES, configure it, and use it to process text (typically text
-associated with a medical record). If you were planning to expand,
-change, or modify the code within cTAKES, refer to the [cTAKES 2.5
-Developer Install Instructions][].
-
-These instructions will cover installation and a test of the main
-product including trained models for sentence detection and tagging
-parts of speech, dictionaries from a subset of the UMLS, a very small
-subset of the full LVG resource, etc. Optional components will also be
-described.
-
-Once you have finished installation of cTAKES, you will be able to see
-what cTAKES is capable of. Further exploitation of the software’s
-ability may require following a few additional steps involving what
-dictionaries are being used. These are the last steps in these
-instructions.
-
-Prerequisites
--------------
-
-<div class="table-wrap">
-<table class="confluenceTable">
-<tbody>
-<tr>
-<th class="confluenceTh">
+
+These instructions are for end users. With these instructions you can install
+cTAKES, configure it, and use it to process text (typically text associated
+with a medical record). If you plan to extend, change, or modify the code
+within cTAKES, refer to the [cTAKES 2.5 Developer Install
+Instructions](/display/VKC/cTAKES+2.5+Developer+Install+Instructions).
+
+These instructions will cover installation and a test of the main product
+including trained models for sentence detection and tagging parts of speech,
+dictionaries from a subset of the UMLS, a very small subset of the full LVG
+resource, etc. Optional components will also be described.
+
+Once you have finished installing cTAKES, you will be able to see what it is
+capable of. Making fuller use of the software may require a few additional
+steps to choose which dictionaries are used. Those are the last steps in
+these instructions.
+
+## Prerequisites
+
+Step
+
+Example
+
+1. Make sure you have Java 1.6 or higher. Most systems come with Java already
+installed.
+
+Run `java -version` to check your version.
+
+If you do not have Java, you can install it from
+[java.com](http://www.java.com/en/download/faq/develop.xml).
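
As a rough illustration of the version check above, the sketch below parses the banner printed by `java -version` and compares it with the 1.6 minimum. The function names are hypothetical and not part of cTAKES, and the parsing assumes the banner contains a quoted version string such as `"1.6.0_20"`.

```python
import re
import subprocess


def parse_java_version(banner):
    """Return (major, minor) parsed from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        return None
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return major, minor


def meets_minimum(banner, minimum=(1, 6)):
    """True when the reported version is at least `minimum` (1.6 by default)."""
    version = parse_java_version(banner)
    return version is not None and version >= minimum


def installed_java_banner():
    """Run `java -version`; the JVM normally prints its banner on stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return result.stderr or result.stdout
```

`installed_java_banner()` raises `FileNotFoundError` when no `java` executable is on the path, which is itself a sign that Java still needs to be installed.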
+
+## Install cTAKES
+
 Step
 
-</th>
-<th class="confluenceTh">
 Example
 
-</th>
-</tr>
-<tr>
-<td class="confluenceTd">
-​1. Make sure you have Java 1.6 or higher. Most systems come with Java
-already installed. \
- Run this command to check your version. \
-
-<div class="code panel" style="border-width: 1px;">
-<div class="codeContent panelContent">
-~~~~ {.theme: .Confluence; .brush: .plain; .gutter: .false
-style="font-size:12px;"}
-java -version
-~~~~
-
-</div>
-</div>
-If you do not you can install Java from [java.com][].
-
-</td>
-<td class="confluenceTd">
-<div class="code panel" style="border-width: 1px;">
-<div class="codeContent panelContent">
-~~~~ {.theme: .Confluence; .brush: .plain; .gutter: .false
-style="font-size:12px;"}
-C:\>java -version
-java version "1.6.0_20"
-Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
-Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing)
-~~~~
-
-</div>
-</div>
-</td>
-</tr>
-</tbody>
-</table>
-</div>
-Install cTAKES
---------------
-
-<div class="table-wrap">
-<table class="confluenceTable">
-<tbody>
-<tr>
-<th class="confluenceTh">
+1. Navigate to the [source downloads for a released
+version](http://sourceforge.net/projects/ohnlp/files/cTAKES) on SourceForge
+
+  
+
+2. Download the **cTAKES-2.5.zip** file.
+
+Save the file to a temporary location on your machine.
+
+![screenshot illustrating step](/download/attachments/75014322/cTAKES-2.5-download-progress.jpg?version=1&modificationDate=1336161194000)
+
+3. Unzip (extract the contents of) the compressed file you downloaded into a
+directory that you want to be the cTAKES install location.
+
+For example, **Windows**:
+
+  
+**Linux**:   
+
+  
+This folder we will call **<cTAKES_HOME>**. You will need to refer to the
+directory later.
+
+![screenshot illustrating step](/download/attachments/75014322/cTAKES-2.5-Extracting.jpg?version=1&modificationDate=1336163791000)
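
The extraction step can also be scripted. The following is a minimal sketch using Python's standard `zipfile` module; `extract_ctakes` is a hypothetical helper, and whether the archive unpacks into its own top-level folder is not stated here, so check the result before treating the directory as <cTAKES_HOME>.

```python
import zipfile
from pathlib import Path


def extract_ctakes(archive_path, install_dir):
    """Extract the downloaded archive into install_dir and return that path."""
    install_dir = Path(install_dir)
    # Create the install location if it does not exist yet.
    install_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(install_dir)
    return install_dir
```

For example, `extract_ctakes("cTAKES-2.5.zip", r"c:\cTAKES-2.5")` would unpack the download into the Windows location used as an example in this guide.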
+
+
+## Process documents using cTAKES
+
+This version allows you to test most components bundled in cTAKES in two
+different ways:
+
+  1. Using the cTAKES CAS Visual Debugger (CVD) to view results stored as XCAS files or to run the annotators
+  2. Using the cTAKES collection processing engine (CPE) to process documents in the cTAKES_HOME/testdata directory
+
+### CAS Visual Debugger (CVD)
+
 Step
 
-</th>
-<th class="confluenceTh">
 Example
 
-</th>
-</tr>
-<tr>
-<td class="confluenceTd">
-​1. Navigate to the [source downloads for a released version][] on
-SourceForge
-
-</td>
-<td class="confluenceTd">
-\
-
-</td>
-</tr>
-<tr>
-<td class="confluenceTd">
-​2. Download the **cTAKES-2.5.zip** file. \
- Save the file to a temporary location on your machine.
-
-</td>
-<td class="confluenceTd">
-![screenshot illustrating step][]
-
-</td>
-</tr>
-<tr>
-<td class="confluenceTd">
-​3. Unzip (extract the contents of) the compressed file you downloaded
-into a directory that you want to be the cTAKES install location. \
- For example, **Windows**: \
-
-<div class="code panel" style="border-width: 1px;">
-<div class="codeContent panelContent">
-~~~~ {.theme: .Confluence; .brush: .plain; .gutter: .false
-style="font-size:12px;"}
-c:\cTAKES-2.5
-~~~~
-
-  [cTAKES 2.5 Developer Install Instructions]: cTAKES%2B2.5%2BDeveloper%2BInstall%2BInstructions.html
-  [java.com]: http://www.java.com/en/download/faq/develop.xml
-  [source downloads for a released version]: http://sourceforge.net/projects/ohnlp/files/cTAKES
-  [screenshot illustrating step]: attachments/75014322/76808927.jpg
\ No newline at end of file
+1. Open a command prompt and change to the cTAKES_HOME directory.
+
+**Windows**:   
+
+  
+**Linux**:   
+
+![](/images/icons/emoticons/warning.png)
+
+**Note**  
+
+cTAKES_HOME must be your current directory unless you are skilled at setting
+paths on your machine.
+
+2. Start the CAS Visual Debugger by running this command:
+
+**Windows**:   
+
+  
+**Linux**:   
+
+  
+The application may take a minute to start on slower hardware.
+
+![screenshot illustrating step](/download/attachments/75014322/worddav6ae43931208c6ae30f6e836859a2bb19.png?version=1&modificationDate=1334686366000)
+
+3. An analysis engine (AE) needs to be loaded in order to process text.
+
+Use the **Run** -> **Load AE** menu bar command. Navigate to the file
+
+Click **Open**.
+
+![screenshot illustrating step](/download/attachments/75014322/worddav74c75ed28c7c02f2be44fe41d0b65a16.png?version=1&modificationDate=1334686366000)
+
+4. Copy the example text below and paste it into the Text section of CVD,
+replacing the text that is already there.
+
+This example file can also be found in test data:
+
+Dr. Nutritious
+
+  
+Medical Nutrition Therapy for Hyperlipidemia
+
+  
+Referral from: Julie Tester, RD, LD, CNSD
+
+Phone contact: (555) 555-1212
+
+Height: 144 cm Current Weight: 45 kg Date of current weight: 02-29-2001
+
+Admit Weight: 53 kg BMI: 18 kg/m2
+
+Diet: General
+
+Daily Calorie needs (kcals): 1500 calories, assessed as HB + 20% for activity.
+
+Daily Protein needs: 40 grams, assessed as 1.0 g/kg.
+
+Pt has been on a 3-day calorie count and has had an average intake of 1100
+calories.
+
+She was instructed to drink 2-3 cans of liquid supplement to help promote
+weight gain.
+
+She agrees with the plan and has my number for further assessment. May want a
+Resting Metabolic Rate as well. She takes an aspirin a day for knee pain.
+
+5. From the menu bar, click **Run** -> **Run AggregatePlaintextProcessor**.
+
+  
+You'll get a list of all the annotations in the Analysis Results frame.
+
+![screenshot illustrating step](/download/attachments/75014322/worddav2c25a00df26e72f2c00fae8d12e5c3a5.png?version=1&modificationDate=1334686366000)
+
+6. Named entities are now recognized in this clinical document. Annotations of
+MedicationEventMention and EntityMention are created. To find one, in the
+**Analysis Results** frame, click on the key in front of:
+
+AnnotationIndex
+
+uima.tcas.Annotation
+
+edu.mayo.bmi.uima.core.type.textsem.IdentifiedAnnotation
+
+edu.mayo.bmi.uima.core.type.textsem.EntityMention
+
+and
+
+edu.mayo.bmi.uima.core.type.textsem.EventMention
+
+edu.mayo.bmi.uima.core.type.textsem.EventMention.MedicationEventMention
+
+
+  
+Then select **edu.mayo.bmi.uima.core.type.textsem.EntityMention** or
+**edu.mayo.bmi.uima.core.type.textsem.EventMention.MedicationEventMention**.
+This will show an Annotation Index in the lower frame. Select any annotation
+in that lower frame and you will see the text discovered in the Text frame on
+the right. You may close CVD if you wish.
+
+![screenshot illustrating step](/download/attachments/75014322/worddavcfd767697d1e58970d516f1312a8a6e4.png?version=1&modificationDate=1334686366000)
+
+### Collection processing engine (CPE)
+
+Step
+
+Example
+
+1. Open a command prompt and change to the cTAKES_HOME directory:
+
+**Windows**:   
+
+  
+**Linux**:   
+
+![](/images/icons/emoticons/warning.png)
+
+**Note**  
+
+cTAKES_HOME must be your current directory unless you are skilled at setting
+paths on your machine.
+
+2. Start the collection processing engine by running this command:
+
+**Windows**:   
+
+  
+**Linux**:   
+
+  
+The application may take a minute to start on slower hardware.
+
+![screenshot illustrating step](/download/attachments/75014322/worddav6ae43931208c6ae30f6e836859a2bb19.png?version=1&modificationDate=1334686366000)
+
+3. This will bring up the Collection Processing Engine Configurator. In the
+Menu bar click **File** > **Open CPE Descriptor**
+
+![screenshot illustrating step](/download/attachments/75014322/worddavdc5a7f71827d9e3ef9c3f0c241804365.png?version=1&modificationDate=1334686366000)
+
+4. Navigate to the file
+
+Click **Open**.
+
+![screenshot illustrating step](/download/attachments/75014322/worddav41b2878fe8e61ee02d2c07e89c106cdd.png?version=1&modificationDate=1334686365000)
+
+5. Click the Play button (green/blue **play arrow** near the bottom).
+
+![screenshot illustrating step](/download/attachments/75014322/worddav9acbe68c6f920e134a8eb49db96025e7.png?version=1&modificationDate=1334686365000)
+
+6. You should see that one document was processed. You have now processed a
+collection of documents; in this case the collection contained only one
+document, to show how it is done. Close the results window.
+
+![screenshot illustrating step](/download/attachments/75014322/worddav42012486af09ef9457b4022474979a83.png?version=1&modificationDate=1334686366000)
+
+7. Close the CPE application. You may be prompted to save changes. Since this
+was just a test you may click the **No** button.
+
+![screenshot illustrating step](/download/attachments/75014322/worddav1c58ee84960f01830ce34429b3d96c3c.png?version=1&modificationDate=1334686366000)
+
+8. Open a new command prompt and change to the <cTAKES_HOME> directory.
+
+No example.
+
+9. To check the results, use the comparison tool, which helps show that the
+results match expectations. It has the following syntax:
+
+Where: **_<First File>_** is the first file to compare; **_<Second File>_** is
+the second file to compare; **_<diff-html>_** is where the results are
+written.
+
+  
+Copy the example below, which already has our example files substituted, and
+paste it into a command prompt to run it. In this case we have shipped an
+example of what the output should be for you to compare against.
+
+**Windows**:
+
+**Linux**:
+
+10. The resulting file will open for you. Look at the comparison to see the
+annotations resulting from this pipeline.
+
+**Windows**:
+
+**Linux**:
+
+![screenshot illustrating step](/download/attachments/75014322/worddavefef143506073a5a3d6c91cbbab2c686.png?version=1&modificationDate=1334686366000)
+
+Using the same CVD and CPE programs in the manner described above, you can
+test all the other components. The analysis engines and collection processing
+engines shipped with cTAKES for some of the annotators are described in the
+following table.
+
+| Annotator | Description | Abbreviated | Example Analysis Engine (AE) | Example Collection Processing Engine (CPE) | Example test data |
+|---|---|---|---|---|---|
+| Clinical Document Pipeline | the complete cTAKES pipeline to obtain majority of cTAKES annotations | cdp | cTAKES_HOME/cTAKESdesc/cdpdesc/analysis_engine/AggregatePlaintextProcessor.xml | cTAKES_HOME/cTAKESdesc/cdpdesc/collection_processing_engine/test_plaintext.xml | cTAKES_HOME/testdata/cdptest |
+| Chunker | obtain cTAKES chunking annotations | chunker | cTAKES_HOME/cTAKESdesc/chunkerdesc/analysis_engine/ChunkerAggregate.xml | cTAKES_HOME/cTAKESdesc/chunkerdesc/collection_processing_engine/ChunkerCPE.xml | cTAKES_HOME/testdata/chunkertest |
+| Dependency Parser | obtain dependency parsing tree | dp | cTAKES_HOME/cTAKESdesc/dpdesc/analysis_engine/ClearParserTokenizedInfPosAggregate.xml | cTAKES_HOME/cTAKESdesc/dpdesc/collection_processing_engine/ClearParserCPE.xml | cTAKES_HOME/testdata/dptest |
+| Drug NER | the annotator to obtain drug annotations | drugner | cTAKES_HOME/cTAKESdesc/drugnerdesc/analysis_engine/DrugAggregatePlaintextProcesor.xml | cTAKES_HOME/cTAKESdesc/drugnerdesc/collection_processing_engine/DrugNER_PlainText_CPE.xml | cTAKES_HOME/testdata/drugnertest |
+| Dictionary Lookup | mapping cTAKES annotations to dictionaries (e.g., SNOMED_CT or RxNorm) | lookup | cTAKES_HOME/cTAKESdesc/lookupdesc/analysis_engine/TestAggregateTAE.xml | cTAKES_HOME/cTAKESdesc/lookupdesc/collection_processing_engine/LookupCPE.xml | cTAKES_HOME/testdata/lookuptest |
+| PAD Term Spotter | identifying terms related to PAD | pad | cTAKES_HOME/cTAKESdesc/paddesc/analysis_engine/Radiology_TermSpotterAnnotatorTAE.xml | cTAKES_HOME/cTAKESdesc/paddesc/collection_processing_engine/Radiology_Sample.xml | cTAKES_HOME/testdata/padtest |
+| Smoking Status | the annotator to obtain document or patient-level smoking status | smoking | cTAKES_HOME/cTAKESdesc/smokingdesc/analysis_engine/SimulatedProdSmokingTAE.xml | cTAKES_HOME/cTAKESdesc/smokingdesc/collection_processing_engine/Sample_SmokingStatus_output_flatfile.xml | cTAKES_HOME/testdata/smokingtest |
+| Side Effect | the annotator to find side effect mentions and sentences from clinical documents | sideeffect | cTAKES_HOME/cTAKESdesc/sideeffectdesc/analysis_engine/SideEffectAggregateTAE.xml | cTAKES_HOME/cTAKESdesc/sideeffectdesc/collection_processing_engine/SideEffectCPE.xml | cTAKES_HOME/testdata/sideeffecttest |
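
The paths listed above follow a regular pattern built from each annotator's abbreviation: descriptors live under `cTAKESdesc/<abbreviation>desc` and test data under `testdata/<abbreviation>test`. The sketch below captures that pattern; `annotator_dirs` is a hypothetical helper, and the descriptor file names themselves differ per annotator, so they still have to be looked up.

```python
from pathlib import PurePosixPath


def annotator_dirs(ctakes_home, abbreviation):
    """Build the descriptor and test-data directories for one annotator."""
    home = PurePosixPath(ctakes_home)
    desc = home / "cTAKESdesc" / f"{abbreviation}desc"
    return {
        "analysis_engine": desc / "analysis_engine",
        "collection_processing_engine": desc / "collection_processing_engine",
        "test_data": home / "testdata" / f"{abbreviation}test",
    }
```

For instance, `annotator_dirs("cTAKES_HOME", "cdp")["test_data"]` corresponds to the Clinical Document Pipeline test data directory.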
+
+## Next Steps
+
+The [cTAKES 2.5 Component Use
+Guide](/display/VKC/cTAKES+2.5+Component+Use+Guide) will help you to
+understand in great detail each of the cTAKES components that have been
+installed. In some cases you can learn how to improve the components. However,
+before you go on to process text in production you will need to consider
+dictionaries and models.
+
+### Dictionaries
+
+#### Bundled UMLS Dictionaries
+
+cTAKES includes the complete UMLS (SNOMED-CT and RxNorm) dictionaries.
+
+  * An rxnorm_index database (a Lucene index) containing drug names from RxNorm
+  * A UMLS database (using two hsqldb tables) containing anatomical sites, procedures, signs/symptoms, and disorders/diseases from SNOMED-CT (umls_ms_2011ab)
+
+To use them, you must have a UMLS username and password, and an Internet
+connection.
+
+![](/images/icons/emoticons/warning.png)
+
+**Note**  
+If you do not have a UMLS username and password, you may request one at [UMLS
+Terminology Services](https://uts.nlm.nih.gov/license.html)
+
+In order to use the UMLS dictionaries shipped with cTAKES you will need to do
+two things:
+
+(1) Set the UMLSUser and UMLSPW <nameValuePair> strings in these descriptor
+files to your UMLS username and password.
+
+  * Dictionary Lookup: <cTAKES_HOME>/cTAKESdesc/lookupdesc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml
+  * (optional) Drug NER: <cTAKES_HOME>/cTAKESdesc/drugnerdesc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml
+
+The following shows where in the files you would make the changes. (Do not
+change the <configurationParameters> by the same name.)
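
If you prefer to script this change, one possible approach is sketched below. It assumes the descriptors use the usual UIMA layout, in which each `<nameValuePair>` holds a `<name>` element and a `<value>` wrapping a `<string>`; that layout and the helper name `set_umls_credentials` are assumptions, not something taken from the shipped files. Only `<nameValuePair>` entries are touched, so parameter declarations with the same name are left alone.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def set_umls_credentials(descriptor_path, username, password):
    """Set the UMLSUser/UMLSPW nameValuePair strings; return how many changed."""
    new_values = {"UMLSUser": username, "UMLSPW": password}
    tree = ET.parse(descriptor_path)
    root = tree.getroot()
    # Keep the root's namespace as the default one when writing back.
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}")[0])
    updated = 0
    for pair in root.iter():
        if local_name(pair.tag) != "nameValuePair":
            continue
        name = next((c for c in pair if local_name(c.tag) == "name"), None)
        key = (name.text or "").strip() if name is not None else ""
        if key not in new_values:
            continue
        for node in pair.iter():
            if local_name(node.tag) == "string":
                node.text = new_values[key]
                updated += 1
    tree.write(descriptor_path, encoding="UTF-8", xml_declaration=True)
    return updated
```

ElementTree discards XML comments (such as a license header) when it rewrites a file, so work on a backup copy and compare the result with the original before using it.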
+
+(2) Include the DictionaryLookupAnnotatorUMLS.xml Analysis Engine within your
+aggregate Analysis Engine or switch to the ones provided by cTAKES. cTAKES has
+provided duplicates of shipped Analysis Engine descriptors, put UMLS in the
+name, and placed DictionaryLookupAnnotatorUMLS.xml within them for these
+components:
+
+  * Dictionary Lookup
+  * Clinical Documents pipeline
+  * Drug NER
+  * Side Effect
+
+So you simply need to switch to using those descriptors. For example, if you
+were using AggregateCdaProcessor.xml in the Clinical Documents pipeline, you
+would switch to AggregateCdaUMLSProcessor.xml instead, which hooks into the
+complete dictionaries.
+
+You can, of course, modify your own aggregate Analysis Engine files and place
+the DictionaryLookupAnnotatorUMLS.xml Analysis Engine within them.
+
+Since this is an in-memory database implementation, please be patient during
+the initial load as it could take approximately 20-30 seconds for the database
+to initialize.
+
+If you would like to go back to using the small sample dictionaries that do
+not require a UMLS username, use the DictionaryLookupAnnotator.xml (UMLS is
+not in the file name) Analysis Engine descriptor in your aggregate. Just
+removing your password from the DictionaryLookupAnnotatorUMLS.xml files will
+not switch you back to the small sample dictionaries.
+
+#### LVG
+
+We have successfully tested the 2008 release of the full
+[LVG](http://lexsrv2.nlm.nih.gov/LexSysGroup/Projects/lvg/current/docs/userDoc/tools/lvg.html)
+data. In order to use this release of the full LVG data you should:
+
+  1. Download either the full version or the lite version from [NIH Lexical Tools](http://lexsrv2.nlm.nih.gov/LexSysGroup/Projects/lvg/2008/web/download.html)
+  2. Extract the TGZ file that you downloaded with a tool like 7-zip (available online) to a temporary directory. On some operating systems, like Windows, this may need to be done in two steps: first uncompress the file, then unpack the resulting archive.
+  3. Replace the directory <cTAKES_HOME>/resources/lvgresources/lvg/data/HSqlDb with data/HSqlDb from your extracted download. Replacing the entire directory is appropriate.
+  4. In the future, you can upgrade to later versions of LVG by editing the <cTAKES_HOME>/resources/lvgresources/lvg/data/config/lvg.properties file, replacing "lvg2008" with the name of the new release.
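
Steps 3 and 4 above can be sketched in a few lines of Python. The helper names are hypothetical, the directory layout is the one given in the steps, and any new release name you pass in is your own choice; treat this as an outline under those assumptions and back up <cTAKES_HOME> first.

```python
import shutil
from pathlib import Path

LVG_DATA = Path("resources/lvgresources/lvg/data")


def replace_lvg_database(ctakes_home, extracted_lvg):
    """Step 3: replace the bundled HSqlDb directory with the downloaded one."""
    target = Path(ctakes_home) / LVG_DATA / "HSqlDb"
    source = Path(extracted_lvg) / "data" / "HSqlDb"
    if not source.is_dir():
        raise FileNotFoundError(source)
    if target.exists():
        shutil.rmtree(target)  # the whole directory is replaced
    shutil.copytree(source, target)
    return target


def set_lvg_release(ctakes_home, new_release, old_release="lvg2008"):
    """Step 4: swap the release name in lvg.properties; return the hit count."""
    props = Path(ctakes_home) / LVG_DATA / "config" / "lvg.properties"
    data = props.read_bytes()  # bytes keep line endings and encoding intact
    old, new = old_release.encode("ascii"), new_release.encode("ascii")
    count = data.count(old)
    props.write_bytes(data.replace(old, new))
    return count
```

A return value of 0 from `set_lvg_release` means the old release name was not found, which suggests the properties file has already been edited or uses a different name.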
+
+#### Building Your Own Dictionaries
+
+To install customized dictionaries for RxNorm, SNOMED-CT, or other
+vocabularies that are available through the UMLS, see the following posts on
+the cTAKES forums:
+
+  * [https://cabig-kc.nci.nih.gov/Vocab/forums/viewtopic.php?f=28&t=423](https://cabig-kc.nci.nih.gov/Vocab/forums/viewtopic.php?f=28&t=423)
+  * [https://cabig-kc.nci.nih.gov/Vocab/forums/viewtopic.php?f=28&t=80&start=20#p1459](https://cabig-kc.nci.nih.gov/Vocab/forums/viewtopic.php?f=28&t=80&start=20#p1459)
+
+### Models
+
+Some models included in cTAKES may not represent your data distribution well.
+If you want to build or train your own models, please read the [cTAKES 2.5
+Component Use Guide](/display/VKC/cTAKES+2.5+Component+Use+Guide),
+particularly:
+
+  * [Training a sentence detector model](https://wiki.nci.nih.gov/display/VKC/cTAKES+2.5+-+Core#cTAKES2.5-Core-ToolsTrainingasentencedetectormodel)
+  * Training a Part of Speech (POS) tagger model (Building a model Obtaining training data)
+  * Creating a Part of Speech (POS) tag dictionary (Building a tag dictionary)
+  * Training a chunker model (Building a model - Prepare GENIA training data)
+  * Training a dependency parser (Dependency Parser)