Posted to commits@ctakes.apache.org by bl...@apache.org on 2012/11/15 21:16:12 UTC

svn commit: r1409972 - /incubator/ctakes/site/trunk/content/ctakes/2.6.0/user-guide-2.6.mdtext

Author: bleeker
Date: Thu Nov 15 20:16:11 2012
New Revision: 1409972

URL: http://svn.apache.org/viewvc?rev=1409972&view=rev
Log:
CMS commit to ctakes by bleeker

Modified:
    incubator/ctakes/site/trunk/content/ctakes/2.6.0/user-guide-2.6.mdtext

Modified: incubator/ctakes/site/trunk/content/ctakes/2.6.0/user-guide-2.6.mdtext
URL: http://svn.apache.org/viewvc/incubator/ctakes/site/trunk/content/ctakes/2.6.0/user-guide-2.6.mdtext?rev=1409972&r1=1409971&r2=1409972&view=diff
==============================================================================
--- incubator/ctakes/site/trunk/content/ctakes/2.6.0/user-guide-2.6.mdtext (original)
+++ incubator/ctakes/site/trunk/content/ctakes/2.6.0/user-guide-2.6.mdtext Thu Nov 15 20:16:11 2012
@@ -16,18 +16,288 @@ Notice:    Licensed to the Apache Softwa
            specific language governing permissions and limitations
            under the License.
 
-#cTAKES 2.6 User Guide
-This does not include package name updates to reflect apache.org, and maven was not used to generate the build. The build was done like the build for 2.5.
+#This page is under construction
 
-The "install" is to just unzip the archive.
+#cTAKES 3.0 User Guide
 
-To run cTAKES, you can start the CVD (or CPE) GUI using the script files [SH|BAT] files found within the top level directory.
-Once in the CVD GUI, select an aggregate to load, such as cTAKESdesc/cpddesc/analysis_engine/AggregatePlaintextProcessor.xml, then run the aggregate you just loaded using the menu options.
+This guide is for cTAKES users: people who want to use cTAKES as it is, without modifying its code.
+With these instructions you can install cTAKES, configure it, and use it to process text.
+cTAKES is built around the analysis of text associated with medical records. If you plan to extend, change, or modify the
+code within cTAKES, refer to the [cTAKES 3.0 Developer Guide](/3.0.0/developer-guide-3.0).
 
-The archive includes source, compiled class files, and a jar.
+cTAKES provides GUIs for configuring pipelines and viewing results; however, it does not produce summaries, statistics, or graphs.
+The results are annotations recorded in [UIMA XMI files](http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.xmi_emf).
+You can browse and sift through the results, but further processing is needed to make use of the annotations.
+The process that you set up to produce these annotations is called a pipeline.
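The following is one possible starting point for that further processing. It is a minimal Python sketch, not part of cTAKES, that lists the annotations in an XMI file together with the text each one covers.

It assumes the standard UIMA XMI serialization:

* The document text is stored in the `sofaString` attribute of a `cas:Sofa` element.
* Each span annotation is a top-level element with integer `begin` and `end` character offsets.

The function name `list_annotations` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of built-in CAS types such as Sofa in UIMA XMI (assumed standard layout).
CAS_NS = "http:///uima/cas.ecore"


def list_annotations(xmi_text):
    """Return (element_tag, begin, end, covered_text) for each span annotation.

    Hypothetical helper: assumes a single Sofa holding the document text and
    span annotations that carry 'begin' and 'end' attributes.
    """
    root = ET.fromstring(xmi_text)
    sofa = root.find(f"{{{CAS_NS}}}Sofa")
    text = sofa.get("sofaString", "") if sofa is not None else ""
    results = []
    for elem in root:  # XMI stores every feature structure as a child of the root
        begin, end = elem.get("begin"), elem.get("end")
        if begin is None or end is None:
            continue  # not a span annotation (e.g. the Sofa or a view)
        b, e = int(begin), int(end)
        results.append((elem.tag, b, e, text[b:e]))
    return results
```

Element tags come back in ElementTree's `{namespace}LocalName` form. They can be filtered by namespace to keep only the cTAKES types you care about.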
 
-The source from which this release was built is in SVN at https://svn.apache.org/repos/asf/incubator/ctakes/branches/SHARPn-cTAKES/
+These instructions cover installing cTAKES and testing some of its components,
+including trained models for sentence detection and part-of-speech tagging,
+dictionaries built from a subset of the UMLS, and a very small subset of the full LVG
+resource.
 
-Also included there, within "files for pipeline root", are the ANT scripts used to merge the source directories and build the archive (similar to the way 2.5 was).
+Taking full advantage of cTAKES may require a few additional steps.
+For example, you may want to use a different dictionary that includes vocabulary from your institution.
 
-For all other documentation please refer to the [cTAKES 2.5 documentation](https://wiki.nci.nih.gov/display/VKC/cTAKES+2.5).
\ No newline at end of file
+## Install cTAKES
+
+1. Make sure you have [Java](http://www.java.com/en/download/faq/develop.xml) 1.6 or higher. Many systems come with Java already
+installed. Run this command to check your version:
+<pre>
+java -version
+</pre>
+2. Download the [**cTAKES-3.0.zip**](NotYetAvailable) file.
+Save the file to a temporary location on your machine.
+
+3. Unzip the ZIP file into the directory that you want to use as the cTAKES home directory.
+This guide refers to that directory as **&lt;cTAKES_HOME&gt;**; later steps use it. For example, **Windows**: <code>c:\cTAKES-3.0</code> **Linux**: <code>/usr/bin/cTAKES-3.0</code>
+
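If you script your installation, the version check in step 1 can be automated. The following is a minimal Python sketch, not part of cTAKES.

It assumes the usual `java -version` banner format. In that format, releases before Java 9 report themselves as `1.x`, so `1.6` means Java 6. The helper names are hypothetical.

```python
import re
import subprocess

# Matches the quoted version in `java -version` output, e.g. java version "1.6.0_37"
# or openjdk version "11.0.2". Releases before Java 9 use the legacy "1.x" scheme.
_VERSION_RE = re.compile(r'version "(\d+)(?:\.(\d+))?')


def java_major_version(version_output):
    """Return the Java major version (6, 8, 11, ...) parsed from `java -version` output."""
    match = _VERSION_RE.search(version_output)
    if match is None:
        raise ValueError("no Java version found in output")
    first, second = int(match.group(1)), match.group(2)
    if first == 1 and second is not None:
        return int(second)  # legacy scheme: "1.6.0_37" means Java 6
    return first


def installed_java_output():
    """Run `java -version`; the JVM prints its version banner on stderr."""
    proc = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return proc.stderr + proc.stdout


if __name__ == "__main__":
    try:
        major = java_major_version(installed_java_output())
        print("Java", major, "OK" if major >= 6 else "is too old (cTAKES needs 1.6 or higher)")
    except (FileNotFoundError, ValueError) as err:
        print("Could not determine the Java version:", err)
```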
+## Process documents using cTAKES
+
+cTAKES lets you use most components in two different ways:
+
+  1. Using the CAS Visual Debugger (CVD) to run the annotators on a text, or to view results stored as XMI files; or
+  2. Using the collection processing engine (CPE) to process documents, such as those in the &lt;cTAKES_HOME&gt;/testdata directory.
+
+### CAS Visual Debugger (CVD)
+
+The main purpose of the [CAS Visual Debugger (CVD)](http://uima.apache.org/downloads/releaseDocs/2.2.2-incubating/docs/html/tools/tools.html#ugr.tools.cvd)
+is to let you browse all the data that is created when you run a component over some text.
+Components are also called analysis engines, since they can be made up of multiple annotators.
+
+1. Open a command prompt and change to the &lt;cTAKES_HOME&gt; directory.  
+**Windows**: <code>cd \cTAKES-3.0</code> **Linux**: <code>cd /usr/bin/cTAKES-3.0</code>  
+&nbsp;  
+**Note:** &lt;cTAKES_HOME&gt; must be your current directory unless you are skilled at setting
+paths on your machine.
+
+2. Start the CAS Visual Debugger by running this command. The application may take a minute to start on slower hardware:  
+**Windows**: <code>runctakesCVD.bat</code> **Linux**: <code>runctakesCVD.sh</code>
+
+3. An analysis engine (AE) needs to be loaded in order to process text.  
+Use the **Run** -> **Load AE** menu bar command. Navigate to the file: <code>&lt;cTAKES_HOME&gt;/cTAKESdesc/cdpdesc/analysis_engine/AggregatePlaintextProcessor.xml</code> Click **Open**.
+
+4. Copy the example text below and paste it into the Text section of CVD, replacing the text that is already
+there. The same example is also included in the test data: <code>&lt;cTAKES_HOME&gt;/testdata/cdptest/testinput/plaintext/testpatient_plaintext_1.txt</code>
+<pre>
+Dr. Nutritious
+&nbsp;
+Medical Nutrition Therapy for Hyperlipidemia
+&nbsp;
+Referral from: Julie Tester, RD, LD, CNSD
+Phone contact: (555) 555-1212
+Height: 144 cm Current Weight: 45 kg Date of current weight: 02-29-2001
+Admit Weight: 53 kg BMI: 18 kg/m2
+Diet: General
+Daily Calorie needs (kcals): 1500 calories, assessed as HB + 20% for activity.
+Daily Protein needs: 40 grams, assessed as 1.0 g/kg.
+Pt has been on a 3-day calorie count and has had an average intake of 1100 calories.
+She was instructed to drink 2-3 cans of liquid supplement to help promote weight gain.
+She agrees with the plan and has my number for further assessment. May want a Resting
+Metabolic Rate as well. She takes an aspirin a day for knee pain.
+</pre>
+5. From the menu bar, click **Run** -> **Run AggregatePlaintextProcessor**.
+You'll get a list of all the annotations in the Analysis Results frame.
+
+6. Named entities in this clinical text have now been recognized, and
+MedicationEventMention and EntityMention annotations have been created. To find one, in the
+**Analysis Results** frame, click the keys in front of the following entries to expand them:
+<pre>
+AnnotationIndex
+uima.tcas.Annotation
+edu.mayo.bmi.uima.core.type.textsem.IdentifiedAnnotation
+edu.mayo.bmi.uima.core.type.textsem.EntityMention
+and
+edu.mayo.bmi.uima.core.type.textsem.EventMention
+edu.mayo.bmi.uima.core.type.textsem.EventMention.MedicationEventMention
+</pre>
+7. Then select **edu.mayo.bmi.uima.core.type.textsem.EntityMention** or
+**edu.mayo.bmi.uima.core.type.textsem.EventMention.MedicationEventMention**.  
+This shows the corresponding Annotation Index in the lower frame. Select any
+annotation in that lower frame to highlight the text it covers in the
+Text frame on the right.
+
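The type names shown in the Analysis Results tree correspond to XML namespaces in the XMI output. The minimal Python sketch below counts annotations per type. It is not part of cTAKES, and the function names are hypothetical.

It assumes the usual UIMA XMI convention: a type `a.b.c.T` is serialized as an element `T` in the namespace `http:///a/b/c.ecore`. Unlike the CVD tree, it counts each exact type and does not roll subtypes up into their supertypes.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XMI_PREFIX = "http:///"
ECORE_SUFFIX = ".ecore"


def uima_type_name(tag):
    """Convert '{http:///edu/mayo/bmi/uima/core/type/textsem.ecore}EntityMention'
    to 'edu.mayo.bmi.uima.core.type.textsem.EntityMention'.
    Returns None for tags that do not follow the UIMA XMI namespace convention."""
    if not tag.startswith("{"):
        return None
    namespace, _, local = tag[1:].partition("}")
    if not (namespace.startswith(XMI_PREFIX) and namespace.endswith(ECORE_SUFFIX)):
        return None
    package = namespace[len(XMI_PREFIX):-len(ECORE_SUFFIX)].replace("/", ".")
    return f"{package}.{local}"


def count_annotation_types(xmi_text):
    """Count span annotations (elements with a 'begin' attribute) per UIMA type name."""
    root = ET.fromstring(xmi_text)
    counts = Counter()
    for elem in root:
        name = uima_type_name(elem.tag)
        if name is not None and elem.get("begin") is not None:
            counts[name] += 1
    return counts
```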
+
+### Collection processing engine (CPE)
+
+The [Collection Processing Engine (CPE) Configuration GUI](http://uima.apache.org/downloads/releaseDocs/2.2.2-incubating/docs/html/tools/tools.html#ugr.tools.cpe) is used to configure components (analysis engines) into a pipeline that processes a collection of documents.
+
+1. Open a command prompt and change to the &lt;cTAKES_HOME&gt; directory.  
+**Windows**: <code>cd \cTAKES-3.0</code> **Linux**: <code>cd /usr/bin/cTAKES-3.0</code>  
+&nbsp;  
+**Note:** &lt;cTAKES_HOME&gt; must be your current directory unless you are skilled at setting
+paths on your machine.
+
+2. Start the Collection Processing Engine (CPE) by running this command. The application may take a minute to start on slower hardware:  
+**Windows**: <code>runctakesCPE.bat</code> **Linux**: <code>runctakesCPE.sh</code>
+
+3. This will bring up the Collection Processing Engine Configurator. In the
+Menu bar click **File** > **Open CPE Descriptor**.
+
+4. Navigate to the file: <code>&lt;cTAKES_HOME&gt;/cTAKESdesc/cdpdesc/collection_processing_engine/test_plaintext.xml</code> Click **Open**.
+
+5. Click the Play button (green/blue **play arrow** near the bottom).
+
+6. You should see that one document was processed. The CPE processes a whole collection of documents;
+in this example the collection contains only one document, just
+to show how it works.  
+Close the results window.
+
+7. Close the CPE application. You may be prompted to save changes. Since this
+was just a test you may click the **No** button.
+
+### Validate CPE Results
+1. Open a command prompt and change to the &lt;cTAKES_HOME&gt; directory.  
+**Windows**: <code>cd \cTAKES-3.0</code> **Linux**: <code>cd /usr/bin/cTAKES-3.0</code>  
+
+2. To check the results, use the cTAKES comparison tool, which shows whether the
+results match expectations. Enter this command:
+<pre>
+java -cp cTAKES.jar edu.mayo.bmi.utils.xcas_comparison.Compare &lt;First File&gt; &lt;Second File&gt; &lt;diff-html&gt;
+</pre>
+Where: **_&lt;First File&gt;_** is the first file to compare; **_&lt;Second File&gt;_** is
+the second file to compare; **_&lt;diff-html&gt;_** is the HTML file to which the results
+are written. For example:
+**Windows**:
+<pre>
+java -cp cTAKES.jar edu.mayo.bmi.utils.xcas_comparison.Compare ^
+"testdata\cdptest\testoutput\plaintext\sample_note_plaintext.xml" ^
+"testdata\cdptest\testsampleoutput\plaintext\sample_note_plaintext.xml" ^
+c:\stuff\diff-html.html
+</pre>
+**Linux**:
+<pre>
+java -cp cTAKES.jar edu.mayo.bmi.utils.xcas_comparison.Compare \
+"/usr/bin/cTAKES-3.0/testdata/cdptest/testoutput/plaintext/sample_note_plaintext.xml" \
+"/usr/bin/cTAKES-3.0/testdata/cdptest/testsampleoutput/plaintext/sample_note_plaintext.xml" \
+/tmp/diff-html.html
+</pre>
+Copy and paste the example for your platform, in which our example files
+have already been substituted, into a command prompt and run it. In this case we have
+shipped a sample of the expected output for you to compare against.
+
+3. The resulting file will open for you. Look at the comparison to see the
+annotations resulting from this pipeline.
+**Windows**: <code>c:\stuff\diff-html.html</code> **Linux**: <code>/tmp/diff-html.html</code>  
+
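If a CPE run produces several output files, you can repeat the comparison for each one. The Python sketch below (not part of cTAKES) pairs every `.xml` file in the cdptest `testoutput/plaintext` directory with the file of the same name in `testsampleoutput/plaintext`. It then builds the corresponding Compare command.

The paths and class name come from the examples above. The function name and the `-diff.html` naming scheme are assumptions.

```python
from pathlib import Path

# Class name taken from the Compare command above; cTAKES.jar is assumed to sit
# directly in <cTAKES_HOME>, as the examples imply.
COMPARE_CLASS = "edu.mayo.bmi.utils.xcas_comparison.Compare"


def compare_commands(ctakes_home, diff_dir):
    """Build one Compare command per output file that has a shipped sample counterpart."""
    home = Path(ctakes_home)
    output_dir = home / "testdata" / "cdptest" / "testoutput" / "plaintext"
    sample_dir = home / "testdata" / "cdptest" / "testsampleoutput" / "plaintext"
    commands = []
    for output_file in sorted(output_dir.glob("*.xml")):
        sample_file = sample_dir / output_file.name
        if not sample_file.is_file():
            continue  # no expected output shipped for this document
        diff_file = Path(diff_dir) / (output_file.stem + "-diff.html")
        commands.append([
            "java", "-cp", str(home / "cTAKES.jar"), COMPARE_CLASS,
            str(output_file), str(sample_file), str(diff_file),
        ])
    return commands
```

Each returned argument list can then be run with `subprocess.run(cmd, check=True)`.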
+Using the same CVD and CPE programs in the manner described above, you can
+test all the other components. The analysis engines and collection processing
+engines shipped with cTAKES for some of the annotators are described in the
+following table.
+
+|Annotator|Description|Abbreviation|Example Analysis Engine (AE)|Example Collection Processing Engine (CPE)|Example test data|
+|---------|-----------|-----------|----------------------------|------------------------------------------|-----------------|
+|Clinical Document Pipeline|the complete cTAKES pipeline, which produces the majority of cTAKES annotations|cdp|cTAKES_HOME/cTAKESdesc/cdpdesc/analysis_engine/AggregatePlaintextProcessor.xml|cTAKES_HOME/cTAKESdesc/cdpdesc/collection_processing_engine/test_plaintext.xml|cTAKES_HOME/testdata/cdptest|
+|Chunker|obtain cTAKES chunking annotations|chunker|cTAKES_HOME/cTAKESdesc/chunkerdesc/analysis_engine/ChunkerAggregate.xml|cTAKES_HOME/cTAKESdesc/chunkerdesc/collection_processing_engine/ChunkerCPE.xml|cTAKES_HOME/testdata/chunkertest|
+|Dependency Parser|obtain dependency parsing tree|dp|cTAKES_HOME/cTAKESdesc/dpdesc/analysis_engine/ClearParserTokenizedInfPosAggregate.xml|cTAKES_HOME/cTAKESdesc/dpdesc/collection_processing_engine/ClearParserCPE.xml|cTAKES_HOME/testdata/dptest|
+|Drug NER|the annotator to obtain drug annotations|drugner|cTAKES_HOME/cTAKESdesc/drugnerdesc/analysis_engine/DrugAggregatePlaintextProcesor.xml|cTAKES_HOME/cTAKESdesc/drugnerdesc/collection_processing_engine/DrugNER_PlainText_CPE.xml|cTAKES_HOME/testdata/drugnertest|
+|Dictionary Lookup|mapping cTAKES annotations to dictionaries (e.g., SNOMED-CT or RxNorm)|lookup|cTAKES_HOME/cTAKESdesc/lookupdesc/analysis_engine/TestAggregateTAE.xml|cTAKES_HOME/cTAKESdesc/lookupdesc/collection_processing_engine/LookupCPE.xml|cTAKES_HOME/testdata/lookuptest|
+|PAD Term Spotter|identifying terms related to PAD|pad|cTAKES_HOME/cTAKESdesc/paddesc/analysis_engine/Radiology_TermSpotterAnnotatorTAE.xml|cTAKES_HOME/cTAKESdesc/paddesc/collection_processing_engine/Radiology_Sample.xml|cTAKES_HOME/testdata/padtest|
+|Smoking Status|the annotator to obtain document or patient-level smoking status|smoking|cTAKES_HOME/cTAKESdesc/smokingdesc/analysis_engine/SimulatedProdSmokingTAE.xml|cTAKES_HOME/cTAKESdesc/smokingdesc/collection_processing_engine/Sample_SmokingStatus_output_flatfile.xml|cTAKES_HOME/testdata/smokingtest|
+|Side Effect|the annotator to find side effect mentions and sentences from clinical documents|sideeffect|cTAKES_HOME/cTAKESdesc/sideeffectdesc/analysis_engine/SideEffectAggregateTAE.xml|cTAKES_HOME/cTAKESdesc/sideeffectdesc/collection_processing_engine/SideEffectCPE.xml|cTAKES_HOME/testdata/sideeffecttest|
+
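Before trying the components one by one, you may want to confirm that the example descriptors listed in the table exist in your installation. This Python sketch (not part of cTAKES; names hypothetical) checks the AE and CPE paths from the table, copied verbatim, relative to &lt;cTAKES_HOME&gt;.

```python
from pathlib import Path

# Example AE and CPE descriptors from the table, relative to <cTAKES_HOME>,
# keyed by the table's abbreviation column.
EXAMPLE_DESCRIPTORS = {
    "cdp": ["cTAKESdesc/cdpdesc/analysis_engine/AggregatePlaintextProcessor.xml",
            "cTAKESdesc/cdpdesc/collection_processing_engine/test_plaintext.xml"],
    "chunker": ["cTAKESdesc/chunkerdesc/analysis_engine/ChunkerAggregate.xml",
                "cTAKESdesc/chunkerdesc/collection_processing_engine/ChunkerCPE.xml"],
    "dp": ["cTAKESdesc/dpdesc/analysis_engine/ClearParserTokenizedInfPosAggregate.xml",
           "cTAKESdesc/dpdesc/collection_processing_engine/ClearParserCPE.xml"],
    "drugner": ["cTAKESdesc/drugnerdesc/analysis_engine/DrugAggregatePlaintextProcesor.xml",
                "cTAKESdesc/drugnerdesc/collection_processing_engine/DrugNER_PlainText_CPE.xml"],
    "lookup": ["cTAKESdesc/lookupdesc/analysis_engine/TestAggregateTAE.xml",
               "cTAKESdesc/lookupdesc/collection_processing_engine/LookupCPE.xml"],
    "pad": ["cTAKESdesc/paddesc/analysis_engine/Radiology_TermSpotterAnnotatorTAE.xml",
            "cTAKESdesc/paddesc/collection_processing_engine/Radiology_Sample.xml"],
    "smoking": ["cTAKESdesc/smokingdesc/analysis_engine/SimulatedProdSmokingTAE.xml",
                "cTAKESdesc/smokingdesc/collection_processing_engine/Sample_SmokingStatus_output_flatfile.xml"],
    "sideeffect": ["cTAKESdesc/sideeffectdesc/analysis_engine/SideEffectAggregateTAE.xml",
                   "cTAKESdesc/sideeffectdesc/collection_processing_engine/SideEffectCPE.xml"],
}


def missing_descriptors(ctakes_home, abbreviations=None):
    """Return {abbreviation: [relative paths not found]} for the selected components."""
    home = Path(ctakes_home)
    selected = abbreviations if abbreviations is not None else list(EXAMPLE_DESCRIPTORS)
    missing = {}
    for abbrev in selected:
        absent = [rel for rel in EXAMPLE_DESCRIPTORS[abbrev] if not (home / rel).is_file()]
        if absent:
            missing[abbrev] = absent
    return missing
```

For example, `missing_descriptors("/usr/bin/cTAKES-3.0")` returns an empty dictionary when every example descriptor is present.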
+## Next Steps
+
+The [cTAKES 3.0 Component Use Guide](3.0.0/component-use-guide-3.0) describes each of the
+installed cTAKES components in detail and, in some cases, explains how to
+improve them. Before you process text in production, however, you will need
+to consider dictionaries and models.
+
+### Dictionaries
+
+#### Bundled UMLS Dictionaries
+
+cTAKES includes full dictionaries derived from the UMLS (SNOMED-CT and RxNorm):
+
+* An rxnorm_index database (a Lucene index) containing drug names from RxNorm
+* A UMLS database (using two hsqldb tables) containing anatomical sites, procedures, signs/symptoms, and disorders/diseases from SNOMED-CT (umls_ms_2011ab)
+
+To use them, you must have a UMLS username and password, and an Internet
+connection.
+
+**Note**: If you do not have a UMLS username and password, you may request one at [UMLS
+Terminology Services](https://uts.nlm.nih.gov/license.html).
+
+In order to use the UMLS dictionaries shipped with cTAKES you will need to do
+two things:
+
+1. Replace the UMLSUser and UMLSPW &lt;nameValuePair&gt; strings in these descriptor
+files with your UMLS username and password.
+ * Dictionary Lookup: &lt;cTAKES_HOME&gt;/cTAKESdesc/lookupdesc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml
+ * (optional) Drug NER: &lt;cTAKES_HOME&gt;/cTAKESdesc/drugnerdesc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml
+The following shows where in the files to make the changes. (Do not
+change the &lt;configurationParameters&gt; entries with the same names.)
+<pre>
+&lt;nameValuePair&gt;
+&lt;name&gt;UMLSUser&lt;/name&gt;
+&lt;value&gt;
+&lt;string&gt;YOUR_UMLS_USERNAME_HERE&lt;/string&gt;
+&lt;/value&gt;
+&lt;/nameValuePair&gt;
+&lt;nameValuePair&gt;
+&lt;name&gt;UMLSPW&lt;/name&gt;
+&lt;value&gt;
+&lt;string&gt;YOUR_UMLS_PASSWORD_HERE&lt;/string&gt;
+&lt;/value&gt;
+&lt;/nameValuePair&gt;
+</pre>
+2. Either include the DictionaryLookupAnnotatorUMLS.xml analysis engine in your own
+aggregate analysis engine, or switch to the UMLS versions provided by cTAKES. For the
+following components, cTAKES ships copies of the aggregate analysis engine descriptors
+that have UMLS in their names and use DictionaryLookupAnnotatorUMLS.xml:
+ * Dictionary Lookup
+ * Clinical Documents pipeline
+ * Drug NER
+ * Side Effect
+
+In that case, you simply switch to those descriptors. For example, if you
+were using AggregateCdaProcessor.xml in the Clinical Documents pipeline, you
+would switch to AggregateCdaUMLSProcessor.xml instead, which uses the
+complete dictionaries.
+
+You can, of course, modify your own aggregate Analysis Engine files and place
+the DictionaryLookupAnnotatorUMLS.xml Analysis Engine within them.
+
+Because the dictionaries are loaded into an in-memory database, please be patient
+during the initial load: the database can take approximately 20-30 seconds
+to initialize.
+
+If you would like to go back to the small sample dictionaries that do
+not require a UMLS username, use the DictionaryLookupAnnotator.xml (without UMLS
+in the file name) analysis engine descriptor in your aggregate. Simply
+removing your password from the DictionaryLookupAnnotatorUMLS.xml files will
+not switch you back to the small sample dictionaries.
+
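Step 1 of the UMLS instructions above can also be scripted. This is a minimal Python sketch, not a cTAKES tool, and `set_umls_credentials` is a hypothetical name.

It assumes two things:

* The descriptors use UIMA's standard `http://uima.apache.org/resourceSpecifier` namespace.
* The credentials sit in `<nameValuePair>` elements shaped like the excerpt shown in step 1.

It changes only `<nameValuePair>` entries, leaving the `<configurationParameter>` declarations with the same names untouched.

```python
import xml.etree.ElementTree as ET

# Default XML namespace of UIMA descriptor files.
UIMA_NS = "http://uima.apache.org/resourceSpecifier"
ET.register_namespace("", UIMA_NS)  # serialize it back as the default namespace, without a prefix


def set_umls_credentials(descriptor_xml, username, password):
    """Return descriptor XML with the UMLSUser/UMLSPW <nameValuePair> values replaced.

    Note: ElementTree drops XML comments (such as a license header) and the
    XML declaration, so keep a backup of the original file.
    """
    root = ET.fromstring(descriptor_xml)
    new_values = {"UMLSUser": username, "UMLSPW": password}
    replaced = set()
    for pair in root.iter(f"{{{UIMA_NS}}}nameValuePair"):
        name = pair.findtext(f"{{{UIMA_NS}}}name", default="").strip()
        string_elem = pair.find(f"{{{UIMA_NS}}}value/{{{UIMA_NS}}}string")
        if name in new_values and string_elem is not None:
            string_elem.text = new_values[name]
            replaced.add(name)
    if replaced != set(new_values):
        raise ValueError(f"nameValuePair not found for {sorted(set(new_values) - replaced)}")
    return ET.tostring(root, encoding="unicode")
```

To update a descriptor such as DictionaryLookupAnnotatorUMLS.xml, you could read the file, pass its contents through this function, and write the result back.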
+#### LVG
+
+We have successfully tested the 2008 release of the full [LVG](http://lexsrv2.nlm.nih.gov/LexSysGroup/Projects/lvg/current/docs/userDoc/tools/lvg.html)
+data. In order to use this release of the full LVG data you should:
+
+  1. Download either the full version or the lite version from [NIH Lexical Tools](http://lexsrv2.nlm.nih.gov/LexSysGroup/Projects/lvg/2008/web/download.html)
+  2. Extract the downloaded TGZ file to a temporary directory, using a tool such as 7-Zip if needed. On some operating systems, such as Windows, this takes two steps: first decompress the gzip (.tgz) file, then extract the resulting TAR archive.
+  3. Replace the directory &lt;cTAKES_HOME&gt;/resources/lvgresources/lvg/data/HSqlDb with data/HSqlDb from your extracted download. Replacing the entire directory is appropriate.
+  4. In the future, you can upgrade to later versions of LVG by editing the &lt;cTAKES_HOME&gt;/resources/lvgresources/lvg/data/config/lvg.properties file, replacing "lvg2008" with the name of the new release.
+
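Steps 3 and 4 of the LVG instructions above can be scripted. This Python sketch is not part of cTAKES and uses the paths given in those steps.

Two inputs are assumptions:

* `extracted_lvg_root` is the directory that contains `data/HSqlDb` after extraction.
* The function names are hypothetical.

As in step 4, it simply replaces every occurrence of the release name in lvg.properties.

```python
import shutil
from pathlib import Path

LVG_DATA = Path("resources") / "lvgresources" / "lvg" / "data"


def install_lvg_data(ctakes_home, extracted_lvg_root):
    """Step 3: replace <cTAKES_HOME>/resources/lvgresources/lvg/data/HSqlDb
    with data/HSqlDb from an extracted LVG download."""
    source = Path(extracted_lvg_root) / "data" / "HSqlDb"
    target = Path(ctakes_home) / LVG_DATA / "HSqlDb"
    if not source.is_dir():
        raise FileNotFoundError(f"no data/HSqlDb directory under {extracted_lvg_root}")
    if target.exists():
        shutil.rmtree(target)  # replace the entire directory, as recommended above
    shutil.copytree(source, target)
    return target


def switch_lvg_release(ctakes_home, new_release, old_release="lvg2008"):
    """Step 4: replace the release name in lvg.properties; returns the number of replacements."""
    props = Path(ctakes_home) / LVG_DATA / "config" / "lvg.properties"
    # Java .properties files are traditionally ISO-8859-1, which round-trips any byte.
    text = props.read_text(encoding="latin-1")
    count = text.count(old_release)
    props.write_text(text.replace(old_release, new_release), encoding="latin-1")
    return count
```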
+#### Building Your Own Dictionaries
+
+To install customized dictionaries for RxNorm, SNOMED-CT, or other
+vocabularies that are available through the UMLS, see the following posts on
+the cTAKES forums:
+
+  * [https://cabig-kc.nci.nih.gov/Vocab/forums/viewtopic.php?f=28&t=423](https://cabig-kc.nci.nih.gov/Vocab/forums/viewtopic.php?f=28&t=423)
+  * [https://cabig-kc.nci.nih.gov/Vocab/forums/viewtopic.php?f=28&t=80&start=20#p1459](https://cabig-kc.nci.nih.gov/Vocab/forums/viewtopic.php?f=28&t=80&start=20#p1459)
+
+### Models
+
+Some models included in cTAKES may not represent your data distribution well.
+If you want to build or train your own models, please read the [cTAKES 3.0 
+Component Use Guide](3.0.0/component-use-guide-3.0),
+particularly:
+
+  * [Training a sentence detector model](NotYetAvailable)
+  * Training a Part of Speech (POS) tagger model (Building a model - Obtaining training data)
+  * Creating a Part of Speech (POS) tag dictionary (Building a tag dictionary)
+  * Training a chunker model (Building a model - Prepare GENIA training data)
+  * Training a dependency parser (Dependency Parser)
\ No newline at end of file