Posted to commits@ctakes.apache.org by bu...@apache.org on 2012/11/16 17:46:46 UTC

svn commit: r838602 - in /websites/staging/ctakes/trunk/content: ./ ctakes/2.6.0/ctakes-2.6-Smoking-Status.html

Author: buildbot
Date: Fri Nov 16 16:46:45 2012
New Revision: 838602

Log:
Staging update by buildbot for ctakes

Added:
    websites/staging/ctakes/trunk/content/ctakes/2.6.0/ctakes-2.6-Smoking-Status.html
Modified:
    websites/staging/ctakes/trunk/content/   (props changed)

Propchange: websites/staging/ctakes/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Fri Nov 16 16:46:45 2012
@@ -1 +1 @@
-1410451
+1410453

Added: websites/staging/ctakes/trunk/content/ctakes/2.6.0/ctakes-2.6-Smoking-Status.html
==============================================================================
--- websites/staging/ctakes/trunk/content/ctakes/2.6.0/ctakes-2.6-Smoking-Status.html (added)
+++ websites/staging/ctakes/trunk/content/ctakes/2.6.0/ctakes-2.6-Smoking-Status.html Fri Nov 16 16:46:45 2012
@@ -0,0 +1,372 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+ 
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+ 
+       http://www.apache.org/licenses/LICENSE-2.0
+ 
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+<link href="/ctakes/css/ctakes.css" rel="stylesheet" type="text/css">
+
+<title>cTAKES 2.6 Smoking Status</title>
+<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+
+</head>
+ 
+<body>
+ <div class="banner">
+      <div id="bannerleft">
+		<a href="http://www.apache.org/"><img src="http://www.apache.org/images/asf_logo_wide.gif" alt="The Apache Software Foundation" border="0"/></a>
+	<br/>
+			<img alt="cTAKES logo" src="/ctakes/images/ctakes_logo.jpg" border="0"/>
+      </div>  
+    <div id="bannerright">	
+	      <img id="asf-logo" alt="Apache Incubator" src="http://incubator.apache.org/images/egg-logo.png" border="0"/>
+	  </div>
+ </div>  
+  <div id="clear"></div>
+
+
+  <div id="sidenav">
+    <h1 id="general">General</h1>
+<ul>
+<li><a href="/ctakes/index.html">About</a></li>
+<li><a href="/ctakes/gettingstarted.html">Getting Started</a></li>
+<li><a href="/ctakes/downloads.html">Downloads</a></li>
+<li><a href="/ctakes/glossary.html">Glossary</a></li>
+</ul>
+<h1 id="community">Community</h1>
+<ul>
+<li><a href="/ctakes/get-involved.html">Get Involved</a></li>
+<li><a href="https://issues.apache.org/jira/browse/ctakes">Bug Tracker</a></li>
+<li><a href="/ctakes/mailing-lists.html">Mailing Lists</a></li>
+<li><a href="/ctakes/people.html">People</a></li>
+<li><a href="http://incubator.apache.org/projects/ctakes.html">Incubator page</a></li>
+<li><a href="/ctakes/license.html">License</a></li>
+<li><a href="/ctakes/history.html">History</a></li>
+<li><a href="/ctakes/community-faqs.html">Community FAQs</a></li>
+</ul>
+<h1 id="users">Users</h1>
+<ul>
+<li><a href="/ctakes/userguide.html">User Guide</a></li>
+<li><a href="/ctakes/user-faqs.html">User FAQs</a></li>
+</ul>
+<h1 id="developers">Developers</h1>
+<ul>
+<li><a href="/ctakes/developerguide.html">Developer Guide</a></li>
+<li><a href="/ctakes/developer-faqs.html">Developer FAQs</a></li>
+</ul>
+<h1 id="ppmc">PPMC</h1>
+<ul>
+<li><a href="/ctakes/ppmc-faqs.html">PPMC FAQs</a></li>
+<li><a href="/ctakes/ctakes-release-guide.html">Release Guide</a> <br />
+</li>
+</ul>
+<h1 id="asf">ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+</ul>
+  </div>
+  <div id="contenta">
+    <h1 id="ctakes-26-smoking-status">cTAKES 2.6 - Smoking status</h1>
+<h2 id="overview-of-smoking-status">Overview of Smoking status</h2>
+<p>The "smoking status" pipeline processes flat files or CDA (Clinical Document
+Architecture) documents to classify patient records into five pre-determined
+categories - past smoker (P), current smoker (C), smoker (S), nonsmoker (N),
+and unknown (U), where a past and current smoker are distinguished based on
+temporal expressions in the patient's medical records.</p>
+<h2 id="analysis-engines-annotator">Analysis engines (annotator)</h2>
+<h3 id="simulatedprodsmokingtaexml">SimulatedProdSmokingTAE.xml</h3>
+<p>The file desc/analysis_engine/SimulatedProdSmokingTAE.xml provides a working
+example of the smoking status pipeline, utilizing the aggregate TAEs. This
+Aggregate includes Token, Sentence, SentenceAdjuster, ClassifiableEntries
+(which in turn invokes the ProductionPostSentenceAggregate annotators
+internally).</p>
+<p>Shipped with this annotator:</p>
+<ul>
+<li>ExternalBaseAggregateTAE</li>
+<li>SentenceAdjuster</li>
+<li>ClassifiableEntriesAnnotator</li>
+</ul>
+<p><img alt="" src="/images/icons/emoticons/information.png" /></p>
+<p>SimulatedProdSmokingTAE_CDA.xml is also provided to process CDA documents. Its
+aggregate flow contains the annotator version
+ExternalBaseAggregateTAE_CDA.xml, which processes the document as a Clinical
+Document Architecture (CDA) file.</p>
+<h3 id="productionpostsentenceaggregate_step1xml">ProductionPostSentenceAggregate_step1.xml</h3>
+<p>The file desc/analysis_engine/ProductionPostSentenceAggregate_step1.xml is
+the Aggregate TAE used to run the first classification stage via the
+KuRuleBasedClassifierAnnotator.</p>
+<ul>
+<li>TokenizerAnnotator (core project)</li>
+<li>KuRuleBasedClassifierAnnotator</li>
+</ul>
+<p><img alt="" src="/images/icons/emoticons/information.png" /></p>
+<p>This annotator is not contained in the aggregate flow, but introduced via the
+resource settings of the ClassifiableEntriesAnnotator (see the method
+initialize() in this class).
+UIMAFramework.produceAnalysisEngine(taeSpecifierStep1, ResMgr, null)
+instantiates the AE and
+CasCreationUtils.createCas(taeStep1.getAnalysisEngineMetaData()).getJCas()
+retrieves the CAS.</p>
+<h3 id="productionpostsentenceaggregate_step2_libsvmxml">ProductionPostSentenceAggregate_step2_libsvm.xml</h3>
+<p>The file desc/analysis_engine/ProductionPostSentenceAggregate_step2_libsvm.xml
+is the Aggregate TAE used to run the second classification stage via the
+libSVM training module. Shipped with this annotator:</p>
+<ul>
+<li>PcsClassifierAnnotator_libsvm,</li>
+<li>ArtificialSentenceAnnotator,</li>
+<li>SentenceAdjuster,</li>
+<li>SmokingStatusDictionaryLookupAnnotator,</li>
+<li>NegationAnnotator.</li>
+</ul>
+<p><img alt="" src="/images/icons/emoticons/information.png" /></p>
+<p>This annotator is not contained in the aggregate flow, but introduced via the
+resource settings of the ClassifiableEntriesAnnotator (see the method
+initialize() in this class).
+UIMAFramework.produceAnalysisEngine(taeSpecifierStep2, ResMgr, null)
+instantiates the AE, and the process method of the ClassifiableEntriesAnnotator
+runs it if the smoking status is known.</p>
+<h3 id="externalbaseaggregatetaexml">ExternalBaseAggregateTAE.xml</h3>
+<p>The file desc/analysis_engine/ExternalBaseAggregateTAE.xml provides an
+aggregate flow for the external annotators: SimpleSegmentAnnotator,
+TokenizerAnnotator, SentenceDetectorAnnotator, and LvgAnnotator. Shipped with
+this annotator:</p>
+<ul>
+<li>SimpleSegmentAnnotator,</li>
+<li>TokenizerAnnotator (core project),</li>
+<li>SentenceDetectorAnnotator (core project),</li>
+<li>LvgAnnotator (LVG project).</li>
+</ul>
+<p><img alt="" src="/images/icons/emoticons/information.png" /></p>
+<p>ExternalBaseAggregateTAE_CDA.xml is also provided to process CDA documents.
+Its aggregate flow contains the specialized class CdaCasInitializer (replacing
+the SimpleSegmentAnnotator used by the flat file/non-CDA version), which
+processes the document as a Clinical Document Architecture (CDA) file. This
+annotator is contained in the SimulatedProdSmokingTAE_CDA aggregate.</p>
+<h3 id="sentenceadjusterxml">SentenceAdjuster.xml</h3>
+<p>The file desc/analysis_engine/SentenceAdjuster.xml drives the Java class
+edu.mayo.bmi.smoking.ae.SentenceAdjuster, an annotator that uses patterns and
+rules about those patterns to adjust certain annotations. This annotator
+was extended to handle sentence boundaries for the smoking status
+classification.</p>
+<p>Example: "Tobacco: none" is detected as two sentences by the original cTAKES
+sentence boundary detector. This annotator merges them into one sentence to
+enable correct negation detection.</p>
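+<p>A minimal sketch of the merging idea in plain Java, not the actual
+SentenceAdjuster implementation. It assumes that sentences are plain strings
+and that a sentence ending in ":" is merged with a following sentence whose
+first word is one of the WordsInPattern defaults. The rule itself and the names
+SentenceMergeSketch and merge are assumptions made for illustration.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; the real annotator works on UIMA annotations.
class SentenceMergeSketch {
    // Words taken from the documented WordsInPattern default value.
    static final Set<String> WORDS_IN_PATTERN =
            new HashSet<>(Arrays.asList("no", "none", "never", "quit", "smoked", ";"));

    // Assumed rule: merge a sentence ending in ':' with a following sentence
    // whose first token (ignoring a trailing '.' or ',') is a pattern word.
    static List<String> merge(List<String> sentences) {
        List<String> out = new ArrayList<>();
        for (String s : sentences) {
            String current = s.trim();
            if (current.isEmpty()) {
                continue;
            }
            String firstToken = current.split("\\s+")[0]
                    .toLowerCase(Locale.ROOT)
                    .replaceAll("[.,]+$", "");
            boolean previousEndsWithColon =
                    !out.isEmpty() && out.get(out.size() - 1).endsWith(":");
            if (previousEndsWithColon && WORDS_IN_PATTERN.contains(firstToken)) {
                int last = out.size() - 1;
                out.set(last, out.get(last) + " " + current);
            } else {
                out.add(current);
            }
        }
        return out;
    }
}
```

+<p>With this assumed rule, the two detected sentences "Tobacco:" and "none"
+become the single sentence "Tobacco: none", while "Alcohol:" followed by
+"socially." is left as two sentences.</p>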
+<p><strong>Parameters</strong><br />
+UseSegments &lt;Boolean/Single-valued/Optional&gt;</p>
+<p>(Default Value = 'false') Flag whether to use segments or full doc text.</p>
+<p>SegmentsToSkip &lt;String/Multi-valued/Optional&gt;</p>
+<p>WordsToIgnore &lt;String/Multi-valued/Optional&gt;</p>
+<p>(Default Value = 'null') Set of words that PostModifier should ignore (act as
+if the word was not there) when looking for a pattern match.</p>
+<p>WordsInPattern &lt;String/Multi-valued/Required&gt;</p>
+<p>(Default Value = 'no none never quit smoked ;') The list of words ("none",
+"no", etc) used in the pattern.</p>
+<h3 id="classifiableentriesannotatorxml">ClassifiableEntriesAnnotator.xml</h3>
+<p>The file desc/analysis_engine/ClassifiableEntriesAnnotator.xml drives the Java
+class edu.mayo.bmi.smoking.ae.ClassifiableEntries, which converts Sentences to
+ClassifiableEntries (required by the SmokingStatus pipeline) and ultimately to
+RecordSentence.</p>
+<p><strong>Parameters</strong><br />
+TruthFile &lt;String/Single-valued/Optional&gt;</p>
+<p>(Default Value = 'null') Delimited Truth file. Delimiter is expected to be the
+TAB char. If not specified, then the classification feature of the
+RecordSentence object will not be set.</p>
+<p>AllowedClassifications &lt;String/Multi-valued/Optional&gt;</p>
+<p>(Default Value = '"SMOKER" "CURRENT_SMOKER" "NON_SMOKER" "PAST_SMOKER"
+"UNKNOWN"') See edu.mayo.bmi.smoking.Const.java for permitted string values.</p>
+<p>SectionsToIgnore &lt;String/Multi-valued/Optional&gt;</p>
+<p>(Default Value = '"20109" "20138"') Sections to ignore for ClassifiableEntries,
+e.g. Family History (20109). A given patient's smoking status could be confused
+with the smoking status of others. To avoid this confusion, certain sections
+such as family history can be excluded.</p>
+<p>ConWordsFile &lt;String/Single-valued/Optional&gt;</p>
+<p>(Default Value =
+'$main_root/resources/ss/data/context/negationContradictionWords.txt')
+List of contradiction words. If one of these words appears in a sentence, the
+sentence is not negated.</p>
+<p><strong>Resources</strong><br />
+UimaDescriptorStep1</p>
+<p>(Default Value =
+'$main_root/desc/analysis_engine/ProductionPostSentenceAggregate_step1.xml')
+Annotator responsible for the first classification step, namely,
+KuRuleBasedClassifierAnnotator.</p>
+<p>UimaDescriptorStep2</p>
+<p>(Default Value =
+'$main_root/desc/analysis_engine/ProductionPostSentenceAggregate_step2_libsvm.xml')
+Annotator responsible for the second classification step.</p>
+<p><img alt="" src="/images/icons/emoticons/information.png" /></p>
+<p>UimaDescriptorStep1 and UimaDescriptorStep2 are introduced as resources by
+the ClassifiableEntriesAnnotator during its initialization step.
+This allows the specified aggregates to be instantiated and their analysis
+processing to be handled on a separate asynchronous thread. This improves
+overall performance by ensuring that the output of the
+ProductionPostSentenceAggregates is available to the process method without
+requiring a synchronized data flow (i.e. an explicit aggregate flow in the
+component descriptor).</p>
+<h3 id="kurulebasedclassifierannotatorxml">KuRuleBasedClassifierAnnotator.xml</h3>
+<p>The file desc/analysis_engine/KuRuleBasedClassifierAnnotator.xml drives the
+Java class edu.mayo.bmi.smoking.ae.KuRuleBasedClassifierAnnotator, a known vs.
+unknown classifier based on smoking-related keywords.</p>
+<p><strong>Parameters</strong><br />
+CaseSensitive &lt;String/Single-valued/Required&gt;</p>
+<p>(Default Value = 'false') Specifies if a distinction between lower and upper
+case text will be considered.</p>
+<p>classAttribute &lt;String/Single-valued/Required&gt;</p>
+<p>(Default Value = 'smoking_status') Value used by the NominalAttributeValue via
+setAttributeName.</p>
+<p>SmokingWordsFile &lt;String/Single-valued/Required&gt;</p>
+<p>(Default Value = 'ss/data/KU/keywords.txt') Smoking related keywords to
+identify "known" class.</p>
+<p>UnknownWordsFile &lt;String/Single-valued/Required&gt;</p>
+<p>(Default Value = 'ss/data/KU/unknown_words.txt') If this word/phrase appears,
+treat the sentence as UNKNOWN.</p>
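+<p>The interplay of SmokingWordsFile, UnknownWordsFile and CaseSensitive can be
+sketched as follows. This is a simplified illustration, not the
+KuRuleBasedClassifierAnnotator source: it assumes plain substring matching,
+assumes that an unknown phrase takes precedence over a smoking keyword, and
+uses made-up word lists and a hypothetical class name.</p>

```java
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of a keyword-based known/unknown decision.
class KnownUnknownSketch {
    // Lower-cases the text unless case-sensitive matching is requested.
    private static String normalize(String text, boolean caseSensitive) {
        return caseSensitive ? text : text.toLowerCase(Locale.ROOT);
    }

    // Returns "UNKNOWN" if an unknown phrase occurs (assumed to win),
    // "KNOWN" if a smoking keyword occurs, and "UNKNOWN" otherwise.
    static String classify(String sentence, List<String> smokingWords,
                           List<String> unknownWords, boolean caseSensitive) {
        String text = normalize(sentence, caseSensitive);
        for (String phrase : unknownWords) {
            if (text.contains(normalize(phrase, caseSensitive))) {
                return "UNKNOWN";
            }
        }
        for (String word : smokingWords) {
            if (text.contains(normalize(word, caseSensitive))) {
                return "KNOWN";
            }
        }
        return "UNKNOWN";
    }
}
```

+<p>Only sentences classified as known would then be passed on to the second
+classification step.</p>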
+<h3 id="pcsclassifierannotator_libsvmxml">PcsClassifierAnnotator_libsvm.xml</h3>
+<p>The file desc/analysis_engine/PcsClassifierAnnotator_libsvm.xml is the smoking
+status classifier using libsvm. This annotator plays the same role as
+PcsBOWFeatureAnnotator.xml, PcsClassifierAnnotator.xml, and
+BOWFeatureRemovalAnnotator.xml, which use libsvm.</p>
+<p><strong>Parameters</strong><br />
+CaseSensitive &lt;String/Single-valued/Required&gt;</p>
+<p>(Default Value = 'false') Specifies if a distinction between lower and upper
+case text will be considered.</p>
+<p><strong>Resources</strong><br />
+StopWordsFile</p>
+<p>(Default Value = 'file:ss/data/PCS/stopwords_PCS.txt')</p>
+<p>Resource file that provides terms used as stop words, e.g. "a" "an" "the".</p>
+<p>PCSKeyWordFile</p>
+<p>(Default Value = 'file:ss/data/PCS/keywords_PCS.txt')</p>
+<p>Resource file that provides terms used as PCS key words, e.g. "refrain"
+"discussed" "to_quit" (the two words of a bigram are joined by an underscore,
+"_").</p>
+<p>PathOfModel</p>
+<p>(Default Value = 'file:ss/data/PCS/pcs_libsvm-2.91.model')</p>
+<p>Resource file that provides trained model for smoking status classification.</p>
+<h3 id="artificialsentenceannotatorxml">ArtificialSentenceAnnotator.xml</h3>
+<p>The file desc/analysis_engine/ArtificialSentenceAnnotator.xml drives the Java
+class edu.mayo.bmi.uima.core.ae.CopyAnnotator. It artificially creates a new
+SentenceAnnotation object by treating the entire document as a sentence. The
+offset values from the DocumentAnnotation object are transferred over to the
+new SentenceAnnotation object.</p>
+<p><strong>Parameters</strong><br />
+srcObjClass &lt;String/Single-valued/Required&gt;</p>
+<p>(Default Value = 'false') Source JCas object class.</p>
+<p>This must be an object that already exists in the JCas.</p>
+<p>destObjClass &lt;String/Single-valued/Required&gt;</p>
+<p>(Default Value = 'false') Destination JCas object class.</p>
+<p>A new JCas object will be created.</p>
+<p>dataBindMap &lt;String/Multi-valued/Required&gt;</p>
+<p>(Default Value = 'false')</p>
+<p>Binds data from source to destination.</p>
+<p>Each entry maps the getter method name of the source to the setter method
+name of the destination, separated by "|", e.g. getMyValue|setMyValue.</p>
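+<p>The getter|setter convention of dataBindMap can be illustrated with plain Java
+reflection. This is a sketch under stated assumptions, not the CopyAnnotator
+source: the Source and Dest classes are hypothetical stand-ins for the JCas
+objects named by srcObjClass and destObjClass, and the helper names are
+invented for this example.</p>

```java
import java.lang.reflect.Method;

// Hypothetical illustration of "getter|setter" binding via reflection.
class DataBindSketch {
    // Stand-in for a source object that already exists (hypothetical).
    public static class Source {
        private final int begin;
        private final int end;
        public Source(int begin, int end) { this.begin = begin; this.end = end; }
        public int getBegin() { return begin; }
        public int getEnd() { return end; }
    }

    // Stand-in for the newly created destination object (hypothetical).
    public static class Dest {
        private int begin;
        private int end;
        public void setBegin(int begin) { this.begin = begin; }
        public void setEnd(int end) { this.end = end; }
        public int getBegin() { return begin; }
        public int getEnd() { return end; }
    }

    // Applies each "getterName|setterName" entry: reads from src, writes to dest.
    static void bind(Object src, Object dest, String[] dataBindMap) {
        for (String entry : dataBindMap) {
            String[] parts = entry.split("\\|");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected getter|setter: " + entry);
            }
            try {
                Object value = src.getClass().getMethod(parts[0]).invoke(src);
                findSetter(dest.getClass(), parts[1]).invoke(dest, value);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot bind " + entry, e);
            }
        }
    }

    // Finds a public one-argument method with the given name.
    private static Method findSetter(Class<?> type, String name)
            throws NoSuchMethodException {
        for (Method m : type.getMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == 1) {
                return m;
            }
        }
        throw new NoSuchMethodException(name);
    }
}
```

+<p>For the ArtificialSentenceAnnotator use case, entries such as
+getBegin|setBegin and getEnd|setEnd would carry the document offsets over to
+the new sentence object.</p>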
+<h3 id="smokingstatusdictionarylookupannotatorxml">SmokingStatusDictionaryLookupAnnotator.xml</h3>
+<p>The file desc/analysis_engine/SmokingStatusDictionaryLookupAnnotator.xml
+drives the Java class edu.mayo.bmi.uima.lookup.ae.DictionaryLookupAnnotator.
+It performs dictionary lookup and stores the hits as NamedEntityAnnotation
+objects.</p>
+<p><strong>Resources</strong><br />
+LookupDescriptor</p>
+<p>(Default Value = 'file:ss/data/SmokingStatusLookupConfig.xml')</p>
+<p>Defines which dictionaries will be used, the implementation specifics, and
+metaField configuration.</p>
+<p>SmokerDictionary</p>
+<p>(Default Value = 'file:ss/data/smoker.dictionary')</p>
+<p>Resource file that provides terms used as smoking words, e.g. '"smokes"
+"tobacco"'.</p>
+<p>NonSmokerDictionary</p>
+<p>(Default Value = 'file:ss/data/nonsmoker.dictionary')</p>
+<p>Resource file that provides terms used as non-smoking words, e.g.
+'"non-smoker"'.</p>
+<h3 id="negationannotatorxml">NegationAnnotator.xml</h3>
+<p>The file desc/analysis_engine/NegationAnnotator.xml drives the Java class
+edu.mayo.bmi.uima.context.ContextAnnotator. The boundary tokens have been moved
+to an external resource, ss/data/context/boundaryData.txt.</p>
+<p><strong>Resources</strong><br />
+BoundaryData</p>
+<p>(Default Value = 'file:ss/data/context/boundaryData.txt')</p>
+<p>Resource file that provides terms used as sentence boundaries, e.g.
+'"nevertheless" "how" ";" "."'.</p>
+<p><img alt="" src="/images/icons/emoticons/information.png" /></p>
+<p>The parameters provided act the same way as in the core's version of the
+'NegationAnnotator', but since the boundary stop words are different for the
+smoking status pipeline, a separate implementation was necessary. However, the
+current release of 'NegationAnnotator' does not use this resource.</p>
+<h2 id="cas-consumers-recordresolutioncasconsumerxml">CAS consumers - RecordResolutionCasConsumer.xml</h2>
+<p>The CAS consumer provided in
+/desc/cas_consumer/RecordResolutionCasConsumer.xml drives the Java class
+edu.mayo.bmi.smoking.cc.RecordResolutionCasConsumer, which iterates over all
+sentences (each CAS equals one sentence) for a record and resolves the final
+classification value for the record. Output is saved to a delimited file.
+It can also optionally provide the overall patient-level classification
+based on the record-level classifications.</p>
+<p><strong>Parameters</strong><br />
+OutputFile &lt;String/Single-valued/Required&gt;</p>
+<p>(Default Value = 'c:\temp\record_resolution.txt')</p>
+<p>Specifies the location of the detail and summary report.</p>
+<p>Delimiter &lt;String/Single-valued/Required&gt;</p>
+<p>(Default Value = '|')</p>
+<p>Specifies the delimiter for the output file.</p>
+<p>ProcessingCDADocument &lt;Boolean/Single-valued/Required&gt;</p>
+<p>(Default Value = 'false')</p>
+<p>Specifies whether the processed files should be handled as CDA documents.</p>
+<p>RunPatientLevelClassification &lt;Boolean/Single-valued/Required&gt;</p>
+<p>(Default Value = 'false')</p>
+<p>Specifies whether the post processing step of generating a summary patient
+level classification is done.</p>
+<p>FinalClassificationOutputFile &lt;String/Single-valued/Optional&gt;</p>
+<p>(Default Value = 'null')</p>
+<p>Specifies name and location of the summary report file which holds the final
+patient level classifications.</p>
+<p><strong>Resources</strong><br />
+libsvm-2.91.jar</p>
+<p>The support vector machine (SVM) classification tool provided at
+/lib/libsvm-2.91.jar, used to train the smoking status model.</p>
+<h2 id="how-to-create-your-own-smoking-status-classifier-model">How to Create your own smoking status classifier model</h2>
+<ul>
+<li>Create sentence-level smoking status data with the format of: sentence|class_label (class_label: P, C, S).</li>
+</ul>
+<p>He quit smoking three years ago.|P<br />
+She is smoking currently.|C<br />
+The patient has a history of tobacco use.|S</p>
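+<p>A minimal Java sketch of reading one line of this sentence|class_label format.
+It is not part of GenerateTrainingData.java. The class and method names are
+hypothetical, and splitting on the last "|" is an assumption made so that a
+sentence containing "|" would still parse.</p>

```java
// Hypothetical illustration of parsing "sentence|class_label" lines.
class TrainingLineSketch {
    // One labelled training sentence.
    static final class Example {
        final String sentence;
        final String label;
        Example(String sentence, String label) {
            this.sentence = sentence;
            this.label = label;
        }
    }

    // Splits at the last '|' and accepts only the labels P, C and S.
    static Example parse(String line) {
        int sep = line.lastIndexOf('|');
        if (sep < 0) {
            throw new IllegalArgumentException("Missing '|' delimiter: " + line);
        }
        String sentence = line.substring(0, sep).trim();
        String label = line.substring(sep + 1).trim();
        if (!label.equals("P") && !label.equals("C") && !label.equals("S")) {
            throw new IllegalArgumentException("Unexpected class label: " + label);
        }
        return new Example(sentence, label);
    }
}
```

+<p>Validating the label while reading helps catch malformed lines before the
+libSVM training data is generated.</p>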
+<ul>
+<li>Run the script edu.mayo.bmi.smoking.MLutil.GenerateTrainingData.java on the sentence-level smoking status data to generate the libSVM training data.</li>
+</ul>
+<p>In this script, the variable "dataFile" in main() must point to the
+sentence-level smoking status data. Set the other variables as needed. Users
+may create their own keywordFile containing the keywords used in smoking
+status classification (see GenerateTrainingData.java for details).</p>
+<ul>
+<li>Create a new model from the libSVM training data.</li>
+</ul>
+<p>The command and options used to train the current model are:</p>
+<p><strong>java -classpath path_of_libsvm_jar_file svm_train -s 0 -t 1 -g 1 -r 1 -d 1 training_data_file new_model</strong><br />
+Users might use their own customized libSVM options.</p>
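+<p>For reference, in libSVM's svm_train the option -s 0 selects C-SVC and -t 1
+selects the polynomial kernel (gamma*u'*v + coef0)^degree, with -g, -r and -d
+setting gamma, coef0 and degree. With the values in the command above, the
+kernel reduces to u'*v + 1. A minimal Java sketch of that kernel function
+follows. The class name is hypothetical and this is not libSVM code.</p>

```java
// Hypothetical illustration of the polynomial kernel selected by "-t 1".
class PolyKernelSketch {
    // Computes (gamma * <u, v> + coef0) ^ degree for two dense vectors.
    static double kernel(double[] u, double[] v,
                         double gamma, double coef0, int degree) {
        if (u.length != v.length) {
            throw new IllegalArgumentException("Vectors must have equal length");
        }
        double dot = 0.0;
        for (int i = 0; i < u.length; i++) {
            dot += u[i] * v[i];
        }
        return Math.pow(gamma * dot + coef0, degree);
    }
}
```

+<p>For example, with gamma = 1, coef0 = 1 and degree = 1, the vectors (1, 2) and
+(3, 4) give 1*3 + 2*4 + 1 = 12.</p>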
+<ul>
+<li>Save new_model in resources/ss/data/PCS/.</li>
+<li>Change the "PathOfModel" resource in PcsClassifierAnnotator_libsvm.xml to point to "new_model".</li>
+</ul>
+  </div>
+ 
+ <div id="footera">
+    <div id="copyrighta">
+      <p>Copyright &#169; 2011 The Apache Software Foundation, Licensed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/>Apache and the Apache feather logo are trademarks of The Apache Software Foundation.</p>
+    </div>
+ </div>
+ 
+</body>
+</html>
+