Posted to commits@ctakes.apache.org by bu...@apache.org on 2012/11/15 23:59:50 UTC

svn commit: r838547 - in /websites/staging/ctakes/trunk/content: ./ ctakes/2.6.0/ctakes-2.6-Document-Preprocessor.html

Author: buildbot
Date: Thu Nov 15 22:59:50 2012
New Revision: 838547

Log:
Staging update by buildbot for ctakes

Added:
    websites/staging/ctakes/trunk/content/ctakes/2.6.0/ctakes-2.6-Document-Preprocessor.html
Modified:
    websites/staging/ctakes/trunk/content/   (props changed)

Propchange: websites/staging/ctakes/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Thu Nov 15 22:59:50 2012
@@ -1 +1 @@
-1410087
+1410089

Added: websites/staging/ctakes/trunk/content/ctakes/2.6.0/ctakes-2.6-Document-Preprocessor.html
==============================================================================
--- websites/staging/ctakes/trunk/content/ctakes/2.6.0/ctakes-2.6-Document-Preprocessor.html (added)
+++ websites/staging/ctakes/trunk/content/ctakes/2.6.0/ctakes-2.6-Document-Preprocessor.html Thu Nov 15 22:59:50 2012
@@ -0,0 +1,126 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+ 
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+ 
+       http://www.apache.org/licenses/LICENSE-2.0
+ 
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+<link href="/ctakes/css/ctakes.css" rel="stylesheet" type="text/css">
+
+<title>cTAKES 2.6 Document Preprocessor</title>
+<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+
+</head>
+ 
+<body>
+ <div class="banner">
+      <div id="bannerleft">
+		<a href="http://www.apache.org/"><img src="http://www.apache.org/images/asf_logo_wide.gif" alt="The Apache Software Foundation" border="0"/></a>
+	<br/>
+			<img alt="cTAKES logo" src="/ctakes/images/ctakes_logo.jpg" border="0"/>
+      </div>  
+    <div id="bannerright">	
+	      <img id="asf-logo" alt="Apache Incubator" src="http://incubator.apache.org/images/egg-logo.png" border="0"/>
+	  </div>
+ </div>  
+  <div id="clear"></div>
+
+
+  <div id="sidenav">
+    <h1 id="general">General</h1>
+<ul>
+<li><a href="/ctakes/index.html">About</a></li>
+<li><a href="/ctakes/gettingstarted.html">Getting Started</a></li>
+<li><a href="/ctakes/downloads.html">Downloads</a></li>
+<li><a href="/ctakes/glossary.html">Glossary</a></li>
+</ul>
+<h1 id="community">Community</h1>
+<ul>
+<li><a href="/ctakes/get-involved.html">Get Involved</a></li>
+<li><a href="https://issues.apache.org/jira/browse/ctakes">Bug Tracker</a></li>
+<li><a href="/ctakes/mailing-lists.html">Mailing Lists</a></li>
+<li><a href="/ctakes/people.html">People</a></li>
+<li><a href="http://incubator.apache.org/projects/ctakes.html">Incubator page</a></li>
+<li><a href="/ctakes/license.html">License</a></li>
+<li><a href="/ctakes/history.html">History</a></li>
+<li><a href="/ctakes/community-faqs.html">Community FAQs</a></li>
+</ul>
+<h1 id="users">Users</h1>
+<ul>
+<li><a href="/ctakes/userguide.html">User Guide</a></li>
+<li><a href="/ctakes/user-faqs.html">User FAQs</a></li>
+</ul>
+<h1 id="developers">Developers</h1>
+<ul>
+<li><a href="/ctakes/developerguide.html">Developer Guide</a></li>
+<li><a href="/ctakes/developer-faqs.html">Developer FAQs</a></li>
+</ul>
+<h1 id="ppmc">PPMC</h1>
+<ul>
+<li><a href="/ctakes/ppmc-faqs.html">PPMC FAQs</a></li>
+<li><a href="/ctakes/ctakes-release-guide.html">Release Guide</a> <br />
+</li>
+</ul>
+<h1 id="asf">ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+</ul>
+  </div>
+  <div id="contenta">
+    <h1 id="ctakes-26-document-preprocessor">cTAKES 2.6 - Document Preprocessor</h1>
+<h2 id="overview-of-document-preprocessor">Overview of Document Preprocessor</h2>
+<p>This component provides the CdaCasInitializer, a section annotator that
+transforms a Clinical Document Architecture (CDA) document into plain text,
+provided the CDA document conforms to the DTD resources/cda/NotesIIST_RTF.DTD.</p>
+<p>As part of the conversion to plain text, section (segment) markers are
+inserted into the text and hyphens are inserted into words that should be
+hyphenated. The resulting text is stored in a new View, which has its own
+Sofa.</p>
+<p>Sections are detected, and Segment (also called "section") annotations are
+added to the CAS. Document-level data is extracted and stored in the CAS as
+Property annotations.</p>
+<p><img alt="" src="/images/icons/emoticons/information.png" /></p>
+<p>This component does not handle all CDA documents. The CDA document must
+conform to the DTD resources/cda/NotesIIST_RTF.DTD.</p>
+<h2 id="analysis-engines-annotators">Analysis engines (annotators)</h2>
+<h3 id="aggregateaexml">AggregateAE.xml</h3>
+<p>The file cTAKESdesc/docpredesc/AggregateAE.xml defines a pipeline for
+preprocessing documents. The pipeline has only one delegate analysis engine
+(one annotator), the CdaCasInitializer, and is included for testing.
+Typically, the CdaCasInitializer.xml descriptor is included in a more
+complete pipeline rather than run through the AggregateAE.xml descriptor
+in this project.</p>
+<h3 id="cdacasinitializerxml">CdaCasInitializer.xml</h3>
+<p>The CdaCasInitializer descriptor defines the analysis engine (annotator) for
+preprocessing documents. It creates a plain text view from a CDA view. Other
+components can then annotate the plain text view with tokens, parts of
+speech, chunks, etc.</p>
+<p><strong>Parameters</strong><br />
+(none)</p>
+  </div>
+ 
+ <div id="footera">
+    <div id="copyrighta">
+      <p>Copyright &#169; 2011 The Apache Software Foundation, Licensed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/>Apache and the Apache feather logo are trademarks of The Apache Software Foundation.</p>
+    </div>
+ </div>
+ 
+</body>
+</html>
+