Posted to notifications@ctakes.apache.org by "Sean Finan (JIRA)" <ji...@apache.org> on 2015/07/23 16:59:04 UTC
[jira] [Created] (CTAKES-372) Penn TreeBank Tokenizer could use some attention
Sean Finan created CTAKES-372:
---------------------------------
Summary: Penn TreeBank Tokenizer could use some attention
Key: CTAKES-372
URL: https://issues.apache.org/jira/browse/CTAKES-372
Project: cTAKES
Issue Type: Improvement
Components: ctakes-core
Affects Versions: future enhancement
Reporter: Sean Finan
Priority: Minor
The PTB tokenizer currently used by cTAKES has some inconsistencies (see https://issues.apache.org/jira/browse/CTAKES-371). It also does not seem to incorporate some of the clinical rules set out in http://clear.colorado.edu/compsem/documents/treebank_guidelines.pdf
Some major refactoring is also in order, as are numerous test cases.
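As a rough illustration of the kind of test case meant here, the sketch below checks two well-known Penn Treebank conventions: splitting the "n't" contraction from its verb and separating punctuation from adjacent words. The function ptb_like_tokenize and the helper check are hypothetical stand-ins written for this example; they are not the cTAKES tokenizer and do not cover the clinical rules in the guidelines.

```python
import re


def ptb_like_tokenize(text):
    """Hypothetical toy tokenizer covering two Penn Treebank conventions only.

    Not the cTAKES implementation. Known limitation: it would also split
    decimals such as "98.6", which a real tokenizer must handle.
    """
    # Split the "n't" contraction from its verb: "doesn't" -> "does n't".
    text = re.sub(r"(?i)\b(\w+)(n't)\b", r"\1 \2", text)
    # Separate common punctuation marks from adjacent words.
    text = re.sub(r"([.,;:?!])", r" \1 ", text)
    return text.split()


def check(text, expected):
    """Fail with a readable message if the tokenization differs from expected."""
    actual = ptb_like_tokenize(text)
    assert actual == expected, f"{text!r}: expected {expected}, got {actual}"


check("The patient doesn't smoke.",
      ["The", "patient", "does", "n't", "smoke", "."])
check("No fever, no chills.",
      ["No", "fever", ",", "no", "chills", "."])
```

Each call to check pairs one input sentence with the token sequence it should produce, so an inconsistency shows up as a single failing case rather than a silent difference in downstream annotations.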
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)