Posted to notifications@ctakes.apache.org by "James Joseph Masanz (JIRA)" <ji...@apache.org> on 2013/09/17 11:07:54 UTC
[jira] [Closed] (CTAKES-227) Broca's -> PunctuationToken instead of
ContractionToken - caused by apostrophe seen as sentence ending
[ https://issues.apache.org/jira/browse/CTAKES-227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Joseph Masanz closed CTAKES-227.
--------------------------------------
> Broca's -> PunctuationToken instead of ContractionToken - caused by apostrophe seen as sentence ending
> ------------------------------------------------------------------------------------------------------
>
> Key: CTAKES-227
> URL: https://issues.apache.org/jira/browse/CTAKES-227
> Project: cTAKES
> Issue Type: Bug
> Components: ctakes-core
> Affects Versions: 3.1
> Reporter: James Joseph Masanz
> Assignee: James Joseph Masanz
> Fix For: 3.1
>
>
> The recently rebuilt sentence detector (currently in trunk and the 3.1.0 branch) sometimes treats an apostrophe as a sentence break where the ctakes-3.0.0-incubating model did not.
> The training data used for the recently rebuilt model contains only 7 lines that end with an apostrophe (single quote) followed immediately by a newline.
> It has >100K occurrences of 's
> It has >175K occurrences of the ' character in all.
> The place I noticed this is in testfakenote.txt.xml in ctakes-regression-test.
> The word "Broca's" used to get a ContractionToken, but because a sentence now ends on the apostrophe, the apostrophe is annotated as a PunctuationToken instead.
> See more in the thread started at
> http://markmail.org/message/wavipejszlspzo5u
> including examples that split correctly and incorrectly.
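The three counts quoted in the description (lines ending in an apostrophe, occurrences of 's, and total apostrophes) could be gathered with something like the sketch below. It is an illustration only: the function name apostrophe_stats and the sample lines are hypothetical, and it assumes the training data is plain text with one sentence per line and uses the ASCII apostrophe. It is not the script used to produce the figures in this issue.

```python
def apostrophe_stats(lines):
    """Count apostrophe patterns in an iterable of training lines.

    Assumes one sentence per line and the ASCII apostrophe (').
    """
    ends_with_apostrophe = 0  # lines where ' is immediately followed by the newline
    possessive_s = 0          # occurrences of the two-character sequence 's
    total_apostrophes = 0     # every ' character

    for line in lines:
        # Strip only the line terminator, so a trailing space after the
        # apostrophe does not count as "immediately followed by a newline".
        text = line.rstrip("\r\n")
        if text.endswith("'"):
            ends_with_apostrophe += 1
        possessive_s += text.count("'s")
        total_apostrophes += text.count("'")

    return {
        "lines_ending_with_apostrophe": ends_with_apostrophe,
        "occurrences_of_apostrophe_s": possessive_s,
        "total_apostrophes": total_apostrophes,
    }


# Made-up sample lines, for illustration only.
sample = [
    "Broca's area was intact.\n",
    "Patient denies pain'\n",
    "Reviewed the patients' \n",
]
print(apostrophe_stats(sample))
```

To run it over a real file, pass an open file handle in place of the sample list; iterating over the handle yields one line at a time, so the whole file does not need to fit in memory.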
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira