Posted to dev@ctakes.apache.org by Shady Hussein <sh...@gmail.com> on 2013/02/11 11:33:03 UTC

Minor DateAnnotation bug

Hi,
  While working with DateAnnotation and adding some new state machines in
DateFSM.java, I found a minor bug regarding the starting and ending index
of DateAnnotation.

Consider this small example:

"October 2003 November 2010 cTAKES is the best framework".

The result is supposed to be "October 2003" and "November 2010", but cTAKES
detects "October 2003" and "October 2003 November 2010".

This happens because when the FSM detects the first date there is no record
for it in the "tokenStartMap", so it assumes a starting index of "0". It then
starts detecting the second date, but there is still no record for it in the
map (a value is put in the map only when the FSM is in a starting state, in
other words after a token whose condition does not satisfy any state), so it
again assumes a starting index of "0".

That's why, for example, if there is an intermediate token between the two
dates, it works fine.

The solution is simply to put a record in the map before resetting the FSM,
so this line should be added just before the reset:
"tokenStartMap.put(fsm, new Integer(i));".
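
To make the behaviour concrete, here is a minimal, self-contained sketch of
the loop described above. It is not the actual DateFSM.java code:
MonthYearMachine, findDates and the recordStartOnReset flag are hypothetical
names used only for illustration, and the sketch assumes the map stores the
index of the last token that lies outside a match, so a match starts one
token after it. It uses Integer.valueOf(i) in place of new Integer(i), which
should behave the same here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DateSpanSketch {

    // Hypothetical stand-in for one date FSM: accepts "<month> <4-digit year>".
    static class MonthYearMachine {
        enum State { START, MONTH, END }

        private static final List<String> MONTHS = Arrays.asList("October", "November");
        State state = State.START;

        void input(String token) {
            if (state == State.START && MONTHS.contains(token)) {
                state = State.MONTH;
            } else if (state == State.MONTH && token.matches("\\d{4}")) {
                state = State.END;
            } else {
                state = State.START; // token satisfies no transition
            }
        }

        void reset() {
            state = State.START;
        }
    }

    // recordStartOnReset = false mimics the reported behaviour,
    // recordStartOnReset = true applies the proposed one-line fix.
    static List<String> findDates(List<String> tokens, boolean recordStartOnReset) {
        List<String> found = new ArrayList<>();
        MonthYearMachine fsm = new MonthYearMachine();
        // Assumption: value = index of the last token that is not part of a match.
        Map<MonthYearMachine, Integer> tokenStartMap = new HashMap<>();

        for (int i = 0; i < tokens.size(); i++) {
            fsm.input(tokens.get(i));

            if (fsm.state == MonthYearMachine.State.START) {
                tokenStartMap.put(fsm, Integer.valueOf(i));
            }
            if (fsm.state == MonthYearMachine.State.END) {
                Integer last = tokenStartMap.get(fsm);
                // No record yet: fall back to token 0, which is where the bug shows up.
                int start = (last == null) ? 0 : last + 1;
                found.add(String.join(" ", tokens.subList(start, i + 1)));

                if (recordStartOnReset) {
                    // Proposed fix: remember where this match ended before resetting.
                    tokenStartMap.put(fsm, Integer.valueOf(i));
                }
                fsm.reset();
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "October 2003 November 2010 cTAKES is the best framework".split(" "));
        System.out.println(findDates(tokens, false)); // [October 2003, October 2003 November 2010]
        System.out.println(findDates(tokens, true));  // [October 2003, November 2010]
    }
}
```

Without the extra put, the second match falls back to index 0 because no
token outside a match has been seen yet; with it, the second match starts
right after the end of the first one.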

Hope this will help :)

-- 
Thanks and best Regards,

Shady AbdelAziz

RE: Minor DateAnnotation bug

Posted by "Masanz, James J." <Ma...@mayo.edu>.
Thanks Shady!
I created an issue for this:
https://issues.apache.org/jira/browse/CTAKES-158

Regards, 
James Masanz
