Posted to dev@uima.apache.org by "Rob van Dalen (JIRA)" <de...@uima.apache.org> on 2019/03/14 08:13:00 UTC
[jira] [Created] (UIMA-6001) Problem with matching items in MarkFast with whitespacers visible
Rob van Dalen created UIMA-6001:
-----------------------------------
Summary: Problem with matching items in MarkFast with whitespacers visible
Key: UIMA-6001
URL: https://issues.apache.org/jira/browse/UIMA-6001
Project: UIMA
Issue Type: Bug
Components: Ruta
Affects Versions: 2.6.1ruta
Reporter: Rob van Dalen
Assignee: Peter Klügl
Fix For: 2.7.0ruta
The change / fix in UIMA-4556 causes some problems when using a CSV file with whitespace.
Given a dictionary with whitespace between words, the behavior is as follows:
>> Param PARAM_DICT_REMOVE_WS is TRUE:
When WS are visible in the token stream:
- words containing whitespace are not recognized (as expected).
When WS are NOT visible in the token stream:
- all items in the dictionary will be recognized
- all items will also be recognized if you add whitespace between words. For example, IlikeRUTA, Ilike Ruta, and I like Ruta all result in the same match.
>> Param PARAM_DICT_REMOVE_WS is FALSE:
When WS are visible in the token stream:
- not all entries in the dictionary will be recognized
When WS are NOT visible in the token stream:
- likewise, not all entries in the dictionary will be recognized
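To make the PARAM_DICT_REMOVE_WS = TRUE case above concrete, here is a toy model of why differently spaced inputs collapse to the same match once whitespace is dropped on both sides. This is only a sketch, not Ruta's MarkFast implementation: the class StrippedDictionarySketch and its methods are hypothetical, and the "token stream" is modelled as a plain string.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only; NOT Ruta's MarkFast code. All names here are hypothetical.
final class StrippedDictionarySketch {
  // Entries are stored with whitespace removed, modelling PARAM_DICT_REMOVE_WS = TRUE.
  private final Set<String> entries = new HashSet<>();

  StrippedDictionarySketch(String... rawEntries) {
    for (String entry : rawEntries) {
      entries.add(stripWs(entry));
    }
  }

  static String stripWs(String s) {
    return s.replaceAll("\\s+", "");
  }

  // wsVisible = false models a token stream from which whitespace is filtered out.
  boolean matches(String text, boolean wsVisible) {
    String candidate = wsVisible ? text : stripWs(text);
    return entries.contains(candidate);
  }

  public static void main(String[] args) {
    StrippedDictionarySketch dict = new StrippedDictionarySketch("I like Ruta");
    System.out.println(dict.matches("Ilike Ruta", false));
    System.out.println(dict.matches("I like Ruta", true));
  }
}
```

In this model every spacing variant matches while whitespace is invisible, and an input that still contains whitespace does not match while whitespace is visible. The FALSE case is deliberately not modelled, because the behavior reported there is the bug itself.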
The cause of this problem is that the default value for ignoring whitespace is always true (hardcoded).
{code:java}
private IBooleanExpression ignoreWS = new SimpleBooleanExpression(true);
{code}
This is not correct: if whitespace is significant and you want it to be matched, that won't work. The matcher should use the value set in the PARAM_DICT_REMOVE_WS parameter, or the value set via the setIgnoreWS method.
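The requested behavior could look roughly like the following. This is a minimal sketch under stated assumptions, not the actual patch: IgnoreWsSettingSketch is a hypothetical stand-in that uses plain booleans instead of Ruta's IBooleanExpression, and letting an explicitly set value take precedence over the parameter is an assumption, since the report does not say which of the two should win.

```java
// Hypothetical stand-in for the matcher's whitespace setting; not Ruta's actual class.
final class IgnoreWsSettingSketch {
  // Models the value of PARAM_DICT_REMOVE_WS.
  private final boolean dictRemoveWs;
  // Models a value passed to setIgnoreWS; null means "never set explicitly".
  private Boolean explicitIgnoreWs;

  IgnoreWsSettingSketch(boolean dictRemoveWs) {
    this.dictRemoveWs = dictRemoveWs;
  }

  void setIgnoreWs(boolean ignoreWs) {
    this.explicitIgnoreWs = ignoreWs;
  }

  // Assumption: an explicit value wins; otherwise fall back to the parameter
  // instead of a hard-coded true.
  boolean ignoreWs() {
    return explicitIgnoreWs != null ? explicitIgnoreWs : dictRemoveWs;
  }

  public static void main(String[] args) {
    IgnoreWsSettingSketch setting = new IgnoreWsSettingSketch(false);
    System.out.println(setting.ignoreWs());
    setting.setIgnoreWs(true);
    System.out.println(setting.ignoreWs());
  }
}
```

With this shape, the default follows the dictionary parameter, so a dictionary loaded with whitespace preserved is not matched by a matcher that silently ignores whitespace.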
-I attached a patch to fix this issue.-
I'm working on a patch.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)