Posted to dev@uima.apache.org by "Luca Dini (CELI) (Created) (JIRA)" <de...@uima.apache.org> on 2012/01/30 11:01:10 UTC

[jira] [Created] (UIMA-2359) Different results of Text Maker in windows and unix

Different results of Text Maker in windows and unix
---------------------------------------------------

                 Key: UIMA-2359
                 URL: https://issues.apache.org/jira/browse/UIMA-2359
             Project: UIMA
          Issue Type: Bug
          Components: Sandbox, TextMarker
    Affects Versions: build-resources-2
         Environment: Windows
            Reporter: Luca Dini (CELI)
            Priority: Minor


When called from the workbench, the class AbstractApplyScriptHandlerJob reads the text to be analyzed with the method:
 org.apache.uima.pear.util.FileUtil.loadTextFile(new File(each), "UTF-8");
On Windows, this method returns each newline as two newlines. Therefore basic TextMarker annotations appear like:
line BREAK BREAK
line BREAK BREAK
Grammars written on Windows must therefore take the double break into account, which makes them inapplicable when running on Unix or when using other read methods, such as:
    		Scanner sc = new Scanner(inFile, "UTF-8");
    		String out = "";
    		while (sc.hasNextLine()) {
    			out += sc.nextLine() + "\n";
    		}
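
As an illustration of the Scanner approach above, here is a minimal sketch that normalizes line endings on text that has already been loaded, so that Windows and Unix input produce the same string. The class and method names (LineEndingNormalizer, normalize) are hypothetical and not part of UIMA or TextMarker, and the sketch assumes the double BREAK comes from "\r\n" pairs surviving in the loaded text.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of UIMA: maps "\r\n" and lone "\r" to "\n".
final class LineEndingNormalizer {
    // The Windows pair is listed first so it is replaced as one unit.
    private static final Pattern LINE_END = Pattern.compile("\r\n|\r");

    static String normalize(String text) {
        return LINE_END.matcher(text).replaceAll("\n");
    }

    public static void main(String[] args) {
        String windows = "line one\r\nline two\r\n";
        String unix = "line one\nline two\n";
        // prints "true"
        System.out.println(normalize(windows).equals(unix));
    }
}
```

Applying such a step right after loading would leave one line-break character per line on either platform, at the cost of hiding the original terminators from the grammar.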

Relates to:
https://issues.apache.org/jira/browse/UIMA-2133t

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (UIMA-2359) Different results of Text Maker in windows and unix

Posted by "Peter Klügl (JIRA)" <de...@uima.apache.org>.

     [ https://issues.apache.org/jira/browse/UIMA-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Klügl resolved UIMA-2359.
-------------------------------

    Resolution: Fixed

I do not want to change the behavior of BREAK, because I think there could be situations where someone wants to distinguish between \r and \n. However, in UIMA-2452 I added a PlainTextAnnotator, which creates platform-independent Line annotations. Thus, I'd say this issue is resolved.
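
To illustrate what platform-independent line annotations could look like, here is a rough sketch that computes line offsets while treating "\r\n", "\r" and "\n" each as a single terminator. It is not the PlainTextAnnotator code from UIMA-2452; the class name LineSpans and the int[] {begin, end} representation are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the actual PlainTextAnnotator implementation.
final class LineSpans {
    // Returns half-open {begin, end} offsets of each line, terminator excluded.
    static List<int[]> lines(String text) {
        List<int[]> spans = new ArrayList<>();
        int begin = 0;
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\r' || c == '\n') {
                spans.add(new int[] {begin, i});
                // Consume "\r\n" as one terminator.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                i++;
                begin = i;
            } else {
                i++;
            }
        }
        // Last line without a trailing terminator.
        if (begin < text.length()) {
            spans.add(new int[] {begin, text.length()});
        }
        return spans;
    }

    public static void main(String[] args) {
        for (String text : new String[] {"ab\r\ncd\r\n", "ab\ncd\n"}) {
            for (int[] span : lines(text)) {
                System.out.print("[" + text.substring(span[0], span[1]) + "] ");
            }
            System.out.println();
        }
    }
}
```

The offsets differ between the two inputs, but the covered text of each line is the same, which is what a grammar written against Line annotations would rely on.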
                
> Different results of Text Maker in windows and unix
> ---------------------------------------------------
>
>                 Key: UIMA-2359
>                 URL: https://issues.apache.org/jira/browse/UIMA-2359
>             Project: UIMA
>          Issue Type: Bug
>          Components: Sandbox, TextMarker
>    Affects Versions: build-resources-2
>         Environment: Windows
>            Reporter: Luca Dini (CELI)
>            Assignee: Peter Klügl
>            Priority: Minor
>              Labels: patch
>
> When called from the workbench, the class AbstractApplyScriptHandlerJob reads the text to be analyzed with the method:
>  org.apache.uima.pear.util.FileUtil.loadTextFile(new File(each), "UTF-8");
> On Windows, this method returns each newline as two newlines. Therefore basic TextMarker annotations appear like:
> line BREAK BREAK
> line BREAK BREAK
> Grammars written on Windows must therefore take the double break into account, which makes them inapplicable when running on Unix or when using other read methods, such as:
>     		Scanner sc = new Scanner(inFile, "UTF-8");
>     		String out = "";
>     		while (sc.hasNextLine()) {
>     			out += sc.nextLine() + "\n";
>     		}
> Relates to:
> https://issues.apache.org/jira/browse/UIMA-2133t


[jira] [Commented] (UIMA-2359) Different results of Text Maker in windows and unix

Posted by "Peter Klügl (JIRA)" <de...@uima.apache.org>.

    [ https://issues.apache.org/jira/browse/UIMA-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409428#comment-13409428 ] 

Peter Klügl commented on UIMA-2359:
-----------------------------------

Is there a generic solution for this problem? I would not restrict the functionality to either of the two cases. Should only one break be created by the lexer? In my applications, I solved this on the rule level, but I am open to any suggestions and improvements.
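
To make the two cases concrete, here is a toy tokenizer with a switch for whether "\r\n" yields one BREAK or two. It is not the TextMarker lexer; the class name BreakLexerSketch and the mergeCrLf flag are made up for this sketch, and every run of non-break characters is emitted as a single token for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only; the real TextMarker lexer is not shown in this thread.
final class BreakLexerSketch {
    static List<String> tokenize(String text, boolean mergeCrLf) {
        List<String> tokens = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r' || c == '\n') {
                // Flush the pending text run before the break.
                if (run.length() > 0) {
                    tokens.add(run.toString());
                    run.setLength(0);
                }
                tokens.add("BREAK");
                // Optionally swallow the '\n' of a "\r\n" pair.
                if (mergeCrLf && c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                run.append(c);
            }
        }
        if (run.length() > 0) {
            tokens.add(run.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String windows = "line\r\nline\r\n";
        // prints [line, BREAK, BREAK, line, BREAK, BREAK]
        System.out.println(tokenize(windows, false));
        // prints [line, BREAK, line, BREAK]
        System.out.println(tokenize(windows, true));
    }
}
```

With the flag off, the output matches the "line BREAK BREAK" pattern from the report; with it on, Windows and Unix text tokenize identically, but rules can no longer tell \r from \n.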
                

[jira] [Assigned] (UIMA-2359) Different results of Text Maker in windows and unix

Posted by "Peter Klügl (Assigned JIRA)" <de...@uima.apache.org>.

     [ https://issues.apache.org/jira/browse/UIMA-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Klügl reassigned UIMA-2359:
---------------------------------

    Assignee: Peter Klügl
    