Posted to dev@tika.apache.org by "Keith R. Bennett (JIRA)" <ji...@apache.org> on 2007/09/28 07:10:50 UTC

[jira] Updated: (TIKA-38) TXTParser appends a space to the text found in the file.

     [ https://issues.apache.org/jira/browse/TIKA-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith R. Bennett updated TIKA-38:
---------------------------------

    Attachment: tika38.patch

In making these code changes, I made some assumptions.  If any of them is not valid, the patch will need to be revised.  Here they are:

1) We want the text file being parsed to come through the parser exactly as it is stored, character by character, except that any line termination sequences should be translated to a newline ('\n') (as they would normally be represented in a Java string).

2) The BufferedReader does the line ending translation for us, so we will only see '\n' as a line terminator.

3) Using StringBuilder is better than using StringBuffer (now that we know we are using Java 1.5 we have the option).

4) Calling BufferedReader.read() is better than calling BufferedReader.readLine() because with readLine() we have no way of knowing whether or not a newline terminated the last line.  Also, we don't have to store a possibly arbitrarily long line in memory.

5) Calling BufferedReader.read() is slightly simpler than calling BufferedReader.read(char[],int,int) and may not be significantly slower (?).
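
To make the approach in points 3 to 5 concrete, here is a minimal sketch of reading a text stream one character at a time into a StringBuilder. It is not the code in tika38.patch; the class name TxtReadSketch and the method readAll are hypothetical names chosen for illustration. One caveat on point 2: as far as I know, BufferedReader.read() hands back each character as stored, so a "\r\n" sequence would still need explicit handling if that assumption turns out not to hold.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical illustration only; not the actual TXTParser code or patch.
class TxtReadSketch {

    // Copies every character from the source into a String, adding nothing.
    static String readAll(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        StringBuilder text = new StringBuilder();
        int c;
        // read() returns -1 at end of stream, otherwise one char as an int.
        while ((c = reader.read()) != -1) {
            text.append((char) c);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // A file containing only "1" should come back as "1", not "1 ".
        String parsed = readAll(new StringReader("1"));
        System.out.println(parsed.length());
    }
}
```

Because nothing is appended between reads, a trailing newline is kept only if the input actually ends with one, which is the information readLine() discards.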



> TXTParser appends a space to the text found in the file.
> --------------------------------------------------------
>
>                 Key: TIKA-38
>                 URL: https://issues.apache.org/jira/browse/TIKA-38
>             Project: Tika
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 0.1-incubator
>            Reporter: Keith R. Bennett
>            Priority: Minor
>             Fix For: 0.1-incubator
>
>         Attachments: tika38.patch
>
>
> TXTParser adds a space to the content it reads from a file.  When it parsed a file containing "1", it returned as the full text "1 " (space appended).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.