Posted to dev@tika.apache.org by "Keith R. Bennett (JIRA)" <ji...@apache.org> on 2007/09/28 00:20:51 UTC

[jira] Created: (TIKA-38) TXTParser appends a space to the text found in the file.

TXTParser appends a space to the text found in the file.
--------------------------------------------------------

                 Key: TIKA-38
                 URL: https://issues.apache.org/jira/browse/TIKA-38
             Project: Tika
          Issue Type: Bug
          Components: general
    Affects Versions: 0.1-incubator
            Reporter: Keith R. Bennett
            Priority: Minor
             Fix For: 0.1-incubator


TXTParser adds a space to the content it reads from a file.  When it parsed a file containing "1", it returned as the full text "1 " (space appended).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (TIKA-38) TXTParser appends a space to the text found in the file.

Posted by "Rida Benjelloun (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rida Benjelloun reassigned TIKA-38:
-----------------------------------

    Assignee: Rida Benjelloun

> TXTParser appends a space to the text found in the file.
> --------------------------------------------------------
>
>                 Key: TIKA-38
>                 URL: https://issues.apache.org/jira/browse/TIKA-38
>             Project: Tika
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 0.1-incubator
>            Reporter: Keith R. Bennett
>            Assignee: Rida Benjelloun
>            Priority: Minor
>             Fix For: 0.1-incubator
>
>         Attachments: tika38.patch
>
>
> TXTParser adds a space to the content it reads from a file.  When it parsed a file containing "1", it returned as the full text "1 " (space appended).



[jira] Closed: (TIKA-38) TXTParser appends a space to the text found in the file.

Posted by "Rida Benjelloun (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rida Benjelloun closed TIKA-38.
-------------------------------




[jira] Resolved: (TIKA-38) TXTParser appends a space to the text found in the file.

Posted by "Rida Benjelloun (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rida Benjelloun resolved TIKA-38.
---------------------------------

    Resolution: Fixed

SVN commit



[jira] Updated: (TIKA-38) TXTParser appends a space to the text found in the file.

Posted by "Keith R. Bennett (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith R. Bennett updated TIKA-38:
---------------------------------

    Attachment: tika38.patch

In making these code changes, I made some assumptions.  If any of them is not valid, the patch will need to be revised.  Here they are:

1) We want the text file being parsed to come through the parser exactly as it is stored, character by character, except that any line termination sequences should be translated to a newline ('\n') (as they would normally be represented in a Java string).

2) The BufferedReader does the line ending translation for us, so we will only see '\n' as a line terminator.

3) Using StringBuilder is better than using StringBuffer (now that we know we are using Java 1.5 we have the option).

4) Calling BufferedReader.read() is better than calling BufferedReader.readLine() because with readLine() we have no way of knowing whether or not a newline terminated the last line.  Also, we don't have to store a possibly arbitrarily long line in memory.

5) Calling BufferedReader.read() is slightly simpler than calling BufferedReader.read(char[],int,int) and may not be significantly slower (?).
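
To make the assumptions above concrete, here is a minimal sketch of a character-by-character read into a StringBuilder.  It is not the contents of tika38.patch; the class name ReadAllSketch and the helper readAll are hypothetical names used only for illustration.  One caveat on assumption 2: as far as I know, only readLine() strips line terminators, while read() hands back '\r' unchanged, so the sketch does the translation to '\n' explicitly rather than relying on the reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical illustration only; not the actual TXTParser code.
class ReadAllSketch {

    // Reads every character from the reader, translating "\r\n" and a
    // lone "\r" to "\n", and adding nothing else (no trailing space).
    static String readAll(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder text = new StringBuilder();
        boolean lastWasCR = false;
        int c;
        while ((c = reader.read()) != -1) {
            if (c == '\r') {
                // Emit the newline now; a following '\n' will be skipped.
                text.append('\n');
                lastWasCR = true;
            } else if (c == '\n') {
                // Second half of "\r\n" was already emitted as '\n'.
                if (!lastWasCR) {
                    text.append('\n');
                }
                lastWasCR = false;
            } else {
                text.append((char) c);
                lastWasCR = false;
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // The case from the bug report: a file containing only "1".
        System.out.println("[" + readAll(new StringReader("1")) + "]");
    }
}
```

With this approach a file containing "1" comes back as "1", and a final line with no terminator stays unterminated, which is the information readLine() cannot give us.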





[jira] Commented: (TIKA-38) TXTParser appends a space to the text found in the file.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-38?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531145 ] 

Doug Cutting commented on TIKA-38:
----------------------------------

Another friendly note from the svn commit-message police: Issue names in Jira are case-sensitive.  In order to automatically create the link from the Jira issue to subversion, the commit message must contain the exact issue name, TIKA-38 in this case.

For the record, here's the link that would have been added:

http://svn.apache.org/viewvc?rev=580362&view=rev

