Posted to common-dev@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2006/09/08 19:49:22 UTC

[jira] Created: (HADOOP-517) readLine function of UTF8ByteArrayUtils does not handle end of line correctly

readLine function of UTF8ByteArrayUtils does not handle end of line correctly
-----------------------------------------------------------------------------

                 Key: HADOOP-517
                 URL: http://issues.apache.org/jira/browse/HADOOP-517
             Project: Hadoop
          Issue Type: Bug
         Environment: windows
            Reporter: Hairong Kuang
         Assigned To: Hairong Kuang


The readLine function does not read ahead to look for '\n' after it reads '\r'. As a result, on Windows, where lines end in '\r\n', it incorrectly produces two lines, the second one empty.
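The symptom described above can be reproduced with a small sketch. This is a hypothetical reconstruction for illustration only, not the actual UTF8ByteArrayUtils code: the class name NaiveLineReader and the helper readAll are assumptions. It treats '\r' and '\n' as independent terminators, which is the behavior the report describes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the reported bug; not Hadoop's real code.
final class NaiveLineReader {

    // Returns the next line without its terminator, or null at end of stream.
    // Bug: '\r' and '\n' each end a line on their own, so the two-byte
    // sequence "\r\n" ends one line and then yields an extra empty line.
    static byte[] readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) != -1) {
            sawAny = true;
            if (b == '\r' || b == '\n') {
                return buf.toByteArray();
            }
            buf.write(b);
        }
        return sawAny ? buf.toByteArray() : null;
    }

    // Convenience helper: splits an in-memory buffer into lines.
    static List<String> readAll(byte[] data) {
        InputStream in = new ByteArrayInputStream(data);
        List<String> lines = new ArrayList<>();
        try {
            byte[] line;
            while ((line = readLine(in)) != null) {
                lines.add(new String(line, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

With the input "a\r\nb\r\n", readAll returns four lines ("a", "", "b", "") instead of the expected two, which matches the empty second line reported on Windows.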

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-517) readLine function of UTF8ByteArrayUtils does not handle end of line correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-517?page=all ]

Hairong Kuang updated HADOOP-517:
---------------------------------

    Attachment: eol.patch

The attached patch fixes the problem.


[jira] Commented: (HADOOP-517) readLine function of UTF8ByteArrayUtils does not handle end of line correctly

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-517?page=comments#action_12433464 ] 
            
Doug Cutting commented on HADOOP-517:
-------------------------------------

Can you use the same logic for readLine() as in HADOOP-474?  Ideally we could even use the same implementation.  But I don't like the 'new PushbackInputStream()'; mark() and reset() seem better, if the input stream supports them...


[jira] Updated: (HADOOP-517) readLine function of UTF8ByteArrayUtils does not handle end of line correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-517?page=all ]

Hairong Kuang updated HADOOP-517:
---------------------------------

    Attachment: eol.patch

In Hadoop streaming, all the input streams passed to readLine support mark and reset, so the new attached patch follows Doug's suggestion.
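For readers without access to eol.patch, the mark()/reset() approach can be sketched roughly as follows. This is an assumption about the general technique, not the contents of the committed patch; the class name MarkResetLineReader and the helper readAll are hypothetical. After reading '\r', the reader marks the stream, peeks one byte, and resets if that byte is not '\n'.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a mark()/reset() look-ahead; not the actual patch.
final class MarkResetLineReader {

    // Returns the next line without its terminator, or null at end of stream.
    // Accepts "\n", "\r", and "\r\n" as line terminators.
    static byte[] readLine(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) != -1) {
            sawAny = true;
            if (b == '\n') {
                break;
            }
            if (b == '\r') {
                in.mark(1);              // remember the position after '\r'
                int next = in.read();    // look ahead one byte
                if (next != '\n' && next != -1) {
                    in.reset();          // not part of "\r\n": un-read it
                }
                break;
            }
            buf.write(b);
        }
        return sawAny ? buf.toByteArray() : null;
    }

    // Convenience helper: ByteArrayInputStream supports mark/reset.
    static List<String> readAll(byte[] data) {
        InputStream in = new ByteArrayInputStream(data);
        List<String> lines = new ArrayList<>();
        try {
            byte[] line;
            while ((line = readLine(in)) != null) {
                lines.add(new String(line, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

The sketch relies on the caller passing a stream for which markSupported() returns true, which the comment above says holds for the streams used in Hadoop streaming; a stream that does not support it would need to be wrapped, for example in a BufferedInputStream.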


[jira] Updated: (HADOOP-517) readLine function of UTF8ByteArrayUtils does not handle end of line correctly

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-517?page=all ]

Doug Cutting updated HADOOP-517:
--------------------------------

    Fix Version/s: 0.6.0


[jira] Commented: (HADOOP-517) readLine function of UTF8ByteArrayUtils does not handle end of line correctly

Posted by "Michel Tourn (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-517?page=comments#action_12433465 ] 
            
Michel Tourn commented on HADOOP-517:
-------------------------------------

A PushbackInputStream seems like overkill for what we are talking about, and performance is critical here.

There is a standard, efficient code pattern that will consume:
'\r'
or
'\n'
or
'\r\n'

(for Mac, Unix, and Windows, respectively)
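The comment does not spell the pattern out, so the following is only one plausible reading of it: a reader that remembers whether the previous line ended in '\r' and, if so, silently drops an immediately following '\n'. No pushback, mark, or reset is needed. The class name StatefulLineReader, the field skipLf, and the helper readAll are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a look-behind line reader; an assumption about
// the pattern referred to above, not code from the Hadoop source tree.
final class StatefulLineReader {
    private final InputStream in;
    private boolean skipLf;  // true when the previous line ended with '\r'

    StatefulLineReader(InputStream in) {
        this.in = in;
    }

    // Returns the next line without its terminator, or null at end of stream.
    byte[] readLine() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) != -1) {
            if (skipLf) {
                skipLf = false;
                if (b == '\n') {
                    continue;    // second half of "\r\n": ignore it
                }
            }
            sawAny = true;
            if (b == '\n') {
                break;
            }
            if (b == '\r') {
                skipLf = true;   // a following '\n' belongs to this terminator
                break;
            }
            buf.write(b);
        }
        return sawAny ? buf.toByteArray() : null;
    }

    // Convenience helper: splits an in-memory buffer into lines.
    static List<String> readAll(byte[] data) {
        StatefulLineReader reader = new StatefulLineReader(new ByteArrayInputStream(data));
        List<String> lines = new ArrayList<>();
        try {
            byte[] line;
            while ((line = reader.readLine()) != null) {
                lines.add(new String(line, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

The trade-off in this sketch is that the state lives in the reader rather than in the stream: after a line ending in '\r', a trailing '\n' may still be unread in the underlying stream, so other code reading from the same stream directly would see it.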



[jira] Resolved: (HADOOP-517) readLine function of UTF8ByteArrayUtils does not handle end of line correctly

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-517?page=all ]

Doug Cutting resolved HADOOP-517.
---------------------------------

    Resolution: Fixed

I just committed this.  Thanks, Hairong!
