Posted to common-dev@hadoop.apache.org by "David Bowen (JIRA)" <ji...@apache.org> on 2007/03/27 19:39:32 UTC

[jira] Created: (HADOOP-1162) Record IO: serializing a byte buffer to CSV fails if buffer contains bytes less than 16.

Record IO: serializing a byte buffer to CSV fails if buffer contains bytes less than 16.
-----------------------------------------------------------------------------------------

                 Key: HADOOP-1162
                 URL: https://issues.apache.org/jira/browse/HADOOP-1162
             Project: Hadoop
          Issue Type: Bug
            Reporter: David Bowen


The unit test only checks that an empty byte buffer can be serialized and deserialized. If the buffer contains any bytes in the range 0..15, the test fails.
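The report does not include the failing code. As a hedged illustration only, the sketch below assumes an encoder that appends `Integer.toHexString(b & 0xFF)` without zero-padding, paired with a decoder that reads fixed two-character pairs. The names `UnpaddedHexDemo`, `unpaddedHex` and `decodePairs` are hypothetical, not Record IO's API. It shows why an empty buffer round-trips cleanly while bytes in 0..15 break the encoding.

```java
import java.util.Arrays;

// Hypothetical reconstruction of the symptom; not the actual Record IO code.
final class UnpaddedHexDemo {

    // Assumed buggy encoder: Integer.toHexString drops the leading zero,
    // so bytes 0..15 are written as a single character ("0".."f").
    static String unpaddedHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(Integer.toHexString(b & 0xFF));
        }
        return sb.toString();
    }

    // Decoder that assumes every byte occupies exactly two hex characters.
    static byte[] decodePairs(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string: " + hex);
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        // The only case the original unit test covered: an empty buffer.
        System.out.println(decodePairs(unpaddedHex(new byte[0])).length); // 0

        // {0x01, 0x02} encodes as "12" and silently decodes to the single byte 0x12.
        byte[] small = {0x01, 0x02};
        System.out.println(Arrays.toString(decodePairs(unpaddedHex(small)))); // [18]

        // {0x05, 0x7f} encodes as "57f", which cannot be split into pairs.
        try {
            decodePairs(unpaddedHex(new byte[] {0x05, 0x7f}));
        } catch (IllegalArgumentException e) {
            System.out.println("round trip failed: " + e.getMessage());
        }
    }
}
```

Either way the decoded buffer no longer matches the input, which is what a round-trip test with non-empty buffers would catch.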

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1162) Record IO: serializing a byte buffer to CSV fails if buffer contains bytes less than 16.

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Milind Bhandarkar updated HADOOP-1162:
--------------------------------------

    Attachment: jute-patch.txt

This bug also affects the XML serialization of buffers. This patch fixes it and also modifies TestRecordIO to check that Buffers are serialized correctly.

> Record IO: serializing a byte buffer to CSV fails if buffer contains bytes less than 16.
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1162
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1162
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.12.2
>            Reporter: David Bowen
>             Fix For: 0.12.3
>
>         Attachments: jute-patch.txt
>
>
> The unit test only checks that an empty byte buffer can be serialized and deserialized. If the buffer contains any bytes in the range 0..15, the test fails.



[jira] Commented: (HADOOP-1162) Record IO: serializing a byte buffer to CSV fails if buffer contains bytes less than 16.

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484520 ] 

Milind Bhandarkar commented on HADOOP-1162:
-------------------------------------------

I agree that the unit test needs to be enhanced. However, another program in the same directory, recordBench, generates buffers with random bytes, and it seems to work. I will double-check, though.



[jira] Commented: (HADOOP-1162) Record IO: serializing a byte buffer to CSV fails if buffer contains bytes less than 16.

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484623 ] 

Milind Bhandarkar commented on HADOOP-1162:
-------------------------------------------

No, I use hex pairs: each byte is serialized as two characters in the range 0-9a-f.
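A minimal sketch of the hex-pair scheme described in this comment, assuming lowercase digits. `HexPairs`, `encode` and `decode` are hypothetical names, not Record IO's `Buffer` API. Writing the high and low nibble separately with `Character.forDigit` guarantees that bytes 0..15 keep their leading `0`, and `main` round-trips every byte value rather than only an empty buffer.

```java
import java.util.Arrays;

// Hex-pair encoding of a byte buffer: exactly two lowercase hex digits per byte.
final class HexPairs {

    static String encode(byte[] data) {
        StringBuilder sb = new StringBuilder(2 * data.length);
        for (byte b : data) {
            sb.append(Character.forDigit((b >> 4) & 0x0F, 16)); // high nibble, '0' for 0..15
            sb.append(Character.forDigit(b & 0x0F, 16));        // low nibble
        }
        return sb.toString();
    }

    static byte[] decode(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("non-hex character in pair " + i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        // Round-trip every byte value, not just an empty buffer.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        String hex = encode(all);
        System.out.println(hex.substring(0, 8));              // 00010203
        System.out.println(Arrays.equals(all, decode(hex)));  // true
    }
}
```

The encoded length is always exactly twice the buffer length, which is the space cost discussed further down the thread.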



[jira] Commented: (HADOOP-1162) Record IO: serializing a byte buffer to CSV fails if buffer contains bytes less than 16.

Posted by "David Bowen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484611 ] 

David Bowen commented on HADOOP-1162:
-------------------------------------

Not quite two times, I think.  Are you switching to base-64 encoding?

- David






[jira] Commented: (HADOOP-1162) Record IO: serializing a byte buffer to CSV fails if buffer contains bytes less than 16.

Posted by "David Bowen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484577 ] 

David Bowen commented on HADOOP-1162:
-------------------------------------


+1.  Code reviewed and looks fine.  A small safe fix.



Re: [jira] Commented: (HADOOP-1162) Record IO: serializing a byte buffer to CSV fails if buffer contains bytes less than 16.

Posted by David Bowen <db...@yahoo-inc.com>.
>
> Oh, I misunderstood your question. I am switching buffer serialization to just plain bytes, except for 5 characters that are escaped (essentially like string serialization, as if the string were ISO-8859-1).
>   
I'm not sure I follow.  I think string serialization implies UTF-8
encoding?  That means bytes in the range 128-255 would take 2 bytes.  If
we assume that in a byte buffer, all byte values are equally probable,
then the average space for CSV serialization, per byte, would be 1.5
bytes, or 12 bits.  Right?  Actually a little more, because you escape
those 5 characters too.

So why not use base64 encoding?  The expansion factor would be smaller,
since it essentially uses 8 bits to represent 6 (a factor of 4/3).  Also,
it omits control characters, which I think would be a problem with what
you're suggesting: we need the CSV files to be human readable, so I think
you'd have to escape them too.

Or else just leave the encoding as two hex digits per byte. 
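The size arithmetic in this reply can be checked with a short sketch. It assumes, as David does, that all byte values are equally likely, modelled here as a buffer holding each value 0..255 exactly three times (768 bytes, a multiple of 3 so base64 needs no padding). The "plain bytes" figure reads the buffer as ISO-8859-1 and measures its UTF-8 length, ignoring the extra cost of the 5 escaped characters; `java.util.Base64` stands in for whatever base64 codec Record IO might use. The class name `ExpansionFactors` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

// Output bytes per input byte for three candidate CSV encodings of a buffer.
final class ExpansionFactors {

    // Every byte value appears equally often: 0..255, repeated three times.
    static byte[] uniformBuffer() {
        byte[] buf = new byte[256 * 3];
        for (int i = 0; i < buf.length; i++) {
            buf[i] = (byte) i; // the cast wraps i modulo 256
        }
        return buf;
    }

    // Two hex digits per byte.
    static double hexFactor(byte[] buf) {
        int chars = 0;
        for (byte b : buf) {
            chars += String.format("%02x", b & 0xFF).length();
        }
        return (double) chars / buf.length;
    }

    // Bytes read as ISO-8859-1 characters, then written as UTF-8:
    // values 0..127 take one byte, 128..255 take two.
    static double plainUtf8Factor(byte[] buf) {
        String asLatin1 = new String(buf, StandardCharsets.ISO_8859_1);
        return (double) asLatin1.getBytes(StandardCharsets.UTF_8).length / buf.length;
    }

    // Base64: 4 output characters for every 3 input bytes.
    static double base64Factor(byte[] buf) {
        return (double) Base64.getEncoder().encode(buf).length / buf.length;
    }

    public static void main(String[] args) {
        byte[] buf = uniformBuffer();
        System.out.printf(Locale.ROOT, "hex pairs:   %.3f%n", hexFactor(buf));       // 2.000
        System.out.printf(Locale.ROOT, "UTF-8 bytes: %.3f%n", plainUtf8Factor(buf)); // 1.500
        System.out.printf(Locale.ROOT, "base64:      %.3f%n", base64Factor(buf));    // 1.333
    }
}
```

The 1.5 figure matches the 12-bits-per-byte estimate above; escaping pushes it slightly higher, base64's 4/3 is lower, and hex pairs cost exactly 2.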


[jira] Commented: (HADOOP-1162) Record IO: serializing a byte buffer to CSV fails if buffer contains bytes less than 16.

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484625 ] 

Milind Bhandarkar commented on HADOOP-1162:
-------------------------------------------

Oh, I misunderstood your question. I am switching buffer serialization to just plain bytes, except for 5 characters that are escaped (essentially like string serialization, as if the string were ISO-8859-1).
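A hedged sketch of the scheme described in this comment: each byte is treated as an ISO-8859-1 character and copied as-is, except for five escaped characters. The comment does not say which five, so this sketch assumes NUL, LF, CR, comma and `%`, each written as `%` followed by two hex digits. That escape set and the names `EscapedBufferCsv`, `toCsv` and `fromCsv` are assumptions, not Record IO's actual code.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Sketch: buffer bytes copied as ISO-8859-1 characters, with an ASSUMED set of
// five characters escaped as '%' plus two uppercase hex digits.
final class EscapedBufferCsv {

    // Assumption: NUL, LF, CR, comma and percent; the JIRA comment does not list them.
    private static final String ESCAPED = "\0\n\r,%";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String toCsv(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length);
        for (byte b : data) {
            char c = (char) (b & 0xFF); // ISO-8859-1: code point equals byte value
            if (ESCAPED.indexOf(c) >= 0) {
                sb.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    static byte[] fromCsv(String csv) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(csv.length());
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '%') {
                if (i + 2 >= csv.length()) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                out.write(Integer.parseInt(csv.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits
            } else {
                out.write(c); // write(int) keeps the low 8 bits
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] buf = {'a', ',', 0x00, (byte) 0xE9, '%'};
        String csv = toCsv(buf);
        System.out.println(csv.length());                     // 11
        System.out.println(Arrays.equals(buf, fromCsv(csv))); // true
    }
}
```

As David Bowen points out in his reply, other control characters (for example 0x01 or TAB, 0x09) pass through this scheme unescaped, which works against keeping the CSV output human readable.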



[jira] Commented: (HADOOP-1162) Record IO: serializing a byte buffer to CSV fails if buffer contains bytes less than 16.

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484524 ] 

Milind Bhandarkar commented on HADOOP-1162:
-------------------------------------------

I found the bug: it is in Buffer.toString. However, I am in the process of changing the CSV serialization of Buffers. Currently it is supposed to be hex pairs, which take twice the optimal space.



[jira] Commented: (HADOOP-1162) Record IO: serializing a byte buffer to CSV fails if buffer contains bytes less than 16.

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484819 ] 

Hadoop QA commented on HADOOP-1162:
-----------------------------------

Integrated in Hadoop-Nightly #39 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/39/)



[jira] Updated: (HADOOP-1162) Record IO: serializing a byte buffer to CSV fails if buffer contains bytes less than 16.

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Milind Bhandarkar updated HADOOP-1162:
--------------------------------------

        Fix Version/s: 0.12.3
    Affects Version/s: 0.12.2
               Status: Patch Available  (was: Open)



[jira] Updated: (HADOOP-1162) Record IO: serializing a byte buffer to CSV fails if buffer contains bytes less than 16.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1162:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Milind!



[jira] Assigned: (HADOOP-1162) Record IO: serializing a byte buffer to CSV fails if buffer contains bytes less than 16.

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Milind Bhandarkar reassigned HADOOP-1162:
-----------------------------------------

    Assignee: Milind Bhandarkar



[jira] Commented: (HADOOP-1162) Record IO: serializing a byte buffer to CSV fails if buffer contains bytes less than 16.

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484537 ] 

Hadoop QA commented on HADOOP-1162:
-----------------------------------

+1, because http://issues.apache.org/jira/secure/attachment/12354338/jute-patch.txt applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/522597. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch
