Posted to dev@pig.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2008/10/14 18:55:44 UTC
[jira] Created: (PIG-494) Utf8StorageConverter.bytesToCharArray does not properly do utf8 conversions
Utf8StorageConverter.bytesToCharArray does not properly do utf8 conversions
---------------------------------------------------------------------------
Key: PIG-494
URL: https://issues.apache.org/jira/browse/PIG-494
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: types_branch
Reporter: Alan Gates
Fix For: types_branch
This function just calls new String(byte[]), which causes non-ASCII characters to be translated incorrectly from byte arrays to strings. It should instead use a CharsetDecoder (see BufferedPositionedInputStream.readLine in Pig 1.x).
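To illustrate the problem described above, here is a minimal sketch, not Pig's actual code. It assumes the input bytes are UTF-8 and uses a hypothetical helper, decodeUtf8, built on CharsetDecoder. Because the JVM's default charset cannot be changed reliably inside a running program, the "wrong" decode is simulated by decoding explicitly as ISO-8859-1. StandardCharsets requires Java 7 or later, which postdates this issue.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Illustrative only: class and method names are hypothetical, not from Pig.
final class DecoderSketch {

    // Decode bytes strictly as UTF-8, rejecting malformed input
    // instead of silently substituting replacement characters.
    static String decodeUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("input is not valid UTF-8", e);
        }
    }

    public static void main(String[] args) {
        // UTF-8 encoding of U+00E9 (e with acute accent): two bytes.
        byte[] eAcute = {(byte) 0xC3, (byte) 0xA9};

        // Correct: the two bytes decode to a single character.
        System.out.println(decodeUtf8(eAcute).length()); // prints 1

        // Simulated wrong default charset: each byte becomes its own character.
        String misread = new String(eAcute, StandardCharsets.ISO_8859_1);
        System.out.println(misread.length()); // prints 2
    }
}
```

The strict REPORT actions are a design choice for the sketch; a decoder configured with CodingErrorAction.REPLACE would substitute a replacement character for bad input instead of failing.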
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-494) Utf8StorageConverter.bytesToCharArray does not properly do utf8 conversions
Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pradeep Kamath updated PIG-494:
-------------------------------
Assignee: Pradeep Kamath
Status: Patch Available (was: Open)
Attached a patch. Instead of using a CharsetDecoder, I used
{code}
String(byte[] bytes, String charsetName)
Constructs a new String by decoding the specified array of bytes using the specified charset.
{code}
Likewise, I had to change PigStorage to use
{code}
getBytes(String charsetName)
Encodes this String into a sequence of bytes using the named charset, storing the result into a new byte array.
{code}
In both of the above calls I pass "UTF-8" as the charset name.
With these changes, users of PigStorage need to be aware that it assumes chararray fields are UTF-8 on both input and output.
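The two calls named above can be sketched as a round trip. This is a minimal illustration, not the contents of the attached patch; the class and the helper names decodeField and encodeField are hypothetical. Every Java platform is required to support UTF-8, so the checked UnsupportedEncodingException should never actually occur and is wrapped here.

```java
import java.io.UnsupportedEncodingException;

// Illustrative only: names are hypothetical, not taken from the Pig patch.
final class Utf8RoundTripSketch {

    // Read side: decode a field's raw bytes as UTF-8.
    static String decodeField(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Write side: encode a field's text as UTF-8 bytes.
    static byte[] encodeField(String text) {
        try {
            return text.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an acute accent on the e
        byte[] stored = encodeField(original);
        System.out.println(stored.length); // prints 5: three ASCII bytes plus two for U+00E9
        System.out.println(decodeField(stored).equals(original)); // prints true
    }
}
```

One difference from a CharsetDecoder configured to report errors: the Javadoc for this String constructor leaves its behavior on invalid input unspecified, so malformed bytes are not guaranteed to raise an error.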
[jira] Updated: (PIG-494) Utf8StorageConverter.bytesToCharArray does not properly do utf8 conversions
Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich updated PIG-494:
-------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
Patch committed; thanks, Pradeep!
[jira] Updated: (PIG-494) Utf8StorageConverter.bytesToCharArray does not properly do utf8 conversions
Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pradeep Kamath updated PIG-494:
-------------------------------
Attachment: PIG-494.patch