Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2008/10/16 23:16:44 UTC

[jira] Created: (PIG-497) dump does not deal with non-ascii data

dump does not deal with non-ascii data
--------------------------------------

                 Key: PIG-497
                 URL: https://issues.apache.org/jira/browse/PIG-497
             Project: Pig
          Issue Type: Bug
    Affects Versions: types_branch
            Reporter: Olga Natkovich
             Fix For: types_branch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-497) dump does not deal with non-ascii data

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-497:
-------------------------------

    Assignee: Pradeep Kamath  (was: Santhosh Srinivasan)
      Status: Patch Available  (was: Open)

Patch attached.
There were three issues which were resolved:
- DataReaderWriter was using DataOutput.writeBytes(String) instead of DataOutput.writeUTF(String). Likewise, it was using DataInput.readFully(byte[]) instead of DataInput.readUTF(). The earlier calls keep only the lower 8 bits of each character in the string, which corrupts multi-byte UTF-8 data.
- illustrate and dump eventually use System.out.println() to output results, and System.out.println() writes bytes in the platform default encoding, which is typically UTF-16. This was changed to System.out.write(String.getBytes("UTF-8")).
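
The difference between the two DataOutput calls can be seen in a minimal, self-contained sketch. It is not the code from the patch: the class name Utf8RoundTrip, both helper methods, and the int length prefix on the writeBytes path are assumptions made for illustration. Only the java.io calls themselves are real API. Note that writeUTF uses Java's modified UTF-8 with a two-byte length prefix, so a single call is limited to 65535 encoded bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {

    // Hypothetical helper: writeBytes(String) emits one byte per char,
    // discarding the high 8 bits, so non-Latin-1 characters are lost.
    static String viaWriteBytes(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(s.length());   // assumed length prefix, one byte per char
            out.writeBytes(s);
            out.flush();

            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
            byte[] raw = new byte[in.readInt()];
            in.readFully(raw);
            // Each byte maps back to one char; the discarded high bits are gone.
            return new String(raw, StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: writeUTF/readUTF encode and decode every char.
    static String viaWriteUTF(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(s);
            out.flush();

            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file itself pure ASCII.
        String sample = "caf\u00e9 \u4e2d\u6587";
        System.out.println(sample.equals(viaWriteBytes(sample))); // false
        System.out.println(sample.equals(viaWriteUTF(sample)));   // true
    }
}
```

In this sketch the Latin-1 character U+00E9 happens to survive the writeBytes path because it fits in 8 bits, while the two CJK characters are truncated to their low bytes.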

> dump does not deal with non-ascii data
> --------------------------------------
>
>                 Key: PIG-497
>                 URL: https://issues.apache.org/jira/browse/PIG-497
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>
>




[jira] Updated: (PIG-497) dump does not deal with non-ascii data

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-497:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch committed, thanks Pradeep.

> dump does not deal with non-ascii data
> --------------------------------------
>
>                 Key: PIG-497
>                 URL: https://issues.apache.org/jira/browse/PIG-497
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>
>         Attachments: PIG-497-2.patch, PIG-497.patch
>
>




[jira] Updated: (PIG-497) dump does not deal with non-ascii data

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-497:
-------------------------------

    Attachment: PIG-497.patch

> dump does not deal with non-ascii data
> --------------------------------------
>
>                 Key: PIG-497
>                 URL: https://issues.apache.org/jira/browse/PIG-497
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>
>         Attachments: PIG-497.patch
>
>




[jira] Updated: (PIG-497) dump does not deal with non-ascii data

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-497:
-------------------------------

    Attachment: PIG-497-2.patch

New version of patch attached with following changes:
- Rolled back the changes to GruntParser.java and ExampleGenerator.java. In the earlier patch, System.out.println(String) was replaced by System.out.write(String.getBytes("UTF-8")), which forced the output of the "illustrate" and "dump" commands to always be in UTF-8. This has been reverted to System.out.println() so that the output is in the VM's default charset (which can be controlled by the LANG environment variable on UNIX). This lets users choose their charset for output.
- Changed Util.createInputFile (a helper function used by the unit test) to write the input file in UTF-8 encoding, so that the unit test introduced in this patch can run without the "LANG" environment variable being set.
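
To make the trade-off between a fixed UTF-8 output and the VM's default charset concrete, here is a rough sketch. It does not reproduce GruntParser.java or ExampleGenerator.java: the class OutputCharsetDemo and the helper printWith are hypothetical names. The sketch encodes the same string through a PrintStream configured with different charsets, which is what System.out does with whatever default the VM picked up at startup.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class OutputCharsetDemo {

    // Hypothetical helper: returns the bytes a PrintStream using the given
    // charset would emit for the text.
    static byte[] printWith(String text, Charset cs) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (PrintStream ps = new PrintStream(buf, true, cs.name())) {
            ps.print(text);
        } catch (UnsupportedEncodingException e) {
            // Cannot happen for a Charset instance that already exists.
            throw new IllegalStateException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE

        // The charset System.out falls back to depends on the environment.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Same character, different byte sequences depending on the charset.
        System.out.println(printWith(text, StandardCharsets.UTF_8).length);      // 2
        System.out.println(printWith(text, StandardCharsets.ISO_8859_1).length); // 1

        // A charset that cannot represent the character substitutes '?'.
        System.out.println((char) printWith(text, StandardCharsets.US_ASCII)[0]); // ?
    }
}
```

If the default charset cannot represent the data, as in the US-ASCII case here, non-ASCII characters come out as '?', which is why the default has to be set appropriately (for example through LANG) when dumping such data.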

> dump does not deal with non-ascii data
> --------------------------------------
>
>                 Key: PIG-497
>                 URL: https://issues.apache.org/jira/browse/PIG-497
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>
>         Attachments: PIG-497-2.patch, PIG-497.patch
>
>




[jira] Updated: (PIG-497) dump does not deal with non-ascii data

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan updated PIG-497:
------------------------------------

    Assignee: Santhosh Srinivasan

> dump does not deal with non-ascii data
> --------------------------------------
>
>                 Key: PIG-497
>                 URL: https://issues.apache.org/jira/browse/PIG-497
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Assignee: Santhosh Srinivasan
>             Fix For: types_branch
>
>

