Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2016/05/31 23:02:13 UTC

[jira] [Resolved] (PIG-4821) Pig chararray field with special UTF-8 chars as part of tuple join key produces wrong results in Tez

     [ https://issues.apache.org/jira/browse/PIG-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy resolved PIG-4821.
-------------------------------------
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 0.15.1

Committed to branch-0.15, branch-0.16 and trunk. Thanks for the review, Daniel.

> Pig chararray field with special UTF-8 chars as part of tuple join key produces wrong results in Tez
> ----------------------------------------------------------------------------------------------------
>
>                 Key: PIG-4821
>                 URL: https://issues.apache.org/jira/browse/PIG-4821
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0, 0.15.1
>
>         Attachments: PIG-4821-1.patch
>
>
> SedesHelper.writeChararray serializes the string with writeUTF, which emits Java's modified UTF-8, but BinInterSedesTupleRawComparator reads it back as standard UTF-8 with str1 = new String(bb1.array(), bb1.position(), casz1, BinInterSedes.UTF8); (https://github.com/apache/pig/blob/e0c5f265c68491395d8303c86195445be3d8aecf/src/org/apache/pig/data/BinInterSedes.java#L959-L964). For some reason, this works fine on my Mac (with both JDK 7 and JDK 8) but not on Linux. I have not dug into the actual cause; I suspect either the charset environment or the specific JDK 8 update, which differs between my Mac and the Linux machines.
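The encoding mismatch described above can be reproduced outside Pig. The following is a minimal standalone sketch, not Pig code: the class ModifiedUtf8Demo and its methods are hypothetical names. It encodes a string with DataOutputStream.writeUTF and decodes the payload two ways: with DataInputStream.readUTF (the matching reader) and with new String(..., StandardCharsets.UTF_8), which mirrors the comparator's read path. It only shows that the two decodings disagree for some characters, such as U+0000 and supplementary characters. It does not claim to explain the Mac/Linux difference, and the printed results assume a JDK 8 or later decoder, which treats separately encoded surrogates as malformed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone demo; not part of Pig.
public class ModifiedUtf8Demo {

    // Serialize with DataOutput.writeUTF: a 2-byte big-endian length followed by
    // modified UTF-8 (U+0000 becomes C0 80; each surrogate char is encoded separately).
    static byte[] writeModifiedUtf8(String s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeUTF(s);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Skip the 2-byte length and decode the payload as standard UTF-8,
    // similar to new String(bytes, offset, len, UTF8) in the raw comparator.
    static String readAsStandardUtf8(byte[] encoded) {
        int len = ((encoded[0] & 0xFF) << 8) | (encoded[1] & 0xFF);
        return new String(encoded, 2, len, StandardCharsets.UTF_8);
    }

    // Decode with DataInput.readUTF, the counterpart of writeUTF.
    static String readAsModifiedUtf8(byte[] encoded) {
        try {
            return new DataInputStream(new ByteArrayInputStream(encoded)).readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // ASCII, an embedded U+0000, and U+1F600 (a surrogate pair in UTF-16).
        String[] samples = {"plain", "a\u0000b", "a\uD83D\uDE00b"};
        for (String s : samples) {
            byte[] enc = writeModifiedUtf8(s);
            // Prints "true true", then "true false", then "true false".
            System.out.println(readAsModifiedUtf8(enc).equals(s) + " "
                    + readAsStandardUtf8(enc).equals(s));
        }
    }
}
```

Pure ASCII round-trips identically either way, which may be why the problem only shows up with special characters. When the two decodings yield different strings, a comparator that decodes keys with new String(..., UTF8) may order or group them differently from code that round-trips them through readUTF.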



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)