Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2016/07/24 23:48:20 UTC

[jira] [Updated] (PIG-4958) Tez autoparallelism estimation for order by is higher than mapreduce

     [ https://issues.apache.org/jira/browse/PIG-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4958:
------------------------------------
    Attachment: PIG-4958-withoutsecurity.patch

Attached an initial version of the patch. It is complete except that it does not yet work on a secure cluster: client.getVertexStatus fails with this error message.

{code}
2016-07-24 23:38:11,693 [WARN] [TezChild] |ipc.Client|: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
{code}

I am still trying to get it working, experimenting with ugi.doAs and similar approaches. I will update the patch once that is done.

> Tez autoparallelism estimation for order by is higher than mapreduce
> --------------------------------------------------------------------
>
>                 Key: PIG-4958
>                 URL: https://issues.apache.org/jira/browse/PIG-4958
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.17.0
>
>         Attachments: PIG-4958-withoutsecurity.patch
>
>
>   The input size is calculated from the in-memory size of the samples, which is usually 4x or more the serialized size. MapReduce estimates the number of reducers from the serialized size, so the Tez estimate comes out higher.
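
To illustrate the effect described in the issue, here is a minimal sketch of a size-based reducer estimate. It is not the code from the patch. The class and method names are hypothetical, and the constants (1 GB per reducer, a cap of 999) are assumed to match the usual defaults of pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max; check your own configuration. The 10 GB input is made up for the example.

```java
public class ReducerEstimateSketch {
    // Assumed defaults, for illustration only.
    static final long BYTES_PER_REDUCER = 1_000_000_000L;
    static final int MAX_REDUCERS = 999;

    // Size-based estimate: ceil(inputBytes / BYTES_PER_REDUCER), clamped to [1, MAX_REDUCERS].
    public static int estimateReducers(long inputBytes) {
        long n = (inputBytes + BYTES_PER_REDUCER - 1) / BYTES_PER_REDUCER;
        return (int) Math.max(1, Math.min(MAX_REDUCERS, n));
    }

    public static void main(String[] args) {
        long serializedBytes = 10_000_000_000L;   // hypothetical 10 GB of serialized input
        long inMemoryBytes = 4 * serializedBytes; // the 4x in-memory inflation mentioned in the issue

        System.out.println("estimate from serialized size: " + estimateReducers(serializedBytes));
        System.out.println("estimate from in-memory size:  " + estimateReducers(inMemoryBytes));
    }
}
```

With these assumed numbers, the same data yields 10 reducers when measured by serialized size and 40 when measured by in-memory size, which is the kind of gap the issue title refers to.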



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)