Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2015/07/08 01:40:05 UTC

[jira] [Created] (PIG-4627) [Pig on Tez] Group by on multiple keys is slow and Self join does not handle null values correctly

Rohini Palaniswamy created PIG-4627:
---------------------------------------

             Summary: [Pig on Tez] Group by on multiple keys is slow and Self join does not handle null values correctly
                 Key: PIG-4627
                 URL: https://issues.apache.org/jira/browse/PIG-4627
             Project: Pig
          Issue Type: Bug
            Reporter: Rohini Palaniswamy
            Assignee: Rohini Palaniswamy
             Fix For: 0.16.0, 0.15.1


  Both problems come from the comparators: one comparator is slow, and another has a bug.

  On the map side, Tez uses PigTupleSortComparator to compare tuples, whereas MapReduce uses PigTupleWritableComparator. PigTupleSortComparator is very inefficient, which makes group by on multiple keys really slow.
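  The ticket does not say why PigTupleSortComparator is slow. In Hadoop-style shuffles, a common reason is that a comparator deserializes both keys on every comparison instead of comparing the serialized bytes directly (a "raw" comparator). The sketch below contrasts the two approaches. It assumes a hypothetical fixed-width encoding of integer-only tuple keys (each field written big-endian with the sign bit flipped). It is not Pig's serialization format and not the code of either Pig comparator.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class RawKeyCompare {
    // Hypothetical encoding: each int field is written big-endian with the sign bit
    // flipped, so unsigned lexicographic byte order matches signed int order.
    static byte[] encode(int... fields) {
        ByteBuffer buf = ByteBuffer.allocate(4 * fields.length);
        for (int f : fields) {
            buf.putInt(f ^ Integer.MIN_VALUE);
        }
        return buf.array();
    }

    // Reverses encode(): reads each 4-byte field and flips the sign bit back.
    static int[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int[] out = new int[bytes.length / 4];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getInt() ^ Integer.MIN_VALUE;
        }
        return out;
    }

    // Deserializing style: allocates and decodes both keys on every comparison.
    static int compareDeserialized(byte[] a, byte[] b) {
        return Arrays.compare(decode(a), decode(b));
    }

    // Raw style: compares the serialized bytes directly, with no allocation.
    static int compareRaw(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        byte[][] keys = { encode(3, 7), encode(-5, 10), encode(3, 1) };
        // Both comparators must agree on the sign of every pairwise comparison.
        for (byte[] a : keys) {
            for (byte[] b : keys) {
                int raw = Integer.signum(compareRaw(a, b));
                int slow = Integer.signum(compareDeserialized(a, b));
                if (raw != slow) {
                    throw new AssertionError("comparators disagree");
                }
            }
        }
        System.out.println("raw and deserializing comparators agree");
    }
}
```

  Both methods produce the same ordering. Only the raw one avoids per-comparison decoding, which is what matters when the shuffle sort calls the comparator millions of times.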

  Since PIG-4495, which writes multiple inputs into the same Tez input, self join does not produce the right results when join keys are null. The PIG-3761 fix (https://issues.apache.org/jira/secure/attachment/12628162/PIG-3761-1.patch) is needed to handle this by comparing the input indexes in the raw comparators.
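  Why comparing indexes matters can be sketched as follows. When both sides of a self join arrive in one input, each key carries the index of the input it came from. If the comparator treats every null key as equal, nulls from input 0 and input 1 land in the same group and join with each other, although null keys should never match in a join. Falling back to the input index when both keys are null keeps them apart. The `Key` record, the `compareIndexForNulls` flag and the helpers are hypothetical illustrations, not Pig's NullableTuple or raw comparator classes.

```java
import java.util.ArrayList;
import java.util.List;

public class NullIndexCompare {
    // Hypothetical key: a nullable join value plus the index of the input it came from.
    record Key(Integer value, byte index) {}

    // Non-null keys compare by value only. When both keys are null, the input index
    // decides the order if compareIndexForNulls is true; otherwise they compare equal.
    static int compare(Key a, Key b, boolean compareIndexForNulls) {
        if (a.value() == null && b.value() == null) {
            return compareIndexForNulls ? Byte.compare(a.index(), b.index()) : 0;
        }
        if (a.value() == null) return -1; // nulls sort first
        if (b.value() == null) return 1;
        return Integer.compare(a.value(), b.value());
    }

    // Models a self join: the same values are read twice, tagged with index 0 and 1.
    static List<Key> selfJoinKeys(Integer... values) {
        List<Key> keys = new ArrayList<>();
        for (byte idx = 0; idx <= 1; idx++) {
            for (Integer v : values) {
                keys.add(new Key(v, idx));
            }
        }
        return keys;
    }

    // Counts (input 0, input 1) pairs that compare equal, i.e. pairs the shuffle
    // would put in the same group and the join would therefore emit.
    static int joinMatches(List<Key> keys, boolean compareIndexForNulls) {
        int matches = 0;
        for (Key left : keys) {
            if (left.index() != 0) continue;
            for (Key right : keys) {
                if (right.index() == 1 && compare(left, right, compareIndexForNulls) == 0) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Key> keys = selfJoinKeys(1, null, 2);
        System.out.println("with index: " + joinMatches(keys, true));
        System.out.println("without index: " + joinMatches(keys, false));
    }
}
```

  For the values 1, null and 2, comparing indexes yields two matches (1 with 1, 2 with 2). Treating all nulls as equal adds a spurious null-to-null match, which is the kind of wrong result the ticket describes.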



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)