
Posted to dev@pig.apache.org by "Adam Szita (JIRA)" <ji...@apache.org> on 2017/04/27 13:01:04 UTC

[jira] [Commented] (PIG-5230) Fix the RuntimeException throws in SecondaryKeySortUtil

    [ https://issues.apache.org/jira/browse/PIG-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15986566#comment-15986566 ] 

Adam Szita commented on PIG-5230:
---------------------------------

[~kellyzly] Right, I left this untouched. I think the intent here was to signal the case where no tuples were processed by the AccumulateByKey function. Since I started using the {{initialized}} flag, I think we should keep depending on it for that check. See [^PIG-5230.2.patch]
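Because curKey can legitimately be null (a group whose key is null), testing curKey against null cannot tell "no tuples processed" apart from "the current group's key is null"; a separate boolean flag can. A minimal plain-Java sketch of that idea follows. It is not the code from the attached patch or from Pig's SecondaryKeySortUtil: the class {{AccumulateByKeySketch}}, the {{Group}} type and the {{pair}} helper are hypothetical, and the input is assumed to be already sorted by key, as it would be after a shuffle.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch, not Pig's SecondaryKeySortUtil code: groups
// key-sorted pairs and tracks "seen any pair yet?" with a boolean flag,
// because a null key is a valid group and cannot serve as a sentinel.
public class AccumulateByKeySketch {

    /** One output group: a key (possibly null) and its collected values. */
    static final class Group {
        final Object key;
        final List<Object> values;

        Group(Object key, List<Object> values) {
            this.key = key;
            this.values = values;
        }

        @Override
        public String toString() {
            return key + "=" + values;
        }
    }

    /** Hypothetical helper: builds a pair whose key may be null. */
    static Map.Entry<Object, Object> pair(Object key, Object value) {
        return new SimpleEntry<>(key, value);
    }

    /** Groups consecutive pairs sharing a key; input is assumed sorted by key. */
    static List<Group> accumulateByKey(List<Map.Entry<Object, Object>> sorted) {
        List<Group> out = new ArrayList<>();
        boolean initialized = false; // becomes true once the first pair is read
        Object curKey = null;        // may legitimately remain null
        List<Object> curValues = new ArrayList<>();

        for (Map.Entry<Object, Object> p : sorted) {
            Object key = p.getKey();
            if (!initialized) {
                curKey = key;
                initialized = true;
            } else if (!Objects.equals(curKey, key)) {
                // Key changed: emit the finished group and start a new one.
                out.add(new Group(curKey, curValues));
                curKey = key;
                curValues = new ArrayList<>();
            }
            curValues.add(p.getValue());
        }
        // The flag, not curKey != null, decides whether anything was processed,
        // so empty input yields no groups instead of an exception.
        if (initialized) {
            out.add(new Group(curKey, curValues));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<Object, Object>> sorted = Arrays.asList(
                pair(null, 65), pair(null, 70), pair("fred miller", 55));
        System.out.println(accumulateByKey(sorted));            // [null=[65, 70], fred miller=[55]]
        System.out.println(accumulateByKey(new ArrayList<>())); // []
    }
}
```

With this shape, a group keyed by null is emitted like any other group, and only a genuinely empty input produces no output.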

> Fix the RuntimeException throws in SecondaryKeySortUtil
> -------------------------------------------------------
>
>                 Key: PIG-5230
>                 URL: https://issues.apache.org/jira/browse/PIG-5230
>             Project: Pig
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: PIG-5230.2.patch, PIG-5230.patch
>
>
> There is a possibility that [curKey is null| https://github.com/apache/pig/blob/63968e3132ad1fee06dffcacb8ea5d399e0edef5/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SecondaryKeySortUtil.java#L116] after PIG-5164. We should remove this code to avoid the RuntimeException.
> The following script can trigger the exception.
> {code}
> a = load './studenttab10k.mk1' as (name, age:int, gpa:float);
> a1 = filter a by gpa is null or gpa >= 3.9;
> a2 = filter a by gpa < 2;
> b = union a1, a2;
> c = load './voternulltab10k' as (name, age, registration, contributions);
> d = join b by name left outer, c by name using 'replicated';
> e = stream d through `cat` as (name, age, gpa, name1, age1, registration, contributions);
> f = foreach e generate name, age, gpa, registration, contributions;
> g = group f by name;
> g1 = group f by name; -- Two separate groupbys to ensure secondary key partitioner
> h = foreach g { 
>     inner1 = order f by age, gpa, registration, contributions;
>     inner2 = limit inner1 1;
>     generate inner2, SUM(f.age); };
> i = foreach g1 {
>     inner1 = order f by age asc, gpa desc, registration asc, contributions desc;
>     inner2 = limit inner1 1;
>     generate inner2, SUM(f.age); };
> store h into './MultiQuery_Union_3.1.out';
> store i into './MultiQuery_Union_3.2.out';
> {code}
> cat studenttab10k.mk1
> {code}
> ulysses thompson	64	1.90
> katie carson	25	3.65
> 	65	0.73
> holly davidson	57	2.43
> fred miller	55	3.77
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)