Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2015/05/26 23:18:19 UTC

[jira] [Commented] (PIG-4541) Skewed full outer join does not return records if any relation is empty. Outer join does not return any record if left relation is empty

    [ https://issues.apache.org/jira/browse/PIG-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559889#comment-14559889 ] 

Daniel Dai commented on PIG-4541:
---------------------------------

[~dghosal], I tried some of your queries on Pig 0.14.0 and cannot reproduce the issue. The input and the two scripts I ran are below.
1.txt:
1,2
{code}
a = load '1.txt' using PigStorage(',') as (a0, a1);
b = filter a by a0==100;
c = join b by a0 full, a by a0 using 'skewed' parallel 10;
dump c;
{code}
{code}
a = load '1.txt' using PigStorage(',') as (a0, a1);
b = filter a by a0==100;
c = join b by a0 full, a by a0 parallel 2;
dump c;
{code}

In both cases I get the correct result.

There is a regression in trunk, which I will fix shortly.
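
For completeness, the mirrored case (Test4 in the description, with the non-empty relation on the left) can be checked against the same 1.txt. This is only a sketch and was not run as part of this comment; the expectation stated in the script comments follows from full outer join semantics, not from an actual run:
{code}
a = load '1.txt' using PigStorage(',') as (a0, a1);
b = filter a by a0==100;  -- b is empty: no row in 1.txt has a0 == 100
-- Non-empty relation on the left, empty relation on the right.
c = join a by a0 full, b by a0 using 'skewed' parallel 10;
-- Expected under full outer semantics: the single row of a, with null fields for b.
dump c;
{code}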

> Skewed full outer join does not return records if any relation is empty. Outer join does not return any record if left relation is empty
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-4541
>                 URL: https://issues.apache.org/jira/browse/PIG-4541
>             Project: Pig
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 0.14.0
>         Environment: HDP 2.2.4
>            Reporter: Dipankar
>            Assignee: Daniel Dai
>             Fix For: 0.15.1
>
>
> Test1:
> Perform a full join on two relations, with the left relation empty and the right containing records
> empty_relation = FILTER a_relation by (join_column=='eliminate everything');
> Test_output = JOIN empty_relation by (join_column) FULL , non_empty_relation by (join_column);
> Result : Zero records returned.
> Test2:
> Perform a full join on two relations, with the left relation empty and the right containing records, using skewed
> Test_output = JOIN empty_relation by (join_column) FULL , non_empty_relation by (join_column) using 'skewed';
> Result : Zero records returned.
> Test3:
> Perform a full join on two relations, with the left relation empty and the right containing records, using parallel
> Test_output = JOIN empty_relation by (join_column) FULL , non_empty_relation by (join_column) PARALLEL 10;
> Result : Zero records returned.
> Test4:
> Perform a full join on two relations, with the left relation non-empty and the right relation empty, using parallel
> Test_output = JOIN non_empty_relation by (join_column) FULL , empty_relation by (join_column) PARALLEL 10;
> Result : Valid records returned.
> Observations:
> 1) If either relation is empty, a skewed full outer join does not return anything
> 2) If the non-empty relation is kept on the left, everything works except skewed
> 3) FULL OUTER only works if the left relation is not empty
> 4) Skewed only works if both relations are non-empty.


