Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2008/09/17 02:27:44 UTC

[jira] Created: (PIG-435) wrong columns produced if incomplete definition provided during load

wrong columns produced if incomplete definition provided during load
--------------------------------------------------------------------

                 Key: PIG-435
                 URL: https://issues.apache.org/jira/browse/PIG-435
             Project: Pig
          Issue Type: Bug
    Affects Versions: types_branch
            Reporter: Olga Natkovich
            Assignee: Pradeep Kamath
             Fix For: types_branch


Script:

A = load 'studenttab10k' as (name); -- note that data has more than 1 column
B = load 'votertab10k' as (name, age, reg, contrib);
D = COGROUP A by name, B by name;
E = foreach D generate flatten(A), flatten(B); -- flatten(A) emits every column the loader read
F = foreach E generate reg, contrib;
dump F;

The dump produces the wrong columns. Even though only one column is declared for A, the loader actually reads all of A's columns. So anywhere the script explicitly or implicitly uses A.* (as flatten does here), the undeclared columns shift the positions of every field after them, and the results are wrong.
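A minimal Python simulation of the positional mismatch, not Pig's actual code. It assumes, hypothetically, that the student file has three physical columns (name, age, gpa); all row values are invented. Field positions are computed from the declared schemas, while flatten concatenates whatever columns the loader really produced, so lookups into B's fields land on the wrong values.

```python
# Simulation of PIG-435 (not Pig's real implementation).
# Declared schemas, as in the script's AS clauses.
A_SCHEMA = ["name"]
B_SCHEMA = ["name", "age", "reg", "contrib"]

# Hypothetical physical rows; values are invented for illustration.
a_row = ("alice", "20", "3.5")                # file has 3 columns, only 1 declared
b_row = ("alice", "25", "democrat", "100.0")  # all 4 columns declared


def flattened_schema():
    """Field names after flatten(A), flatten(B), built from the declared schemas."""
    return [f"A::{f}" for f in A_SCHEMA] + [f"B::{f}" for f in B_SCHEMA]


def flatten_rows(a, b):
    """flatten concatenates every column the loader actually produced."""
    return a + b


def project(row, schema, field):
    """Resolve a field by the position the declared schema gives it."""
    return row[schema.index(field)]


schema = flattened_schema()
row = flatten_rows(a_row, b_row)
print(project(row, schema, "B::reg"))      # prints alice (expected democrat)
print(project(row, schema, "B::contrib"))  # prints 25 (expected 100.0)
```

The declared schema has 5 fields but the flattened row has 7 values, so every B field is read two positions too early.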

The long-term solution is to push projections into the load. In the shorter term, the proposal is to detect when the script uses A.* and insert a projection right after the load. This is not needed when types are declared, because a casting foreach will already be there.
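A minimal Python sketch of the short-term proposal, using invented data: after loading, each row is projected onto its declared schema (for A this corresponds roughly to foreach A generate $0;), so the flattened positions line up again. The function name project_after_load is hypothetical and not part of Pig.

```python
# Sketch of the short-term fix: project each loaded row onto its declared schema.
A_SCHEMA = ["name"]
B_SCHEMA = ["name", "age", "reg", "contrib"]

# Hypothetical rows; values are invented for illustration.
a_row = ("alice", "20", "3.5")
b_row = ("alice", "25", "democrat", "100.0")


def project_after_load(row, schema):
    """Keep only the declared columns. Padding short rows with None is an
    assumption of this sketch; the issue does not specify that case."""
    n = len(schema)
    return tuple(row[:n]) + (None,) * (n - len(row))


schema = [f"A::{f}" for f in A_SCHEMA] + [f"B::{f}" for f in B_SCHEMA]
row = project_after_load(a_row, A_SCHEMA) + project_after_load(b_row, B_SCHEMA)
print(row[schema.index("B::reg")])      # prints democrat
print(row[schema.index("B::contrib")])  # prints 100.0
```

Trimming after the load still reads and then discards the extra columns; the long-term fix described in the issue avoids that by pushing the projection into the loader, so the undeclared columns are never produced.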


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-435) wrong columns produced if incomplete definition provided during load

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-435:
-------------------------------

    Priority: Minor  (was: Major)

Needs further discussion

